Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow memmapped arrays to keep the backing mmap file open #1668

Merged
merged 4 commits into from
Dec 18, 2023

Conversation

braingram
Copy link
Contributor

@braingram braingram commented Oct 26, 2023

asdf uses a mmap to memory map files to produce np.memmap arrays (when copy_arrays is True, the current default). np.memmap does not check if the backing mmap is closed before accessing array data and asdf closes the mmap when the asdf file is closed. This results in segfaults (see issue #1334).

This PR changes when happens to the mmap when the asdf file is closed. asdf will no longer force closure of the mmap. This will result in an open file pointer if any array still references the mmap (the mmap will likely be closed by the garbage collector when all arrays are collected which matches the behavior of np.memmap which does not have an API for closing the underlying mmap).

Windows does not allow a file to be trunacated (as can occur during update) if it is open more than once. The PR adds an os check and closes the mmap on windows which (with these changes) passes all tests:
https://github.com/asdf-format/asdf/actions/runs/6657397904/job/18092170804?pr=1668
including the regression test added with this PR that segfaults on linux and mac.

The failure of the build job is unrelated and addressed in: #1670
The python 3.12 failure is unrelated (due to aiohttp having no released version that supports 3.12).

The stdatamodels failure should be addressed prior to this PR being merged. The failures are all related to the change in the number of open files (which stdatamodels has specific tests for). spacetelescope/stdatamodels#232 addresses these tests. EDIT: spacetelescope/stdatamodels#232 is merged

Fixes #1334
Fixes #1530

@braingram braingram requested a review from a team as a code owner October 26, 2023 15:12
@braingram braingram marked this pull request as draft October 26, 2023 15:12
@braingram braingram changed the title Memmap open Allow memmapped arrays to keep the backing mmap file open Oct 26, 2023
@braingram braingram force-pushed the memmap_open branch 3 times, most recently from 303c6c9 to 3fa2ea2 Compare October 30, 2023 14:56
@braingram braingram marked this pull request as ready for review October 30, 2023 14:56
@braingram braingram marked this pull request as draft October 30, 2023 14:56
@braingram braingram marked this pull request as ready for review October 30, 2023 17:45
Copy link
Member

@zacharyburnett zacharyburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (though I'm not familiar with these inner workings). Tests pass

if hasattr(self, "_mmap") and not self._mmap.closed:
self._mmap.close()
if hasattr(self, "_mmap"):
# we no longer close the _mmap here. This does mean that views of arrays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we got the better end of that trade.

if fd is not None:
if getattr(fd, "_mmap", None) is not base.base:
self._array = None
del fd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question for my own educational purposes, what does explicitly deleting the fd variable accomplish?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The _fd attr is a weakef so I think I added this mostly out of habit. The explicit del of fd isn't needed here as fd isn't used or stored (so the reference will be released at the end of the function).

@braingram
Copy link
Contributor Author

braingram commented Dec 5, 2023

Thanks for the reviews! Marking this as draft until spacetelescope/stdatamodels#232 can be merged then this can be brought out of draft and merged. EDIT: the stdatamodels PR is merged

@braingram braingram marked this pull request as draft December 5, 2023 20:30
@braingram braingram marked this pull request as ready for review December 7, 2023 13:38
@braingram
Copy link
Contributor Author

braingram commented Dec 7, 2023

spacetelescope/stdatamodels#232 is merged. I reopened this, will resolve the conflicts, and run regtests (I believe both jwst and roman have mostly if not entirely disabled memory mapping). EDIT: regtest ran without any new issues (see below)

@braingram
Copy link
Contributor Author

braingram commented Dec 7, 2023

Roman regtests ran with no errors:
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-Developers-Pull-Requests/505/
JWST regtests ran:
https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/1092/tests
These showed 25 errors, 22 appear without this PR and the 3 different ones are:

  • 1 and 2 the nirspec_missing_msa errors that pop up randomly
  • test_duplicate_names which failed with AssertionError: assert 'Following associations have the same product name but significant differences' in ''

I re-ran the test_duplicate_names test and it passed with this PR: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/1094/pipeline/199

At this point I'm claiming the regtests all 'pass' with this PR and any failures are unrelated.

asdf uses a mmap to memory map files to produce np.memmap
arrays (when copy_arrays is True, the current default).
np.memmap does not check if the backing mmap is closed before
accessing array data and asdf closes the mmap when the asdf
file is closed. This results in segfaults (see issue 1334).

This commit changes when happens to the mmap when the asdf
file is closed. asdf will no longer force closure of the mmap.
This will result in an open file pointer if any array still
references the mmap (the mmap will likely be closed by the
garbage collector when all arrays are collected which matches
the behavior of np.memmap).
@braingram braingram merged commit 8078076 into asdf-format:main Dec 18, 2023
45 checks passed
@braingram braingram deleted the memmap_open branch December 18, 2023 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants