Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

Closed
wants to merge 3 commits into from
Closed

Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

wants to merge 3 commits into from

Conversation

bernie-simon
Copy link
Contributor

User reported a that when asdf was used to read an image with multiple
spectra, each with its own wcs, memory use grew until it crashed the
program. The error was traced to the code which read the binary wcs
coefficients stored in the image. That code opened a pseudo file
descriptor which did a seek on the underlying image and called the
numpy frombuffer method to read the coefficients. Each time this
operation was performed memory use grew by 6 Mb, and it was done
almost 1500 times. I believe the cause of the problem is due to an
optimization in the numpy code that allows a slice of an array to use
the same data as the underlying array. The pseudo file descriptor
allocates a large buffer for the image data but never returns it, as
the slice keeps the reference count of the underlying buffer from
going to zero, even though the slice is typically much smaller than
the buffer it is extracted from. The fix replaces the call to
numpy.frombuffer with a call to array.array, which performs the same,
but lacks the optimization in numpy which causes problems in this
case. A test shows the change eliminates the memory leak mentioned,
although there may still be further memory leaks and speed
optimizations in the asdf code.

User reported a that when asdf was used to read an image with multiple
spectra, each with its own wcs, memory use grew until it crashed the
program. The error was traced to the code which read the binary wcs
coeeficients storedin the image. That code opened a pseudo file
descriptor which did a seek on the underlying image and called the
numpy frombuffer method to read the coefficients. Each time this
operation was performed memory use grew by 6 Mb, and it was done
almost 1500 times. I believe the cause of the problem is due to an
uptimization in the numpy code that allows a slice of an array to use
the same data as the underlying array. The pseudo file descriptor
allocates a large buffer for the image data but never returns it, as
the slice keeps the reference count of the underlying buffer from
going to zero, even though the slice is typically much smaller than
the buffer it is extracted from. The fix replaces the call to
numpy.frombuffer with a call to array.array, which performs the same,
but lacks the optimization in numpy which causes problems in this
case. A test shows the change eliminates the memory leak mentioned,
although there may still be further memory leaks and speed
optimizations in the asdf code.
@bernie-simon
Copy link
Contributor Author

Good job, Bernie! That sounds like it would have been a bear to find.

It was difficult, at least for me. For anyone who might need to look for memory leaks in the future, I did not find Pympler, a python package that was designed to track memory leaks, very helpful, mostly because it takes so long to run. The way I located the error was to call

resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

which returns the amount of memory used by the process and use binary search to track down the spot where memory usage grew dramatically.

@nden
Copy link
Contributor

nden commented Jun 14, 2016

@bernie-simon Thanks for finding the problem! I looked at it for some time and did not figure it out.

I just tested your solution and can confirm that it works for me. However the tests fail. I don't understand the code enough to tell whether the failure is significant or whether we can change the tests.

Perhaps an alternative solution to the problem would be to always make a copy of the array (using the original code). Slices in numpy always return a view in the array not a copy. Is there a reason why we should not use a copy?

@bernie-simon
Copy link
Contributor Author

On Jun 14, 2016, at 10:54 AM, Nadia Dencheva [email protected] wrote:

@bernie-simon Thanks for finding the problem! I looked at it for some time and did not figure it out.

I just tested your solution and can confirm that it works for me. However the tests fail. I don't understand the code enough to tell whether the failure is significant or whether we can change the tests.

Perhaps an alternative solution to the problem would be to always make a copy of the array (using the original code). Slices in numpy always return a view in the array not a copy. Is there a reason why we should not use a copy?

Which tests failed? I will look at the alternate solution and see if it works.

@nden
Copy link
Contributor

nden commented Jun 14, 2016

The asdf generic_io tests failed.

While my previous fix for the memory leak problem fixed that issue,
other code in generic_io.py and its tests expects that the returned
object be an ndarray. Since using the frombuffer call followed by a
copy to break the reference achieves the same result as using array,
I've replaced my pervious change to MemryIO with this new change.
@bernie-simon
Copy link
Contributor Author

Because of the failing tests, I'm removing this pull request in place of a new one, coming shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants