Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

bernie-simon · 2016-06-06T19:22:35Z

User reported a that when asdf was used to read an image with multiple
spectra, each with its own wcs, memory use grew until it crashed the
program. The error was traced to the code which read the binary wcs
coefficients stored in the image. That code opened a pseudo file
descriptor which did a seek on the underlying image and called the
numpy frombuffer method to read the coefficients. Each time this
operation was performed memory use grew by 6 Mb, and it was done
almost 1500 times. I believe the cause of the problem is due to an
optimization in the numpy code that allows a slice of an array to use
the same data as the underlying array. The pseudo file descriptor
allocates a large buffer for the image data but never returns it, as
the slice keeps the reference count of the underlying buffer from
going to zero, even though the slice is typically much smaller than
the buffer it is extracted from. The fix replaces the call to
numpy.frombuffer with a call to array.array, which performs the same,
but lacks the optimization in numpy which causes problems in this
case. A test shows the change eliminates the memory leak mentioned,
although there may still be further memory leaks and speed
optimizations in the asdf code.

User reported a that when asdf was used to read an image with multiple spectra, each with its own wcs, memory use grew until it crashed the program. The error was traced to the code which read the binary wcs coeeficients storedin the image. That code opened a pseudo file descriptor which did a seek on the underlying image and called the numpy frombuffer method to read the coefficients. Each time this operation was performed memory use grew by 6 Mb, and it was done almost 1500 times. I believe the cause of the problem is due to an uptimization in the numpy code that allows a slice of an array to use the same data as the underlying array. The pseudo file descriptor allocates a large buffer for the image data but never returns it, as the slice keeps the reference count of the underlying buffer from going to zero, even though the slice is typically much smaller than the buffer it is extracted from. The fix replaces the call to numpy.frombuffer with a call to array.array, which performs the same, but lacks the optimization in numpy which causes problems in this case. A test shows the change eliminates the memory leak mentioned, although there may still be further memory leaks and speed optimizations in the asdf code.

bernie-simon · 2016-06-06T19:40:31Z

Good job, Bernie! That sounds like it would have been a bear to find.

It was difficult, at least for me. For anyone who might need to look for memory leaks in the future, I did not find Pympler, a python package that was designed to track memory leaks, very helpful, mostly because it takes so long to run. The way I located the error was to call

resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

which returns the amount of memory used by the process and use binary search to track down the spot where memory usage grew dramatically.

nden · 2016-06-14T14:54:24Z

@bernie-simon Thanks for finding the problem! I looked at it for some time and did not figure it out.

I just tested your solution and can confirm that it works for me. However the tests fail. I don't understand the code enough to tell whether the failure is significant or whether we can change the tests.

Perhaps an alternative solution to the problem would be to always make a copy of the array (using the original code). Slices in numpy always return a view in the array not a copy. Is there a reason why we should not use a copy?

bernie-simon · 2016-06-14T15:02:08Z

On Jun 14, 2016, at 10:54 AM, Nadia Dencheva [email protected] wrote:

@bernie-simon Thanks for finding the problem! I looked at it for some time and did not figure it out.

I just tested your solution and can confirm that it works for me. However the tests fail. I don't understand the code enough to tell whether the failure is significant or whether we can change the tests.

Perhaps an alternative solution to the problem would be to always make a copy of the array (using the original code). Slices in numpy always return a view in the array not a copy. Is there a reason why we should not use a copy?

Which tests failed? I will look at the alternate solution and see if it works.

nden · 2016-06-14T15:09:45Z

The asdf generic_io tests failed.

While my previous fix for the memory leak problem fixed that issue, other code in generic_io.py and its tests expects that the returned object be an ndarray. Since using the frombuffer call followed by a copy to break the reference achieves the same result as using array, I've replaced my pervious change to MemryIO with this new change.

bernie-simon · 2016-06-16T17:16:51Z

Because of the failing tests, I'm removing this pull request in place of a new one, coming shortly.

bernie-simon added 2 commits June 16, 2016 12:08

Change expected result to match the actual result, which is not in error

f9735b6

bernie-simon closed this Jun 16, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

bernie-simon commented Jun 6, 2016

bernie-simon commented Jun 6, 2016

nden commented Jun 14, 2016

bernie-simon commented Jun 14, 2016

nden commented Jun 14, 2016

bernie-simon commented Jun 16, 2016

Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

Fixed a memory leak when reading wcs that grew memory to over 10 Gb #200

Conversation

bernie-simon commented Jun 6, 2016

bernie-simon commented Jun 6, 2016

nden commented Jun 14, 2016

bernie-simon commented Jun 14, 2016

nden commented Jun 14, 2016

bernie-simon commented Jun 16, 2016