memory leak when decompressing with copy_stream on 0.8.1 #35

Closed
jessehersch opened this issue Jan 3, 2018 · 3 comments

env: linux
version: 0.8.1 (this is what's currently on pypi as of Jan 3 2018: https://pypi.python.org/pypi/zstandard)
python: 3.6.1

Here's a repro:

import os
import gc
import io
import zstd
import tempfile
import resource
import subprocess


def main():
    with tempfile.NamedTemporaryFile('wb') as compressed:
        uncompressed = os.urandom(1024)
        compressed.write(zstd.ZstdCompressor().compress(uncompressed))
        compressed.flush()
        
        print('using the zstd python bindings leaks')
        for i in range(10001):
            decompressed = io.BytesIO()
            with open(compressed.name, 'rb') as file:
                zstd.ZstdDecompressor().copy_stream(file, decompressed)
            decompressed.seek(0)
            result = decompressed.read()
            assert result == uncompressed
            del result, decompressed
            if i % 1000 == 0:
                print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)  # peak RSS; KiB on Linux
                gc.collect()

        print('workaround is to launch zstd as subprocess and skip the python bindings :(')
        for i in range(10001):
            with subprocess.Popen(['zstd', '-dcq', compressed.name], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
                stdout, stderr = p.communicate()
                p.wait()
                assert p.returncode == 0
                assert stdout == uncompressed
                del stdout, stderr
                if i % 1000 == 0:
                    print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)  # peak RSS; KiB on Linux
                    gc.collect()


if __name__ == '__main__':
    main()

Here's the output:

$ python3 main.py
using the zstd python bindings leaks
0 13000
1000 14020
2000 15340
3000 16396
4000 17452
5000 18508
6000 19564
7000 20884
8000 21940
9000 22996
10000 24052
workaround is to launch zstd as subprocess and skip the python bindings :(
0 24224
1000 24224
2000 24224
3000 24224
4000 24224
5000 24224
6000 24224
7000 24224
8000 24224
9000 24224
10000 24224
indygreg (Owner) commented Jan 8, 2018

I haven't verified the memory leak, but another workaround is to use one of the other available methods for decompressing, e.g.:

dctx = zstd.ZstdDecompressor()
decompressed = io.BytesIO()
with open(compressed.name, 'rb') as fh:
    for chunk in dctx.read_to_iter(fh):
        decompressed.write(chunk)

I'll look into this further for the 0.9 release. Thanks for the bug report!

jessehersch (Author) commented:

I had thought read_to_iter() wasn't in 0.8.1.

indygreg (Owner) commented Jan 8, 2018

Sorry - read_from() is the equivalent method in 0.8. read_to_iter() will be introduced in 0.9.
