-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace libsnappy with cramjam #130
Conversation
cc @milesgranger , a long time coming! |
|
||
|
||
class UncompressError(Exception): | ||
pass | ||
|
||
|
||
py3k = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
py2 stuff is for pypy only - maybe that should be dropped too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking a followup PR for current Python versions (and dropping Python2). I actually started with #129
Looks like the pypy tests need an explicit cast from cramjam buffer to bytes (i.e., |
@martindurant I had that earlier and it broke in a different way. let me restore |
https://github.com/andrix/python-snappy/actions/runs/7787155361/job/21233634917?pr=130
https://github.com/andrix/python-snappy/actions/runs/7787155357/job/21233634312?pr=130
|
For the sdist (and actually any real install), will need to actually add cramjam to required deps in https://github.com/andrix/python-snappy/blob/master/setup.py#L66 |
For pypy, we might need @milesgranger 's help, perhaps the errors ring a bell. Is it possible we are confusing framed Vs raw? |
https://github.com/andrix/python-snappy/actions/runs/7787502576/job/21234796426?pr=130
|
Ah, that's a bit unfortunate w/ the buffer comparisons not working. Can just cast by I'll try to look more deeply at the other errors. But the De/Compressor classes work on framed snappy format FWIW. |
_________________ SnappyCompressionTest.test_uncompress_error __________________
self = <test_snappy.SnappyCompressionTest testMethod=test_uncompress_error>
def test_uncompress_error(self):
> self.assertRaises(snappy.UncompressError, snappy.uncompress,
"hoa".encode('utf-8'))
test_snappy.py:70: This is just a matter of catching cramjam's error and re-raising as python-snappy's error, no? def test_compression(self):
# test that we can add compressed chunks
compressor = snappy.StreamCompressor()
data = b"\0" * 50
compressed_data = snappy.compress(data)
crc = struct.pack("<L", snappy.snappy._masked_crc32c(data))
self.assertEqual(crc, b"\x8f)H\xbd")
> self.assertEqual(len(compressed_data), 6)
E AssertionError: 24 != 6 This is compressing with framed and comparing against what snappy raw would output (checked, this will be 6 w/ raw format) And ya, I think the other two are raw vs framed comparison errors. |
thank you @milesgranger I added bytes cast and exception wrapping would you have guidance on correcting "framed vs. raw" errors? |
You bet! python-snappy's That ends up looking like this: In [1]: import cramjam
In [2]: import snappy
In [3]: data = b"\0" * 50
In [4]: snappy.compress(data)
Out[4]: b'2\x00\x00\xc2\x01\x00'
In [5]: bytes(cramjam.snappy.compress(data))
Out[5]: b'\xff\x06\x00\x00sNaPpY\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [6]: bytes(cramjam.snappy.compress_raw(data))
Out[6]: b'2\x00\x00\xc2\x01\x00' So it appears, if python-snappy is to keep it's current API, then behind the covers, For the De/Compressor classes, I'm less sure how to maintain the current API. I think I took inspiration from lz4 or something, can't really remember. But effectively cramjam's all have a In [1]: import snappy
In [2]: import cramjam
In [3]: data = b"\0" * 50
In [4]: compressor = snappy.StreamCompressor()
In [5]: compressor.compress(data)
Out[5]: b'\xff\x06\x00\x00sNaPpY\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [6]: compressor.compress(data)
Out[6]: b'\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [7]: c = cramjam.snappy.Compressor()
In [8]: c.compress(data)
Out[8]: 50
In [9]: c.compress(data)
Out[9]: 50
In [10]: bytes(c.finish())
Out[10]: b'\xff\x06\x00\x00sNaPpY\x00\r\x00\x00o\xfan\xe2d\x00\x00\xfe\x01\x00\x8a\x01\x00'
In [11]: python_snappy = b'\xff\x06\x00\x00sNaPpY\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00' + b'\x00\n\x00
...: \x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [12]: cramjam.snappy.decompress(python_snappy)
Out[12]: cramjam.Buffer<len=100> You can see cramjam's is different, but still able to decompress python-snappy's so it seems okay just adding more padding it seems around each chunk and then it's more on the user to concatenate the total stream. Whereas with cramjam it's a bit more "managed". Also I note that So, that's my understanding. If you wanted to maintain the current API that might be as easy (hacky?) as having the first call to In [1]: import cramjam
In [2]: import snappy
In [3]: data = b"\0" * 50
In [4]: compressor = snappy.StreamCompressor()
In [5]: compressor.compress(data)
Out[5]: b'\xff\x06\x00\x00sNaPpY\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [6]: compressor.compress(data)
Out[6]: b'\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [7]: bytes(cramjam.snappy.compress(data))
Out[7]: b'\xff\x06\x00\x00sNaPpY\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00'
In [8]: bytes(cramjam.snappy.compress(data))[10:]
Out[8]: b'\x00\n\x00\x00\x8f)H\xbd2\x00\x00\xc2\x01\x00' As a separate note, the first suggestion to cast |
Signed-off-by: Gábor Lipták <[email protected]>
seems to be new
|
thank you @milesgranger I updated with @martindurant do you see current tests providing strong coverage for PS also dropped |
well apparently not! I am not surprised, and I am prepared to provide only "best effort" here; it would be best if docstrings are updated to say so, however. Were you thinking to drop them altogether? |
I wasn't thinking of removing functionality. https://github.com/andrix/python-snappy/blob/master/test_snappy.py#L119-L259 seems to be exercising stream functionality to good extent with please let me know how you would like to proceed @martindurant (and provide "best effort" docstrings if appropriate) |
StreamCompressor.compress is an alias for .add_chunk , so that's fine. |
I believe StreamDe/Compressor could be replaced by cramjam.snappy.De/Compressor. The current code is doing the same framing as before, but calling cramjam's _raw codecs internally, so the exact locations of the chunk boundaries might change, but it produces valid output. In other words, since there are after all some tests, I think all is well! Any remaining reservations preventing merge? |
thank you @martindurant no reservations from my side |
the semantics need to be lined up
#129 #124