-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup old compression workarounds #6259
Conversation
These have been around since python-lz4's 0.23.1 release, which came out in February 2018. Given these have been around for a while, it seems reasonable to just require them and drop the workarounds.
In addition to including the `lz4.block` API, the python-lz4 0.23.1 release included proper support for the Python Buffer Protocol. So if we are going to require the new API, it seems reasonable to drop these workarounds as well.
Raise an `ImportError` if python-lz4 is too old. Since this is in a `suppress` block, the `ImportError` will be captured, but the effect will be python-lz4 won't be used in this case.
As of python-snappy 0.5.3 (released July 2018), any Python Buffer Protocol object (like `memoryview`s) is supported. Given how long this release has been out, it seems reasonable to drop the prior workaround. Should avoid unneeded copying for other objects (like `memoryview`s).
0da4a61
to
b7041b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided a bit more context about when things were fixed and why the workarounds can be dropped below
def _fixed_snappy_decompress(data): | ||
# snappy.decompress() doesn't accept memoryviews | ||
if isinstance(data, (memoryview, bytearray)): | ||
data = bytes(data) | ||
return snappy.decompress(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support for the Python Buffer Protocol was added in 0.5.3 with PR ( intake/python-snappy#72 ). This version was released Jul 2018 so should be old enough to rely upon. With Python Buffer Protocol support, there is no longer a need to copy too bytes
here and we can just hand data
to snappy.decompress
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
try: | ||
snappy.compress(memoryview(b"")) | ||
except TypeError: | ||
raise ImportError("Need snappy >= 0.5.3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently snappy.__version__
was added with PR ( intake/python-snappy#119 ). Though it is not currently in a release and hasn't been around long enough to rely upon.
So just do a simple test that will fail without python-snappy
version 0.5.3
or later. If python-snappy
is not new enough (unlikely given how long this release has been out), simply skip using python-snappy
. Test results on 0.5.2
& 0.5.3
below:
python-snappy 0.5.2:
In [1]: import snappy
In [2]: snappy.compress(memoryview(b""))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-077d330e1c70> in <module>
----> 1 snappy.compress(memoryview(b""))
~/miniconda/envs/snap52/lib/python3.6/site-packages/snappy/snappy.py in compress(data, encoding)
82 data = data.encode(encoding)
83
---> 84 return _compress(data)
85
86 def uncompress(data, decoding=None):
TypeError: argument 1 must be read-only bytes-like object, not memoryview
python-snappy 0.5.3:
In [1]: import snappy
In [2]: snappy.compress(memoryview(b""))
Out[2]: b'\x00'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
(do we want to use cramjam for all of the de/compressors??)
try: | ||
# try using the new lz4 API | ||
import lz4.block | ||
|
||
lz4_compress = lz4.block.compress | ||
lz4_decompress = lz4.block.decompress | ||
except ImportError: | ||
# fall back to old one | ||
lz4_compress = lz4.LZ4_compress | ||
lz4_decompress = lz4.LZ4_uncompress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lz4.block
APIs were added in commit ( python-lz4/python-lz4@bb2c7ea ) and have been around since 0.23.1, which was released Feb 2018. So should be old enough to rely upon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
# helper to bypass missing memoryview support in current lz4 | ||
# (fixed in later versions) | ||
|
||
def _fixed_lz4_compress(data): | ||
try: | ||
return lz4_compress(data) | ||
except TypeError: | ||
if isinstance(data, (memoryview, bytearray)): | ||
return lz4_compress(bytes(data)) | ||
else: | ||
raise | ||
|
||
def _fixed_lz4_decompress(data): | ||
try: | ||
return lz4_decompress(data) | ||
except (ValueError, TypeError): | ||
if isinstance(data, (memoryview, bytearray)): | ||
return lz4_decompress(bytes(data)) | ||
else: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also Python Buffer Protocol support was added in PR ( python-lz4/python-lz4#38 ), which was also included in 0.23.1. So drop these workarounds as these objects should work without copying to bytes
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
# Required to use `lz4.block` APIs and Python Buffer Protocol support. | ||
if parse_version(lz4.__version__) < parse_version("0.23.1"): | ||
raise ImportError("Need lz4 >= 0.23.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is a bit more explicit, but this arguably would happen by just trying the import
line below.
Ideally we would test `__version__` however not in a release atm (though it was added recently). In any even it can't reliably be checked for current versions. Instead just test some fast code that reliably fails for versions prior to `0.5.3`. If the code fails, simply raise an `ImportError` noting this requirement. As this is in a `suppress` block, the `ImportError` will be caught. Though the net effect will be disabling python-snappy use if it is too old.
These should already be satisfied when installed. Though this makes it clearer that these are the required versions (particularly for any users poking around trying to understand things).
2ec2fbf
to
201410e
Compare
201410e
to
d2a18ad
Compare
In `zstandard`'s `0.9.0` release, support for the Python Buffer Protocol was added. As we are supporting this functionality with other compressors, go ahead and support it here too.
# Required for Python Buffer Protocol support. | ||
if parse_version(zstandard.__version__) < parse_version("0.9.0"): | ||
raise ImportError("Need zstandard >= 0.9.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the other compressors already support the Python Buffer Protocol, make sure we support it here too. This was added in a series of commits referenced in issue ( indygreg/python-zstandard#26 ) that were included in 0.9.0, which came out Apr 2018. Thus has been around as long as the other releases being required here. So seems like a reasonable minimum.
@dask/maintenance , thoughts? 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good
Repeating comment: any thoughts on using cramjam instead of all these specific packages?
def _fixed_snappy_decompress(data): | ||
# snappy.decompress() doesn't accept memoryviews | ||
if isinstance(data, (memoryview, bytearray)): | ||
data = bytes(data) | ||
return snappy.decompress(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
try: | ||
snappy.compress(memoryview(b"")) | ||
except TypeError: | ||
raise ImportError("Need snappy >= 0.5.3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
(do we want to use cramjam for all of the de/compressors??)
try: | ||
# try using the new lz4 API | ||
import lz4.block | ||
|
||
lz4_compress = lz4.block.compress | ||
lz4_decompress = lz4.block.decompress | ||
except ImportError: | ||
# fall back to old one | ||
lz4_compress = lz4.LZ4_compress | ||
lz4_decompress = lz4.LZ4_uncompress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
# helper to bypass missing memoryview support in current lz4 | ||
# (fixed in later versions) | ||
|
||
def _fixed_lz4_compress(data): | ||
try: | ||
return lz4_compress(data) | ||
except TypeError: | ||
if isinstance(data, (memoryview, bytearray)): | ||
return lz4_compress(bytes(data)) | ||
else: | ||
raise | ||
|
||
def _fixed_lz4_decompress(data): | ||
try: | ||
return lz4_decompress(data) | ||
except (ValueError, TypeError): | ||
if isinstance(data, (memoryview, bytearray)): | ||
return lz4_decompress(bytes(data)) | ||
else: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks Martin! 😄
No strong feelings. Mostly was looking to simplify the existing code. Maybe we can raise this idea as a new issue and see if others agree? If so, guessing it is simple to swap in. Edit: Raised issue ( #6265 ) |
Planning on merging EOD tomorrow if no comments |
With these changes in place, cleaning up some unneeded copies in the compression code paths with PR ( #6273 ). |
Various workarounds have accumulated in
distributed.protocol.compression
to handle issues with prior versions of different libraries. However these libraries have since fixed these various issues and made several releases since. Given this, clean out some of these old workarounds. This should simplify the code here (and avoid some no longer needed copies). Also make sure to clarify version expectations and provide context.pre-commit run --all-files