Cleanup old compression workarounds #6259
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ | |
from contextlib import suppress | ||
from typing import Literal | ||
|
||
from packaging.version import parse as parse_version | ||
from tlz import identity | ||
|
||
import dask | ||
|
@@ -39,63 +40,47 @@ | |
with suppress(ImportError): | ||
import snappy | ||
|
||
def _fixed_snappy_decompress(data): | ||
# snappy.decompress() doesn't accept memoryviews | ||
if isinstance(data, (memoryview, bytearray)): | ||
data = bytes(data) | ||
return snappy.decompress(data) | ||
# In python-snappy 0.5.3, support for the Python Buffer Protocol was added. | ||
# This is needed to handle other objects (like `memoryview`s) without | ||
# copying to `bytes` first. | ||
# | ||
# Note: `snappy.__version__` doesn't exist in a release yet. | ||
# So do a little test that will fail if snappy is not 0.5.3 or later. | ||
try: | ||
snappy.compress(memoryview(b"")) | ||
except TypeError: | ||
raise ImportError("Need snappy >= 0.5.3") | ||
Comment on lines +49 to +52
`snappy.__version__` doesn't exist in a release yet. So just do a simple test that will fail without Python Buffer Protocol support.
python-snappy 0.5.2:
In [1]: import snappy
In [2]: snappy.compress(memoryview(b""))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-077d330e1c70> in <module>
----> 1 snappy.compress(memoryview(b""))
~/miniconda/envs/snap52/lib/python3.6/site-packages/snappy/snappy.py in compress(data, encoding)
82 data = data.encode(encoding)
83
---> 84 return _compress(data)
85
86 def uncompress(data, decoding=None):
TypeError: argument 1 must be read-only bytes-like object, not memoryview
python-snappy 0.5.3:
In [1]: import snappy
In [2]: snappy.compress(memoryview(b""))
Out[2]: b'\x00'
+1 (do we want to use cramjam for all of the de/compressors?) |
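The probe discussed in this comment can be generalized into a small helper. The following is only a sketch: `supports_buffer_protocol` and `bytes_only_compress` are hypothetical names that are not part of this PR, and the standard-library `zlib` stands in for snappy so the example runs without extra dependencies.

```python
import zlib


def supports_buffer_protocol(compress):
    """Return True if ``compress`` accepts a ``memoryview`` argument."""
    try:
        # Same idea as the probe in the diff: an empty memoryview is cheap
        # to build and raises TypeError when only ``bytes`` is accepted.
        compress(memoryview(b""))
    except TypeError:
        return False
    return True


def bytes_only_compress(data):
    # Hypothetical stand-in for an old compressor that rejects non-bytes input.
    if not isinstance(data, bytes):
        raise TypeError("argument 1 must be read-only bytes-like object")
    return zlib.compress(data)


new_style = supports_buffer_protocol(zlib.compress)
old_style = supports_buffer_protocol(bytes_only_compress)
```

Here `new_style` reflects a compressor that takes any bytes-like object, while `old_style` reflects one that would have needed the removed `bytes(data)` copy.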
||
|
||
compressions["snappy"] = { | ||
"compress": snappy.compress, | ||
"decompress": _fixed_snappy_decompress, | ||
"decompress": snappy.decompress, | ||
} | ||
default_compression = "snappy" | ||
|
||
with suppress(ImportError): | ||
import lz4 | ||
|
||
try: | ||
# try using the new lz4 API | ||
import lz4.block | ||
|
||
lz4_compress = lz4.block.compress | ||
lz4_decompress = lz4.block.decompress | ||
except ImportError: | ||
# fall back to old one | ||
lz4_compress = lz4.LZ4_compress | ||
lz4_decompress = lz4.LZ4_uncompress | ||
Comment on lines -57 to -66
The `lz4.block` API is available in lz4 0.23.1, so drop the fallback to the old API.
+1 |
||
|
||
# helper to bypass missing memoryview support in current lz4 | ||
# (fixed in later versions) | ||
|
||
def _fixed_lz4_compress(data): | ||
try: | ||
return lz4_compress(data) | ||
except TypeError: | ||
if isinstance(data, (memoryview, bytearray)): | ||
return lz4_compress(bytes(data)) | ||
else: | ||
raise | ||
|
||
def _fixed_lz4_decompress(data): | ||
try: | ||
return lz4_decompress(data) | ||
except (ValueError, TypeError): | ||
if isinstance(data, (memoryview, bytearray)): | ||
return lz4_decompress(bytes(data)) | ||
else: | ||
raise | ||
Comment on lines -68 to -87
Also, Python Buffer Protocol support was added in PR ( python-lz4/python-lz4#38 ), which was also included in 0.23.1. So drop these workarounds, as these objects should work without copying to `bytes`.
+1 |
||
# Required to use `lz4.block` APIs and Python Buffer Protocol support. | ||
if parse_version(lz4.__version__) < parse_version("0.23.1"): | ||
raise ImportError("Need lz4 >= 0.23.1") | ||
Comment on lines +63 to +65
This error is a bit more explicit, but this arguably would happen by just trying the `lz4.block` import. |
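One reason the check goes through `parse_version` rather than comparing raw strings is that string ordering is lexicographic. The sketch below illustrates this with a deliberately simplified `version_tuple` helper, which is a hypothetical stand-in for `packaging.version.parse` and only handles plain dotted numeric releases (no pre-release or local version tags). The `installed` value is an assumed example, not a version taken from the PR.

```python
def version_tuple(version):
    """Parse a plain dotted release string such as "0.23.1" into ints."""
    return tuple(int(part) for part in version.split("."))


installed = "0.9.0"   # assumed example of an old lz4 version string
minimum = "0.23.1"    # minimum required by the check in the diff

# Lexicographic comparison looks at "9" vs "2" and gets the order wrong.
naive_too_old = installed < minimum

# Numeric comparison sees 9 < 23 and correctly flags the old version.
parsed_too_old = version_tuple(installed) < version_tuple(minimum)
```

With these values the naive comparison would let the old version through, while the numeric comparison would raise the `ImportError` as intended.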
||
|
||
from lz4.block import compress as lz4_compress | ||
from lz4.block import decompress as lz4_decompress | ||
|
||
compressions["lz4"] = { | ||
"compress": _fixed_lz4_compress, | ||
"decompress": _fixed_lz4_decompress, | ||
"compress": lz4_compress, | ||
"decompress": lz4_decompress, | ||
} | ||
default_compression = "lz4" | ||
|
||
|
||
with suppress(ImportError): | ||
import zstandard | ||
|
||
# Required for Python Buffer Protocol support. | ||
if parse_version(zstandard.__version__) < parse_version("0.9.0"): | ||
raise ImportError("Need zstandard >= 0.9.0") | ||
Comment on lines +80 to +82
As the other compressors already support the Python Buffer Protocol, make sure we support it here too. This was added in a series of commits referenced in issue ( indygreg/python-zstandard#26 ) that were included in 0.9.0, which came out Apr 2018. So it has been around as long as the other releases being required here, which seems like a reasonable minimum. |
||
|
||
zstd_compressor = zstandard.ZstdCompressor( | ||
level=dask.config.get("distributed.comm.zstd.level"), | ||
threads=dask.config.get("distributed.comm.zstd.threads"), | ||
|
Support for the Python Buffer Protocol was added in 0.5.3 with PR ( intake/python-snappy#72 ). This version was released Jul 2018, so it should be old enough to rely upon. With Python Buffer Protocol support, there is no longer a need to copy to `bytes` here and we can just hand `data` to `snappy.decompress`.
+1
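To illustrate what the cleanup relies on, here is a minimal round trip through a registry shaped like the `compressions` dict in the diff, passing `memoryview` objects straight to the compressor and decompressor. This is a sketch under assumptions: the registry below is a local, hypothetical one, and the standard-library `zlib` is used in place of snappy, lz4, or zstandard so that it runs anywhere.

```python
import zlib

# Hypothetical local registry mirroring the shape used in the diff.
compressions = {
    "zlib": {"compress": zlib.compress, "decompress": zlib.decompress},
}

payload = bytearray(b"dask" * 1000)
view = memoryview(payload)

# No intermediate ``bytes(view)`` copy: the view is handed over directly.
compressed = compressions["zlib"]["compress"](view)
restored = compressions["zlib"]["decompress"](memoryview(compressed))

round_trip_ok = restored == payload
```

The same pattern should apply to any compressor that implements the Python Buffer Protocol, which is what the new minimum versions are meant to guarantee.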