Cleanup old compression workarounds #6259

jakirkham · 2022-05-03T07:24:45Z

Various workarounds have accumulated in distributed.protocol.compression to handle issues with prior versions of different libraries. However these libraries have since fixed these various issues and made several releases since. Given this, clean out some of these old workarounds. This should simplify the code here (and avoid some no longer needed copies). Also make sure to clarify version expectations and provide context.

Tests added / passed
Passes pre-commit run --all-files

These have been around since python-lz4's 0.23.1 release, which came out in February 2018. Given these have been around for a while, it seems reasonable to just require them and drop the workarounds.

In addition to including the `lz4.block` API, the python-lz4 0.23.1 release included proper support for the Python Buffer Protocol. So if we are going to require the new API, it seems reasonable to drop these workarounds as well.

Raise an `ImportError` if python-lz4 is too old. Since this is in a `suppress` block, the `ImportError` will be captured, but the effect will be python-lz4 won't be used in this case.

As of python-snappy 0.5.3 (released July 2018), any Python Buffer Protocol object (like `memoryview`s) is supported. Given how long this release has been out, it seems reasonable to drop the prior workaround. Should avoid unneeded copying for other objects (like `memoryview`s).

jakirkham

Provided a bit more context about when things were fixed and why the workarounds can be dropped below

jakirkham · 2022-05-03T07:28:56Z

distributed/protocol/compression.py

-    def _fixed_snappy_decompress(data):
-        # snappy.decompress() doesn't accept memoryviews
-        if isinstance(data, (memoryview, bytearray)):
-            data = bytes(data)
-        return snappy.decompress(data)


Support for the Python Buffer Protocol was added in 0.5.3 with PR ( intake/python-snappy#72 ). This version was released Jul 2018 so should be old enough to rely upon. With Python Buffer Protocol support, there is no longer a need to copy too bytes here and we can just hand data to snappy.decompress.

jakirkham · 2022-05-03T07:34:20Z

distributed/protocol/compression.py

+    try:
+        snappy.compress(memoryview(b""))
+    except TypeError:
+        raise ImportError("Need snappy >= 0.5.3")


Recently snappy.__version__ was added with PR ( intake/python-snappy#119 ). Though it is not currently in a release and hasn't been around long enough to rely upon.

So just do a simple test that will fail without python-snappy version 0.5.3 or later. If python-snappy is not new enough (unlikely given how long this release has been out), simply skip using python-snappy. Test results on 0.5.2 & 0.5.3 below:

python-snappy 0.5.2:

In [1]: import snappy In [2]: snappy.compress(memoryview(b"")) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-2-077d330e1c70> in <module> ----> 1 snappy.compress(memoryview(b"")) ~/miniconda/envs/snap52/lib/python3.6/site-packages/snappy/snappy.py in compress(data, encoding) 82 data = data.encode(encoding) 83 ---> 84 return _compress(data) 85 86 def uncompress(data, decoding=None): TypeError: argument 1 must be read-only bytes-like object, not memoryview

python-snappy 0.5.3:

In [1]: import snappy In [2]: snappy.compress(memoryview(b"")) Out[2]: b'\x00'

+1

(do we want to use cramjam for all of the de/compressors??)

jakirkham · 2022-05-03T07:36:36Z

distributed/protocol/compression.py

-    try:
-        # try using the new lz4 API
-        import lz4.block
-
-        lz4_compress = lz4.block.compress
-        lz4_decompress = lz4.block.decompress
-    except ImportError:
-        # fall back to old one
-        lz4_compress = lz4.LZ4_compress
-        lz4_decompress = lz4.LZ4_uncompress


The lz4.block APIs were added in commit ( python-lz4/python-lz4@bb2c7ea ) and have been around since 0.23.1, which was released Feb 2018. So should be old enough to rely upon.

jakirkham · 2022-05-03T07:38:18Z

distributed/protocol/compression.py

-    # helper to bypass missing memoryview support in current lz4
-    # (fixed in later versions)
-
-    def _fixed_lz4_compress(data):
-        try:
-            return lz4_compress(data)
-        except TypeError:
-            if isinstance(data, (memoryview, bytearray)):
-                return lz4_compress(bytes(data))
-            else:
-                raise
-
-    def _fixed_lz4_decompress(data):
-        try:
-            return lz4_decompress(data)
-        except (ValueError, TypeError):
-            if isinstance(data, (memoryview, bytearray)):
-                return lz4_decompress(bytes(data))
-            else:
-                raise


Also Python Buffer Protocol support was added in PR ( python-lz4/python-lz4#38 ), which was also included in 0.23.1. So drop these workarounds as these objects should work without copying to bytes first.

jakirkham · 2022-05-03T07:39:08Z

distributed/protocol/compression.py

+    # Required to use `lz4.block` APIs and Python Buffer Protocol support.
+    if parse_version(lz4.__version__) < parse_version("0.23.1"):
+        raise ImportError("Need lz4 >= 0.23.1")


This error is a bit more explicit, but this arguably would happen by just trying the import line below.

Ideally we would test `__version__` however not in a release atm (though it was added recently). In any even it can't reliably be checked for current versions. Instead just test some fast code that reliably fails for versions prior to `0.5.3`. If the code fails, simply raise an `ImportError` noting this requirement. As this is in a `suppress` block, the `ImportError` will be caught. Though the net effect will be disabling python-snappy use if it is too old.

These should already be satisfied when installed. Though this makes it clearer that these are the required versions (particularly for any users poking around trying to understand things).

In `zstandard`'s `0.9.0` release, support for the Python Buffer Protocol was added. As we are supporting this functionality with other compressors, go ahead and support it here too.

jakirkham · 2022-05-03T08:29:25Z

distributed/protocol/compression.py

+    # Required for Python Buffer Protocol support.
+    if parse_version(zstandard.__version__) < parse_version("0.9.0"):
+        raise ImportError("Need zstandard >= 0.9.0")


As the other compressors already support the Python Buffer Protocol, make sure we support it here too. This was added in a series of commits referenced in issue ( indygreg/python-zstandard#26 ) that were included in 0.9.0, which came out Apr 2018. Thus has been around as long as the other releases being required here. So seems like a reasonable minimum.

github-actions · 2022-05-03T09:14:58Z

Unit Test Results

      16 files ±0       16 suites ±0 7h 43m 55s ⏱️ - 21m 57s
  2 745 tests ±0   2 660 ✔️ - 1     81 💤 +1 4 ❌ ±0
21 861 runs ±0 20 808 ✔️ - 3 1 046 💤 ±0 7 ❌ +3

For more details on these failures, see this check.

Results for commit 3aa8e2f. ± Comparison against base commit c11c8ee.

jakirkham · 2022-05-03T16:47:20Z

@dask/maintenance , thoughts? 🙂

martindurant

All good

Repeating comment: any thoughts on using cramjam instead of all these specific packages?

martindurant · 2022-05-03T16:52:11Z

distributed/protocol/compression.py

-    def _fixed_snappy_decompress(data):
-        # snappy.decompress() doesn't accept memoryviews
-        if isinstance(data, (memoryview, bytearray)):
-            data = bytes(data)
-        return snappy.decompress(data)


martindurant · 2022-05-03T16:53:05Z

distributed/protocol/compression.py

+    try:
+        snappy.compress(memoryview(b""))
+    except TypeError:
+        raise ImportError("Need snappy >= 0.5.3")


+1

(do we want to use cramjam for all of the de/compressors??)

martindurant · 2022-05-03T16:53:16Z

distributed/protocol/compression.py

-    try:
-        # try using the new lz4 API
-        import lz4.block
-
-        lz4_compress = lz4.block.compress
-        lz4_decompress = lz4.block.decompress
-    except ImportError:
-        # fall back to old one
-        lz4_compress = lz4.LZ4_compress
-        lz4_decompress = lz4.LZ4_uncompress


martindurant · 2022-05-03T16:53:33Z

distributed/protocol/compression.py

-    # helper to bypass missing memoryview support in current lz4
-    # (fixed in later versions)
-
-    def _fixed_lz4_compress(data):
-        try:
-            return lz4_compress(data)
-        except TypeError:
-            if isinstance(data, (memoryview, bytearray)):
-                return lz4_compress(bytes(data))
-            else:
-                raise
-
-    def _fixed_lz4_decompress(data):
-        try:
-            return lz4_decompress(data)
-        except (ValueError, TypeError):
-            if isinstance(data, (memoryview, bytearray)):
-                return lz4_decompress(bytes(data))
-            else:
-                raise


jakirkham · 2022-05-03T17:30:56Z

Thanks Martin! 😄

Repeating comment: any thoughts on using cramjam instead of all these specific packages?

No strong feelings. Mostly was looking to simplify the existing code. Maybe we can raise this idea as a new issue and see if others agree? If so, guessing it is simple to swap in.

Edit: Raised issue ( #6265 )

jakirkham · 2022-05-04T06:30:28Z

Planning on merging EOD tomorrow if no comments

jakirkham · 2022-05-05T10:43:37Z

With these changes in place, cleaning up some unneeded copies in the compression code paths with PR ( #6273 ).

jakirkham added 4 commits May 2, 2022 23:37

Just use lz4.block API

26c1243

These have been around since python-lz4's 0.23.1 release, which came out in February 2018. Given these have been around for a while, it seems reasonable to just require them and drop the workarounds.

Drop python-lz4 workarounds for memoryviews

c130510

In addition to including the `lz4.block` API, the python-lz4 0.23.1 release included proper support for the Python Buffer Protocol. So if we are going to require the new API, it seems reasonable to drop these workarounds as well.

Be explicit about requiring python-lz4 0.23.1+

00020a4

Raise an `ImportError` if python-lz4 is too old. Since this is in a `suppress` block, the `ImportError` will be captured, but the effect will be python-lz4 won't be used in this case.

jakirkham force-pushed the cleanup_compress branch from 0da4a61 to b7041b2 Compare May 3, 2022 07:31

jakirkham commented May 3, 2022

View reviewed changes

jakirkham added 2 commits May 3, 2022 00:42

Add python-snappy & lz4 requirements to CI

0b263f6

These should already be satisfied when installed. Though this makes it clearer that these are the required versions (particularly for any users poking around trying to understand things).

jakirkham force-pushed the cleanup_compress branch from 2ec2fbf to 201410e Compare May 3, 2022 07:44

Make linter happy

d2a18ad

jakirkham force-pushed the cleanup_compress branch from 201410e to d2a18ad Compare May 3, 2022 07:50

Require zstandard version 0.9.0+

3aa8e2f

In `zstandard`'s `0.9.0` release, support for the Python Buffer Protocol was added. As we are supporting this functionality with other compressors, go ahead and support it here too.

jakirkham commented May 3, 2022

View reviewed changes

martindurant approved these changes May 3, 2022

View reviewed changes

jakirkham mentioned this pull request May 3, 2022

Use compressors collection library #6265

Open

jakirkham merged commit f779854 into dask:main May 4, 2022

jakirkham deleted the cleanup_compress branch May 4, 2022 23:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cleanup old compression workarounds #6259

Cleanup old compression workarounds #6259

jakirkham commented May 3, 2022 •

edited

Loading

jakirkham left a comment

jakirkham May 3, 2022

martindurant May 3, 2022

jakirkham May 3, 2022 •

edited

Loading

martindurant May 3, 2022

jakirkham May 3, 2022

martindurant May 3, 2022

jakirkham May 3, 2022

martindurant May 3, 2022

jakirkham May 3, 2022

jakirkham May 3, 2022

github-actions bot commented May 3, 2022

jakirkham commented May 3, 2022

martindurant left a comment

martindurant May 3, 2022

martindurant May 3, 2022

martindurant May 3, 2022

martindurant May 3, 2022

jakirkham commented May 3, 2022 •

edited

Loading

jakirkham commented May 4, 2022

jakirkham commented May 5, 2022

Cleanup old compression workarounds #6259

Cleanup old compression workarounds #6259

Conversation

jakirkham commented May 3, 2022 • edited Loading

jakirkham left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakirkham May 3, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented May 3, 2022

Unit Test Results

jakirkham commented May 3, 2022

martindurant left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakirkham commented May 3, 2022 • edited Loading

jakirkham commented May 4, 2022

jakirkham commented May 5, 2022

jakirkham commented May 3, 2022 •

edited

Loading

jakirkham May 3, 2022 •

edited

Loading

jakirkham commented May 3, 2022 •

edited

Loading