Support arbitrary buffer-like objects #26

pitrou · 2017-06-09T10:01:43Z

The methods in this package (such as ZStdCompressor.compress() and ZStdDecompressor.decompress()) typically don't support bytearray or memoryview inputs. Though for some strange reason they seem to support Numpy arrays:

>>> c = zstd.ZstdCompressor(level=1, write_content_size=True)
>>> b = c.compress(b'12345678')
>>> b == c.compress(np.frombuffer(b'12345678', dtype='int8'))
True
>>> b == c.compress(memoryview(b'12345678'))
Traceback (most recent call last):
  File "<ipython-input-40-9ae8177a1d5c>", line 1, in <module>
    b == c.compress(memoryview(b'12345678'))
TypeError: compress() argument 1 must be read-only bytes-like object, not memoryview

>>> b == c.compress(bytearray(b'12345678'))
Traceback (most recent call last):
  File "<ipython-input-41-ddc68fad54ac>", line 1, in <module>
    b == c.compress(bytearray(b'12345678'))
TypeError: compress() argument 1 must be read-only bytes-like object, not bytearray

Note that other compression libraries typically accept arbitrary buffer-like objects (which allows passing them without copying to bytes before):

>>> zlib.compress(memoryview(b'12345678'))
b'x\x9c3426153\xb7\x00\x00\x07@\x01\xa5'
>>> bz2.compress(memoryview(b'12345678'))
b'BZh91AY&SY\xb6\x1c=\x04\x00\x00\x00\x08\x00?\xc0 \x001\x0c\x08\x19\x1ai\x935s\xf9E\xdc\x91N\x14$-\x87\x0fA\x00'
>>> lzma.compress(memoryview(b'12345678'))
b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3\x01\x00\x0712345678\x00\tx\xac+H\x80\x8b\\\x00\x01 \x08\xbb\x19\xd9\xbb\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
>>> blosc.compress(memoryview(b'12345678'))
b'\x02\x01\x13\x08\x08\x00\x00\x00\x08\x00\x00\x00\x18\x00\x00\x0012345678'

The text was updated successfully, but these errors were encountered:

indygreg · 2017-06-09T17:11:04Z

Eh? I assume you are talking about Python 3, because zlib.compress() doesn't accept memoryview in Python 2 (sadly).

Anyway, it is my intent for this to work. I /thought/ that using the y# format unit accepted anything conforming to the buffer protocol, as https://docs.python.org/3/c-api/arg.html#strings-and-buffers says it will accept any "bytes-like object" and anything implementing the buffer protocol is a "bytes-like object." I see that CPython uses y* for these functions. While the docs say y* is recommended, they don't say there are any feature differences. But I guess there must be. Boo.

Anyway, y* is the superior mechanism. So the code should be changed to that. This will require changing Python 2 to s* so we don't have multiple code paths for a Py_buffer and char * source.

pitrou · 2017-06-10T17:25:46Z

Yes, I would recommend to use y* on Python 3 and s* on Python 2. Should I take a look or do you think it'll be much quicker for you to do? I'm not familiar with this source base.

(for the record, y# tries the buffer API but refuses any object that has a buffer-release hook, because the buffer is released before PyArg_ParseTuple returns -- in practice it probably means it works only with bytes objects).

indygreg · 2017-06-10T18:18:33Z

FWIW, I filed a docs bug at http://bugs.python.org/issue30625 because this limitation of y# was not obvious from the docs. In fact, the docs seem contradictory in saying that a "read only bytes-like object" (which they define memoryview(bytes) to be) should work for y#!

indygreg · 2017-06-11T17:15:25Z

I'm actively working on this.

Without this, we may not accept all objects implementing the buffer protocol.

The y#/s# format unit don't appear to accept all types exposing a read-only bytes-like object. The Python docs do say that y*/s* is the recommended way to accept binary data. So we switch to that.

This is slightly more involved than other changes because of existing checking for unicode types. Python 2 happily allows unicode to be cast to a Py_buffer. Python 3 doesn't. So we need to update the test accordingly. The CFFI code has also been updated to use ffi.from_buffer() so it is consistent with the C code. And since this is the last function using y#/s# for argument parsing, update the NEWS.

indygreg · 2017-06-12T18:13:04Z

And I pushed those commits to the master branch. Calling this one done.

Thanks for the bug report and the background on y*/y#.

indygreg added a commit that referenced this issue Jun 11, 2017

cffi: use ffi.from_buffer() in ZstdCompressor.compress() (#26)

7740715

Without this, we may not accept all objects implementing the buffer protocol.

indygreg added a commit that referenced this issue Jun 11, 2017

compressiondict: use buffer protocol in __init__ (#26)

f90f206

indygreg added a commit that referenced this issue Jun 11, 2017

compressionwriter: use buffer protocol in write() (#26)

547d138

indygreg added a commit that referenced this issue Jun 11, 2017

compressobj: use buffer protocol in compress() (#26)

9d1ecb7

indygreg added a commit that referenced this issue Jun 11, 2017

decompressor: use buffer protocol in decompress() (#26)

ae9405a

indygreg added a commit that referenced this issue Jun 11, 2017

decompressionwriter: use buffer protocol in write() (#26)

1fb1367

indygreg added a commit that referenced this issue Jun 11, 2017

decompressobj: use buffer protocol in decompress() (#26)

d248077

indygreg closed this as completed Jun 12, 2017

This was referenced Jan 20, 2018

Does the encode method have the distributive property? samuelcolvin/xdelta3-python#5

Closed

Support memoryview ranges in decode and encode samuelcolvin/xdelta3-python#7

Open

himikof mentioned this issue Mar 25, 2018

Can snappy.decompress(buffer) but cannot snappy.decompress(memoryview) intake/python-snappy#65

Closed

jakirkham mentioned this issue May 3, 2022

Cleanup old compression workarounds dask/distributed#6259

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support arbitrary buffer-like objects #26

Support arbitrary buffer-like objects #26

pitrou commented Jun 9, 2017 •

edited

Loading

indygreg commented Jun 9, 2017

pitrou commented Jun 10, 2017

indygreg commented Jun 10, 2017

indygreg commented Jun 11, 2017

indygreg commented Jun 12, 2017

Support arbitrary buffer-like objects #26

Support arbitrary buffer-like objects #26

Comments

pitrou commented Jun 9, 2017 • edited Loading

indygreg commented Jun 9, 2017

pitrou commented Jun 10, 2017

indygreg commented Jun 10, 2017

indygreg commented Jun 11, 2017

indygreg commented Jun 12, 2017

pitrou commented Jun 9, 2017 •

edited

Loading