Implement PyBufferProtocol for RustyBuffer #48

Merged (18 commits) on Mar 30, 2021

Owner

@milesgranger milesgranger commented Mar 29, 2021

  • Implement buffer protocol for RustyBuffer
  • Update de/compress functions to return RustyBuffer (It's zero copy to use bytes, bytearray or others implementing buffer protocol to view the underlying bytes in RustyBuffer, so the user can choose what they want to view it as)
  • Update benchmarks
  • Update CI to use matrix of Python versions, as it seems abi3 is not supported for implementing PyBufferProtocol
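
A minimal sketch of what "viewing the underlying bytes" through the buffer protocol looks like from Python. It does not use RustyBuffer itself: a plain `bytearray` stands in for it (an assumption made only so the example runs with the standard library), and `memoryview` is used as the zero-copy view.

```python
# Stand-in for the object returned by compress/decompress: any object that
# exports the buffer protocol. A bytearray is used here purely for
# illustration (assumption, not the real RustyBuffer).
backing = bytearray(b"hello world")

# memoryview wraps the exporter's memory without copying it.
view = memoryview(backing)

# A change to the underlying storage is visible through the view,
# which shows that both names refer to the same memory.
backing[0] = ord("J")
first_five = view[:5].tobytes()  # tobytes() copies out a snapshot

# In CPython, bytes(...) builds an independent object, so later changes
# to the backing storage do not reach it.
snapshot = bytes(view)
backing[0] = ord("h")
```

Which wrapper to pick is then up to the caller: `memoryview` (or `numpy.frombuffer`) keeps sharing memory with the exporter, while `bytes` gives an immutable object of its own.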

@milesgranger
Owner Author

@martindurant

This is probably a performance improvement that you'd care about; good news, it's at least as fast as python-snappy. And specific to your testing data (Oh beautiful day...) it's decisively faster.

------------------------------------------------------------------------------------------------------- benchmark: 28 tests ---------------------------------------------------------------------------------------------------------------
Name (time in us)                                              Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers          OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-cramjam]         51.9661 (3.33)         99.0310 (2.15)         54.3640 (3.20)         5.3599 (1.91)         52.6900 (3.27)         0.4065 (1.0)      735;1321  18,394.5102 (0.31)       7484           1
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-snappy]          53.0300 (3.40)        130.9831 (2.84)         56.7154 (3.34)         7.1006 (2.52)         53.9960 (3.35)         1.6890 (4.15)    1257;1540  17,631.8867 (0.30)      10693           1
test_snappy_raw[alice29.txt-cramjam]                      617.7521 (39.62)       938.0881 (20.37)       648.3404 (38.19)       36.7101 (13.05)       641.3945 (39.82)       36.3700 (89.47)      135;77   1,542.3997 (0.03)       1444           1
test_snappy_raw[alice29.txt-snappy]                       617.7831 (39.62)       919.2750 (19.96)       644.0131 (37.94)       30.2733 (10.76)       640.6730 (39.77)       32.0346 (78.80)      132;56   1,552.7635 (0.03)       1535           1
test_snappy_raw[asyoulik.txt-cramjam]                     547.3850 (35.10)       845.3510 (18.36)       573.9166 (33.81)       37.2616 (13.25)       566.2280 (35.15)       30.8370 (75.86)      145;94   1,742.4133 (0.03)       1728           1
test_snappy_raw[asyoulik.txt-snappy]                      548.3460 (35.17)       805.4390 (17.49)       573.4750 (33.78)       32.5686 (11.58)       567.2210 (35.21)       30.2345 (74.37)      169;92   1,743.7550 (0.03)       1772           1
test_snappy_raw[fifty-four-mb-random-cramjam]          37,636.5781 (>1000.0)  41,024.9011 (890.84)   38,631.4990 (>1000.0)    842.6126 (299.62)   38,363.5870 (>1000.0)  1,203.6384 (>1000.0)       6;1      25.8856 (0.00)         27           1
test_snappy_raw[fifty-four-mb-random-snappy]           54,153.4640 (>1000.0)  56,788.9699 (>1000.0)  55,255.3144 (>1000.0)    781.3597 (277.84)   55,137.5925 (>1000.0)  1,044.3189 (>1000.0)       8;0      18.0978 (0.00)         18           1
test_snappy_raw[fifty-four-mb-repeating-cramjam]       18,921.9690 (>1000.0)  20,660.1609 (448.63)   19,703.1325 (>1000.0)    497.2913 (176.83)   19,655.2780 (>1000.0)    682.7679 (>1000.0)      10;0      50.7534 (0.00)         27           1
test_snappy_raw[fifty-four-mb-repeating-snappy]        32,008.1390 (>1000.0)  36,384.3100 (790.07)   34,214.1525 (>1000.0)  1,133.4969 (403.06)   34,460.8220 (>1000.0)  1,502.2595 (>1000.0)       9;0      29.2277 (0.00)         28           1
test_snappy_raw[fireworks.jpeg-cramjam]                    25.8710 (1.66)         59.4810 (1.29)         27.5527 (1.62)         3.8046 (1.35)         26.1579 (1.62)         0.8320 (2.05)      923;992  36,294.0652 (0.62)       9185           1
test_snappy_raw[fireworks.jpeg-snappy]                     15.5929 (1.0)          46.0520 (1.0)          16.9755 (1.0)          2.8123 (1.0)          16.1080 (1.0)          0.4510 (1.11)    2578;2913  58,908.3526 (1.0)       30317           1
test_snappy_raw[geo.protodata-cramjam]                    160.7880 (10.31)       282.3471 (6.13)        171.4483 (10.10)       17.4146 (6.19)        162.9830 (10.12)       11.0575 (27.20)     570;544   5,832.6632 (0.10)       5424           1
test_snappy_raw[geo.protodata-snappy]                     146.7620 (9.41)        261.3820 (5.68)        157.0178 (9.25)        14.8278 (5.27)        150.9425 (9.37)        10.3321 (25.42)     678;590   6,368.7047 (0.11)       5968           1
test_snappy_raw[html-cramjam]                             169.9220 (10.90)       272.1080 (5.91)        179.8722 (10.60)       15.4079 (5.48)        171.8199 (10.67)       11.4760 (28.23)     556;433   5,559.5025 (0.09)       5003           1
test_snappy_raw[html-snappy]                              158.8060 (10.18)       273.5379 (5.94)        170.6671 (10.05)       14.4343 (5.13)        163.0040 (10.12)       11.6350 (28.62)     622;441   5,859.3618 (0.10)       5620           1
test_snappy_raw[html_x_4-cramjam]                         661.2011 (42.40)     1,077.2691 (23.39)       691.0037 (40.71)       41.2041 (14.65)       682.9470 (42.40)       38.6368 (95.04)      141;83   1,447.1703 (0.02)       1389           1
test_snappy_raw[html_x_4-snappy]                          633.8150 (40.65)       942.4259 (20.46)       666.0294 (39.23)       38.2292 (13.59)       656.6941 (40.77)       40.5195 (99.67)      146;67   1,501.4352 (0.03)       1327           1
test_snappy_raw[kppkn.gtb-cramjam]                        513.5919 (32.94)       805.1810 (17.48)       538.2725 (31.71)       30.8071 (10.95)       532.0259 (33.03)       29.7979 (73.30)      151;78   1,857.7950 (0.03)       1645           1
test_snappy_raw[kppkn.gtb-snappy]                         518.2280 (33.23)       769.1929 (16.70)       539.9226 (31.81)       24.9703 (8.88)        536.1760 (33.29)       29.0381 (71.43)      191;60   1,852.1173 (0.03)       1810           1
test_snappy_raw[lcet10.txt-cramjam]                     1,584.6450 (101.63)    2,666.9119 (57.91)     1,712.9024 (100.90)     128.8460 (45.82)     1,682.8649 (104.47)      64.0220 (157.49)      52;58     583.8044 (0.01)        600           1
test_snappy_raw[lcet10.txt-snappy]                      1,635.8980 (104.91)    2,136.3860 (46.39)     1,722.9559 (101.50)      82.4266 (29.31)     1,697.8655 (105.40)      68.9620 (169.64)      86;43     580.3979 (0.01)        554           1
test_snappy_raw[paper-100k.pdf-cramjam]                    28.8460 (1.85)         62.1950 (1.35)         30.3223 (1.79)         3.4838 (1.24)         29.1889 (1.81)         0.7970 (1.96)      895;946  32,979.0266 (0.56)      10936           1
test_snappy_raw[paper-100k.pdf-snappy]                     20.8659 (1.34)         54.9251 (1.19)         22.2160 (1.31)         2.9008 (1.03)         21.4539 (1.33)         0.5307 (1.31)    1605;1744  45,012.6255 (0.76)      24363           1
test_snappy_raw[plrabn12.txt-cramjam]                   2,173.5800 (139.40)    3,214.5870 (69.80)     2,299.2646 (135.45)     128.0979 (45.55)     2,269.2570 (140.88)     118.3042 (291.02)      30;18     434.9217 (0.01)        385           1
test_snappy_raw[plrabn12.txt-snappy]                    2,188.6230 (140.36)    2,477.7141 (53.80)     2,267.8710 (133.60)      67.6542 (24.06)     2,247.2980 (139.51)      74.6210 (183.56)     113;26     440.9422 (0.01)        387           1
test_snappy_raw[urls.10K-cramjam]                       1,858.9939 (119.22)    2,810.8520 (61.04)     1,929.0995 (113.64)      94.8150 (33.71)     1,905.8019 (118.31)      64.2513 (158.05)      29;24     518.3766 (0.01)        483           1
test_snappy_raw[urls.10K-snappy]                        1,816.7460 (116.51)    2,802.7340 (60.86)     1,885.2621 (111.06)      89.2952 (31.75)     1,862.3880 (115.62)      65.9582 (162.25)      39;27     530.4302 (0.01)        423           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@milesgranger
Owner Author

@messense I had to drop abi3 feature support because of the use of PyBufferProtocol. Do you have a simple way to update the linux and linux-cross builds? I see they use Docker with Python 3.8, so I suppose each will now need a matrix covering all Python versions, since abi3 is gone.

I'll poke around and fix it myself if you don't have time; I just thought you probably already know more about this area. 😃

@martindurant

That's the one!

@milesgranger changed the title from "Implement buffer protocol for RustyBuffer" to "Implement PyBufferProtocol for RustyBuffer" on Mar 29, 2021
@messense
Contributor

messense commented Mar 29, 2021

> @messense I had to drop abi3 feature support because of the use of PyBufferProtocol. Do you have a simple way to update the linux and linux-cross builds? I see they use Docker with Python 3.8, so I suppose each will now need a matrix covering all Python versions, since abi3 is gone.
>
> I'll poke around and fix it myself if you don't have time; I just thought you probably already know more about this area. 😃

Working on it. linux cross build is tricky, we could just disable it for now and fix it later.

Edit: #49

@martindurant

Please ping me when this makes it into release.

@martindurant

Also, quick question: did this happen to implement passing of Python buffer-like objects as inputs too, for the origin of the compressed data and/or for the _into functions? Asking out of curiosity.

@martindurant

_maybe_open_or_copy_to_local is taking a disproportionate amount of time, considering no work should be happening within the function, but maybe the profiler is not able to follow the contexts very well. Is this a local cache or not?

* CI: Add python-version matrix

* CI: Fix linux cross build

* CI: Use maturin 0.10 prerelease, it has i686 wheels
@milesgranger
Owner Author

@martindurant Sure thing, I'll let you know!

> Also, quick question: did this happen to implement passing of Python buffer-like objects as inputs too, for the origin of the compressed data and/or for the _into functions? Asking out of curiosity.

Depends on what you mean exactly; all variants, including _into, can accept numpy.array, bytes, bytearray, cramjam.File or cramjam.Buffer. The _into variants allow any combination of the aforementioned buffer-like objects as input and output. However, if you're thinking of something like io.BytesIO, that won't work, no.

> _maybe_open_or_copy_to_local is taking a disproportionate amount of time, considering no work should be happening within the function, but maybe the profiler is not able to follow the contexts very well. Is this a local cache or not?

I'm not sure what you mean, can you elaborate?

@martindurant

> _into, can accept numpy.array, bytes, bytearray, cramjam.File or cramjam.Buffer

That covers all possibilities except a raw buffer, i.e. any class supporting the low-level buffer API. I don't think that includes BytesIO anyway. Since the list has bytearray, and you can wrap that in a memoryview or a numpy array, I think all is good.

In [5]: np.frombuffer(io.BytesIO(b"\x00\x00"), dtype='uint8')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-28ba73ab06d2> in <module>
----> 1 np.frombuffer(io.BytesIO(b"\x00\x00"), dtype='uint8')

TypeError: a bytes-like object is required, not '_io.BytesIO'

In [6]: np.frombuffer(bytearray(b"\x00\x00"), dtype='uint8')
Out[6]: array([0, 0], dtype=uint8)

In [7]: np.frombuffer(memoryview(bytearray(b"\x00\x00")), dtype='uint8')
Out[7]: array([0, 0], dtype=uint8)
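
The same distinction can be checked with the standard library alone. This is a rough sketch: `as_view` is a hypothetical helper introduced only for illustration, and it says nothing about how cramjam itself validates its inputs.

```python
import io


def as_view(obj):
    """Hypothetical helper: return a memoryview, or raise TypeError if
    the object does not export the buffer protocol."""
    return memoryview(obj)


stream = io.BytesIO(b"\x00\x00")

# io.BytesIO is a file-like object, not a buffer exporter.
try:
    as_view(stream)
    stream_is_buffer = True
except TypeError:
    stream_is_buffer = False

# getbuffer() exposes the stream's contents as a memoryview without copying.
# The stream cannot be resized while the view is alive, so release it promptly.
with stream.getbuffer() as view:
    payload = bytes(view)

# bytearray does export the buffer protocol, so wrapping it succeeds.
array_view = as_view(bytearray(b"\x00\x00"))
```

So a BytesIO can still feed a buffer-based API by handing over `getbuffer()` rather than the stream object.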

> I'm not sure what you mean, can you elaborate?

Freak paste into the wrong window, please ignore!

@milesgranger milesgranger marked this pull request as ready for review March 30, 2021 09:44
@milesgranger milesgranger merged commit f90565c into master Mar 30, 2021
@milesgranger milesgranger deleted the buffer-protocol branch March 30, 2021 18:02
@messense
Contributor

messense commented Mar 31, 2021

FYI, PyO3 will support IOError on PyPy in future releases, see PyO3/pyo3#1533

In the meantime, you can change it to OSError since IOError is just an alias of OSError on Python 3.
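
A quick way to confirm the alias from the Python side (a small sketch; the error message is made up for the example):

```python
# On Python 3, IOError is the very same class object as OSError.
same_class = IOError is OSError

# Code that raises IOError is therefore caught by "except OSError",
# and the exception's type is OSError itself.
try:
    raise IOError("example failure")
except OSError as exc:
    caught_type_is_oserror = type(exc) is OSError
```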

@milesgranger
Owner Author

Thanks for that, much better solution. 👌

3 participants