
Allow for custom unpickler in load(s)_compressed #69

Merged · 3 commits · Jul 18, 2023
Conversation

StefanSorgQC
Contributor

This is useful to restrict possible imports or to allow unpickling when required module or function names have been refactored.

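The two use cases in the description map directly onto overriding `pickle.Unpickler.find_class`. Below is a minimal sketch, independent of this library; the whitelist and the renamed-module mapping are hypothetical names chosen for illustration:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    # hypothetical whitelist and rename table, for illustration only
    ALLOWED_MODULES = {"builtins", "collections"}
    RENAMED_MODULES = {"old_pkg.model": "new_pkg.model"}

    def find_class(self, module, name):
        # remap modules that were refactored since the pickle was written
        module = self.RENAMED_MODULES.get(module, module)
        # restrict which modules may be imported during unpickling
        if module not in self.ALLOWED_MODULES:
            raise pickle.UnpicklingError(
                f"import of {module}.{name} is forbidden"
            )
        return super().find_class(module, name)

# plain containers of builtins need no global lookups, so this round-trips
obj = RestrictedUnpickler(io.BytesIO(pickle.dumps({"a": 1}))).load()
```

Unpickling anything that references a module outside the whitelist (e.g. a class pickled by reference) raises `pickle.UnpicklingError` instead of importing it.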
StefanSorgQC added the enhancement (New feature or request) label on Jul 13, 2023
github-actions bot commented Jul 13, 2023

(benchmark 5591280767 / attempt 1)
Each cell shows base results / our results / change.

| Model | Size | Dump Time | Load Time |
|---|---|---|---|
| sklearn rf 20M | 20.8 MiB / 3.0 MiB / 6.87 x | 0.01 s / 0.04 s / 2.49 x | 0.01 s / 0.02 s / 1.85 x |
| sklearn rf 20M lzma | 6.5 MiB / 2.0 MiB / 3.26 x | 12.35 s / 1.39 s / 0.11 x | 0.61 s / 0.20 s / 0.33 x |
| sklearn rf 200M | 212.3 MiB / 30.6 MiB / 6.94 x | 0.12 s / 0.31 s / 2.51 x | 0.16 s / 0.30 s / 1.90 x |
| sklearn rf 200M lzma | 47.4 MiB / 14.6 MiB / 3.24 x | 108.60 s / 19.66 s / 0.18 x | 4.63 s / 1.61 s / 0.35 x |
| sklearn rf 1G | 1157.5 MiB / 166.8 MiB / 6.94 x | 1.13 s / 1.60 s / 1.42 x | 1.04 s / 1.48 s / 1.43 x |
| sklearn rf 1G lzma | 258.1 MiB / 98.1 MiB / 2.63 x | 546.95 s / 127.44 s / 0.23 x | 25.41 s / 9.34 s / 0.37 x |
| sklearn gb 2M | 2.2 MiB / 1.1 MiB / 2.08 x | 0.03 s / 0.26 s / 8.02 x | 0.03 s / 0.13 s / 3.72 x |
| sklearn gb 2M lzma | 0.6 MiB / 0.2 MiB / 3.80 x | 1.03 s / 0.43 s / 0.41 x | 0.09 s / 0.14 s / 1.59 x |
| lgbm gbdt 2M | 2.6 MiB / 1.0 MiB / 2.78 x | 0.08 s / 0.24 s / 2.91 x | 0.01 s / 0.12 s / 9.36 x |
| lgbm gbdt 2M lzma | 0.9 MiB / 0.5 MiB / 1.90 x | 1.54 s / 0.53 s / 0.35 x | 0.09 s / 0.15 s / 1.80 x |
| lgbm gbdt 5M | 5.3 MiB / 1.9 MiB / 2.81 x | 0.16 s / 0.47 s / 2.96 x | 0.02 s / 0.24 s / 9.55 x |
| lgbm gbdt 5M lzma | 1.7 MiB / 0.8 MiB / 1.96 x | 3.72 s / 1.11 s / 0.30 x | 0.16 s / 0.31 s / 1.90 x |
| lgbm gbdt 20M | 22.7 MiB / 7.6 MiB / 3.00 x | 0.63 s / 1.91 s / 3.04 x | 0.11 s / 0.96 s / 9.05 x |
| lgbm gbdt 20M lzma | 6.3 MiB / 3.0 MiB / 2.09 x | 20.33 s / 5.31 s / 0.26 x | 0.65 s / 1.21 s / 1.85 x |
| lgbm gbdt 100M | 101.1 MiB / 33.0 MiB / 3.06 x | 2.84 s / 8.85 s / 3.12 x | 0.52 s / 37.83 s / 72.48 x |
| lgbm gbdt 100M lzma | 25.6 MiB / 10.6 MiB / 2.41 x | 91.18 s / 24.25 s / 0.27 x | 2.60 s / 5.06 s / 1.95 x |
| lgbm rf 10M | 10.9 MiB / 3.2 MiB / 3.46 x | 0.33 s / 0.62 s / 1.89 x | 0.04 s / 0.37 s / 8.78 x |
| lgbm rf 10M lzma | 0.7 MiB / 0.4 MiB / 1.85 x | 1.98 s / 0.89 s / 0.45 x | 0.13 s / 0.41 s / 3.24 x |

@@ -13,6 +13,10 @@ def __init__(self):
self.open = open
self.compress = lambda data: data

@staticmethod
Contributor

Was this missing and untested?

Contributor Author

Yes, it seems that loads_compressed was benchmarked (with lzma compression), but untested prior to my introduction of test_loads_compressed_custom_unpickler.

Without compression it failed due to the missing decompress method in _NoCompression.
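The bug described here is easy to reproduce in isolation. A minimal sketch of a pass-through backend, with internal names assumed from the diff above rather than taken from the actual source: without a `decompress` counterpart to `compress`, the load path has nothing to call.

```python
import pickle

class _NoCompression:
    # sketch of the no-op compression backend; names assumed from the diff
    def __init__(self):
        self.open = open
        self.compress = lambda data: data
        # the piece whose absence broke loads_compressed without compression:
        self.decompress = lambda data: data

nc = _NoCompression()
payload = pickle.dumps([1, 2, 3])
# dump path applies compress, load path applies decompress; both are identity
restored = pickle.loads(nc.decompress(nc.compress(payload)))
```

With both methods present, the no-compression path round-trips data unchanged.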

@@ -129,29 +135,39 @@ def load_compressed(
set to the compression method and other key-value pairs which are forwarded
to open() of the compression library.
Inspired by the pandas.to_csv interface.
:param unpickler_class: custom unpickler class derived from pickle.Unpickler.
Contributor

Hm, I wonder if we should pass a callable load(file) -> unpickled_obj here instead of a class, so that users could pass load=lambda file: pickle.Unpickler(file).load().

Contributor Author

As shown in the tests, a custom unpickling scheme is defined via a class derived from pickle.Unpickler. Hence, I find it most straightforward to pass that class directly rather than taking the detour through a wrapper function.

But I'm open to other solutions as well. What do you think, @pavelzw?

Member

I think since we expect users to always provide some kind of Unpickler, this approach is fine.
Otherwise, every time you wanted to specify custom unpickling behavior, you would need to explicitly pass a lambda such as lambda file: CustomUnpickler(file).load().
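The trade-off discussed above can be sketched with a hypothetical stand-in for the library's loads_compressed (decompression elided; the real signature may differ):

```python
import io
import pickle

# hypothetical stand-in for the library function, for illustration only
def loads_compressed(data, unpickler_class=pickle.Unpickler):
    # real implementation would first decompress `data`
    return unpickler_class(io.BytesIO(data)).load()

class CustomUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # a real subclass would restrict or remap imports here
        return super().find_class(module, name)

payload = pickle.dumps([1, 2, 3])
# class-based parameter: pass the subclass itself
restored = loads_compressed(payload, unpickler_class=CustomUnpickler)
# the callable-based alternative would require wrapping at every call site:
#   loads_compressed(payload, load=lambda file: CustomUnpickler(file).load())
```

The class-based parameter keeps the call site short, at the cost of requiring the customization to be an Unpickler subclass rather than an arbitrary callable.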

Member

@pavelzw pavelzw left a comment

Thanks @StefanSorgQC!


@pavelzw
Member

pavelzw commented Jul 18, 2023

It seems that the tests are failing for lightgbm 4.0 :/ #72

pavelzw merged commit 056ea90 into main on Jul 18, 2023
pavelzw deleted the s.custom_unpickler branch on July 18, 2023 19:33
Labels: enhancement (New feature or request)

3 participants