
Allow for custom unpickler in load(s)_compressed #69

Merged · 3 commits · Jul 18, 2023
Conversation

StefanSorgQC
Contributor

This is useful to restrict possible imports or to allow unpickling when required module or function names have been refactored.

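The two use cases in the description map directly onto overriding `pickle.Unpickler.find_class`. Below is a minimal sketch, independent of this library; the whitelist and the renamed-module mapping are hypothetical names chosen for illustration:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    # hypothetical whitelist and rename table, for illustration only
    ALLOWED_MODULES = {"builtins", "collections"}
    RENAMED_MODULES = {"old_pkg.model": "new_pkg.model"}

    def find_class(self, module, name):
        # remap modules that were refactored since the pickle was written
        module = self.RENAMED_MODULES.get(module, module)
        # restrict which modules may be imported during unpickling
        if module not in self.ALLOWED_MODULES:
            raise pickle.UnpicklingError(
                f"import of {module}.{name} is forbidden"
            )
        return super().find_class(module, name)

# plain containers of builtins need no global lookups, so this round-trips
obj = RestrictedUnpickler(io.BytesIO(pickle.dumps({"a": 1}))).load()
```

Unpickling anything that references a module outside the whitelist (e.g. a class pickled by reference) raises `pickle.UnpicklingError` instead of importing it.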
StefanSorgQC added the enhancement (New feature or request) label on Jul 13, 2023
github-actions bot commented Jul 13, 2023

(benchmark 5591280767 / attempt 1)
Each cell shows base results / our results / change.

| Model | Size | Dump Time | Load Time |
|---|---|---|---|
| sklearn rf 20M | 20.8 MiB / 3.0 MiB / 6.87 x | 0.01 s / 0.04 s / 2.49 x | 0.01 s / 0.02 s / 1.85 x |
| sklearn rf 20M lzma | 6.5 MiB / 2.0 MiB / 3.26 x | 12.35 s / 1.39 s / 0.11 x | 0.61 s / 0.20 s / 0.33 x |
| sklearn rf 200M | 212.3 MiB / 30.6 MiB / 6.94 x | 0.12 s / 0.31 s / 2.51 x | 0.16 s / 0.30 s / 1.90 x |
| sklearn rf 200M lzma | 47.4 MiB / 14.6 MiB / 3.24 x | 108.60 s / 19.66 s / 0.18 x | 4.63 s / 1.61 s / 0.35 x |
| sklearn rf 1G | 1157.5 MiB / 166.8 MiB / 6.94 x | 1.13 s / 1.60 s / 1.42 x | 1.04 s / 1.48 s / 1.43 x |
| sklearn rf 1G lzma | 258.1 MiB / 98.1 MiB / 2.63 x | 546.95 s / 127.44 s / 0.23 x | 25.41 s / 9.34 s / 0.37 x |
| sklearn gb 2M | 2.2 MiB / 1.1 MiB / 2.08 x | 0.03 s / 0.26 s / 8.02 x | 0.03 s / 0.13 s / 3.72 x |
| sklearn gb 2M lzma | 0.6 MiB / 0.2 MiB / 3.80 x | 1.03 s / 0.43 s / 0.41 x | 0.09 s / 0.14 s / 1.59 x |
| lgbm gbdt 2M | 2.6 MiB / 1.0 MiB / 2.78 x | 0.08 s / 0.24 s / 2.91 x | 0.01 s / 0.12 s / 9.36 x |
| lgbm gbdt 2M lzma | 0.9 MiB / 0.5 MiB / 1.90 x | 1.54 s / 0.53 s / 0.35 x | 0.09 s / 0.15 s / 1.80 x |
| lgbm gbdt 5M | 5.3 MiB / 1.9 MiB / 2.81 x | 0.16 s / 0.47 s / 2.96 x | 0.02 s / 0.24 s / 9.55 x |
| lgbm gbdt 5M lzma | 1.7 MiB / 0.8 MiB / 1.96 x | 3.72 s / 1.11 s / 0.30 x | 0.16 s / 0.31 s / 1.90 x |
| lgbm gbdt 20M | 22.7 MiB / 7.6 MiB / 3.00 x | 0.63 s / 1.91 s / 3.04 x | 0.11 s / 0.96 s / 9.05 x |
| lgbm gbdt 20M lzma | 6.3 MiB / 3.0 MiB / 2.09 x | 20.33 s / 5.31 s / 0.26 x | 0.65 s / 1.21 s / 1.85 x |
| lgbm gbdt 100M | 101.1 MiB / 33.0 MiB / 3.06 x | 2.84 s / 8.85 s / 3.12 x | 0.52 s / 37.83 s / 72.48 x |
| lgbm gbdt 100M lzma | 25.6 MiB / 10.6 MiB / 2.41 x | 91.18 s / 24.25 s / 0.27 x | 2.60 s / 5.06 s / 1.95 x |
| lgbm rf 10M | 10.9 MiB / 3.2 MiB / 3.46 x | 0.33 s / 0.62 s / 1.89 x | 0.04 s / 0.37 s / 8.78 x |
| lgbm rf 10M lzma | 0.7 MiB / 0.4 MiB / 1.85 x | 1.98 s / 0.89 s / 0.45 x | 0.13 s / 0.41 s / 3.24 x |

@@ -13,6 +13,10 @@ def __init__(self):
self.open = open
self.compress = lambda data: data

@staticmethod
Contributor

Was this missing and untested?

Contributor Author

Yes, it seems that loads_compressed was benchmarked (with lzma compression), but untested prior to my introduction of test_loads_compressed_custom_unpickler.

Without compression it failed due to the missing decompress method in _NoCompression.
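The bug described here is easy to reproduce in isolation. A minimal sketch of a pass-through backend, with internal names assumed from the diff above rather than taken from the actual source: without a `decompress` counterpart to `compress`, the load path has nothing to call.

```python
import pickle

class _NoCompression:
    # sketch of the no-op compression backend; names assumed from the diff
    def __init__(self):
        self.open = open
        self.compress = lambda data: data
        # the piece whose absence broke loads_compressed without compression:
        self.decompress = lambda data: data

nc = _NoCompression()
payload = pickle.dumps([1, 2, 3])
# dump path applies compress, load path applies decompress; both are identity
restored = pickle.loads(nc.decompress(nc.compress(payload)))
```

With both methods present, the no-compression path round-trips data unchanged.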

@@ -129,29 +135,39 @@ def load_compressed(
set to the compression method and other key-value pairs which are forwarded
to open() of the compression library.
Inspired by the pandas.to_csv interface.
:param unpickler_class: custom unpickler class derived from pickle.Unpickler.
Contributor

Hm, I wonder if we should pass a callable load(file) -> unpickled_obj here instead of a class, so that users could pass load=lambda file: pickle.Unpickler(file).load().

Contributor Author

As shown in the tests, a custom unpickling scheme is defined via a class derived from pickle.Unpickler. Hence, I find it most straightforward to pass that class directly rather than taking the detour through a wrapper function.

But I'm open to other solutions as well. What do you think, @pavelzw?

Member

I think since we expect users to always provide some kind of Unpickler, this approach is fine.
Otherwise, every time you wanted to specify custom unpickling behavior, you would need to explicitly pass a lambda such as lambda file: CustomUnpickler(file).load().
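The trade-off discussed above can be sketched with a hypothetical stand-in for the library's loads_compressed (decompression elided; the real signature may differ):

```python
import io
import pickle

# hypothetical stand-in for the library function, for illustration only
def loads_compressed(data, unpickler_class=pickle.Unpickler):
    # real implementation would first decompress `data`
    return unpickler_class(io.BytesIO(data)).load()

class CustomUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # a real subclass would restrict or remap imports here
        return super().find_class(module, name)

payload = pickle.dumps([1, 2, 3])
# class-based parameter: pass the subclass itself
restored = loads_compressed(payload, unpickler_class=CustomUnpickler)
# the callable-based alternative would require wrapping at every call site:
#   loads_compressed(payload, load=lambda file: CustomUnpickler(file).load())
```

The class-based parameter keeps the call site short, at the cost of requiring the customization to be an Unpickler subclass rather than an arbitrary callable.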

Member

@pavelzw pavelzw left a comment

Thanks @StefanSorgQC!


@pavelzw
Member

pavelzw commented Jul 18, 2023

It seems that the tests are failing for lightgbm 4.0 :/ #72

pavelzw merged commit 056ea90 into main on Jul 18, 2023
pavelzw deleted the s.custom_unpickler branch on July 18, 2023 19:33
Labels: enhancement (New feature or request)

3 participants