Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dtype casts safe #38

Merged
merged 1 commit into from
Mar 20, 2023
Merged

Make dtype casts safe #38

merged 1 commit into from
Mar 20, 2023

Conversation

pavelzw
Copy link
Member

@pavelzw pavelzw commented Mar 20, 2023

Fixes #23

@pavelzw pavelzw requested a review from jonashaag March 20, 2023 15:36
@pavelzw pavelzw added the enhancement New feature or request label Mar 20, 2023
@pavelzw
Copy link
Member Author

pavelzw commented Mar 20, 2023

This adds safe casts to the old version of the LGBM compression algorithm (as well as the sklearn algorithm). @YYYasin19 needs to do something like this in #15 too.

@github-actions
Copy link

(benchmark 4470028469 / attempt 1)
Base results / Our results / Change

Model Size Dump Time Load Time
sklearn rf 77.2 MiB / 11.1 MiB / 6.94 x 0.04 s / 0.11 s / 2.43 x 0.06 s / 0.11 s / 1.76 x
sklearn rf LZMA 17.3 MiB / 5.3 MiB / 3.25 x 38.61 s / 6.69 s / 0.17 x 1.69 s / 0.58 s / 0.34 x
sklearn gb 2.2 MiB / 1.1 MiB / 2.08 x 0.03 s / 0.25 s / 7.47 x 0.04 s / 0.13 s / 3.66 x
sklearn gb LZMA 0.6 MiB / 0.2 MiB / 3.82 x 1.02 s / 0.45 s / 0.44 x 0.09 s / 0.15 s / 1.61 x
LGBM gbdt 5.3 MiB / 1.9 MiB / 2.81 x 0.17 s / 0.49 s / 2.91 x 0.03 s / 0.32 s / 12.11 x
LGBM gbdt LZMA 1.7 MiB / 0.8 MiB / 1.96 x 4.11 s / 1.15 s / 0.28 x 0.16 s / 0.38 s / 2.33 x
LGBM gbdt large 57.6 MiB / 18.9 MiB / 3.05 x 1.72 s / 5.11 s / 2.96 x 0.28 s / 3.55 s / 12.59 x
LGBM gbdt large LZMA 15.3 MiB / 6.7 MiB / 2.29 x 56.13 s / 14.17 s / 0.25 x 1.62 s / 4.00 s / 2.47 x
LGBM rf 1.6 MiB / 0.5 MiB / 3.42 x 0.05 s / 0.09 s / 1.91 x 0.01 s / 0.08 s / 12.93 x
LGBM rf LZMA 0.1 MiB / 0.1 MiB / 1.70 x 0.24 s / 0.13 s / 0.55 x 0.02 s / 0.08 s / 4.22 x

@@ -102,6 +103,10 @@ def _extract_feature(feature_line):
feats_map = dict(_extract_feature(fl) for fl in features)

def parse(str_list, dtype):
if np.can_cast(dtype, np.int64):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this special case?

@pavelzw pavelzw merged commit e85e0fd into main Mar 20, 2023
@pavelzw pavelzw deleted the safe-cast branch March 20, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check that dtype conversions don't invalidate data
2 participants