Error loading model file produced on different version/platform #3137
Comments
LightGBM 2.3.1 writes version 3 of the model file format, while 2.2.3 produces version 2. However, if I'm not mistaken, the newer LightGBM is able to load old models: #2269 (comment). @yalwan-iqvia Do you manually modify the text model before loading it back?
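One quick way to check which format version a saved text model declares is to read its header. The sketch below assumes the text model carries a `version=` line near the top (for example `version=v3`) and that tree blocks start with `Tree=`; the helper name `model_format_version` is hypothetical and is not part of LightGBM.

```python
from typing import Optional


def model_format_version(model_text: str) -> Optional[str]:
    """Return the value of the first `version=` header line, if any."""
    for line in model_text.splitlines():
        if line.startswith("version="):
            return line.split("=", 1)[1].strip()
        # Assumption: the header ends where the first tree block begins.
        if line.startswith("Tree="):
            break
    return None


# Made-up header fragment, used only for illustration.
header = "tree\nversion=v3\nnum_class=1\n\nTree=0\nnum_leaves=2\n"
print(model_format_version(header))
```

Running this against both the original and the re-saved file would show whether the re-save really changed the declared version.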
So I thought this might be related, so I loaded and re-saved the model (via Linux) to get a V3 model file. The same error occurred. Then I tried truncating the number of trees in the boosted model (because I wondered whether that was related), and the same error occurred.
Loaded and re-saved model: https://github.com/IQVIA-ML/LightGBM.jl/blob/afc7cc18e9a69b4a47ed52902cf50960ec2c8719/test/ffi/data/gain_test_booster
Truncated boosters model: https://github.com/IQVIA-ML/LightGBM.jl/blob/d98c963b3e99357e8c014c543d12808b3de60b25/test/ffi/data/gain_test_booster
In each case you can see the error differs. I am wondering if it has something to do with the length of the model data, but that would be weird. Happy to run whatever experiments help troubleshoot, but you can see from those status checks that it's working well on Linux/Mac.
The binary was obtained from https://github.com/microsoft/LightGBM/releases/download/v2.3.1/lib_lightgbm.dll, in case it is relevant.
I can add that testing this locally on a Windows machine worked OK, so it seems to be an issue with the status checks system (Docker image?), but what exactly, I don't know.
Each time I have truncated the model (to a point prior to "met ..., expected Tree") it gets a new error earlier on in the model parsing, see for example: https://github.com/IQVIA-ML/LightGBM.jl/pull/52/checks?check_run_id=730609619#step:7:111 I've also tried reading the model into memory first and then using |
@yalwan-iqvia I can confirm that the error happens on my local Windows machine. However, if I remove
I guess that the original model file was modified and |
@StrikerRUS I only ever produced these files by calling The tip worked (thank you!) and allowed our development branch to pass on the CI server, so I'm happy to accept it as a workaround, but perhaps the underlying issue still needs to be considered by the LightGBM team. I leave that decision to you guys.
@guolinke Is it possible that calculated tree sizes on one platform are incorrect on another? |
I think they should be identical across platforms.
@StrikerRUS For your example (#3137 (comment)), I think it will break the newline characters across platforms.
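To illustrate the newline concern: if `tree_sizes` stores the length of each serialized tree block (an assumption about the format, not something confirmed in this thread), then any step that rewrites `\n` as `\r\n` changes those lengths while the recorded values stay the same. The tree block below is made up for illustration.

```python
# Hypothetical tree block, used only to show the size difference.
tree_block_lf = "Tree=0\nnum_leaves=2\nsplit_feature=0\n\n"

# Simulate a rewrite that converts LF line endings to CRLF.
tree_block_crlf = tree_block_lf.replace("\n", "\r\n")

size_lf = len(tree_block_lf.encode("utf-8"))
size_crlf = len(tree_block_crlf.encode("utf-8"))

# Each converted newline adds one byte, so a size recorded before the
# conversion no longer matches the block that is actually stored.
print(size_lf, size_crlf, size_crlf - size_lf)
```

Under that assumption, a parser that trusts the recorded sizes would start reading the next tree at the wrong offset, which would be consistent with an error that moves around when the model is truncated.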
@guolinke |
@StrikerRUS I am not a Python expert, but I guess it could be.
@StrikerRUS can you confirm whether the |
@guolinke I'll try to get Linux machine in next days and reproduce the issue. But TBH, I don't think that |
@guolinke Sorry I took this long to reply. I tried various things, including using the Julia wrapper to the C API to re-save the model, and it did not change the result. I only initially produced the model using Python for convenience, and I tried a lot of things to make the issue go away (including saving with truncation by using a lower num_iterations than what was used for saving, not a manual truncation), but @StrikerRUS's suggestion to remove the tree_sizes field was the only thing that worked for the system.
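The workaround described here (dropping the `tree_sizes` field before loading) can be sketched as follows. This assumes the field appears as a single header line beginning with `tree_sizes=`; the function name `strip_tree_sizes` and the sample text are hypothetical.

```python
def strip_tree_sizes(model_text: str) -> str:
    """Return the model text without any line starting with `tree_sizes=`."""
    kept = [
        line
        # keepends=True preserves each line's original ending (LF or CRLF).
        for line in model_text.splitlines(keepends=True)
        if not line.startswith("tree_sizes=")
    ]
    return "".join(kept)


# Made-up model fragment, used only for illustration.
sample = "tree\nversion=v3\ntree_sizes=120 118\n\nTree=0\nnum_leaves=2\n"
cleaned = strip_tree_sizes(sample)
print(cleaned)
```

When applying this to a real file, opening it with `open(path, newline="")` for both reading and writing keeps Python from translating line endings on its own, so the only change to the file is the removed line.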
Finally got an access to a Linux machine. Can confirm that the model However, I cannot reproduce the issue with random data and model. I trained and saved a model on Linux with
IDK, maybe that original model suffers from some edge case of tree size calculation that differs on Linux and Windows. Or maybe that file is just corrupted somehow. |
@yalwan-iqvia |
My take is that if this is apparently occasionally correct and the workaround is to remove I don't regularly anticipate a need to produce a file on one system and consume it on another, so this doesn't affect me personally. I just needed tests to pass, but this does look like a potential problem for users who might be doing cross platform production/consumption. |
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Loading a LightGBM model produced with Python/LightGBM 2.2.3 on Linux into LightGBM 2.3.1 via Julia FFI on Windows fails.
Specifically, this error is encountered while trying to load a file for use in the test suite for the Julia wrapper.
How you are using LightGBM?
Julia FFI wrapper to the C library
LightGBM component:
Environment info
Operating System: Windows
CPU/GPU model: CPU
LightGBM version or commit hash: 2.3.1
Error message and / or logs
https://github.com/IQVIA-ML/LightGBM.jl/pull/52/checks?check_run_id=727774501#step:7:110
Pastebin: https://pastebin.com/H2cucgHA
Reproducible example(s)
Please use model from here: https://github.com/IQVIA-ML/LightGBM.jl/blob/132d9eaebb6fba44f1cbc377ab0a00d4ac0d3244/test/ffi/data/gain_test_booster
Pastebin: https://pastebin.com/CQsDdR0P
This model was produced using Python LightGBM 2.2.3 on Linux.
Steps to reproduce