[air] update xgboost test (catch test failures properly). #27023
Conversation
Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
@@ -54,14 +53,16 @@ def from_model(
>>>
>>> predictor = LightGBMPredictor.from_checkpoint(checkpoint)  # doctest: +SKIP # noqa: #501
"""
booster.save_model(os.path.join(path, MODEL_KEY))
Wouldn't this lead to saving the full checkpoint to a new temp directory rather than a user-given path? The current behavior seems better in that the user has control over where they want to save the preprocessor, whereas this PR changes it to an ephemeral path that we create.
In this proposal, the user doesn't supply a path anymore.

This is a simple API that solves a lot of headaches (e.g. users managing temporary directories). If users need more efficiency by specifying their own non-ephemeral path, we can add that if it is requested. Most xgboost models are small; the biggest ones I've seen in production are about 50 MB, so ser/de should be relatively fast.
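For illustration only, here is a minimal sketch of what a path-free `from_model` could look like, assuming the checkpoint manages its own ephemeral directory. This is not the actual Ray AIR implementation; `MODEL_KEY`, the file name, and the helper name `from_model_sketch` are assumptions based on the snippets quoted in this review.

```python
import os
import tempfile

import lightgbm as lgb
import numpy as np

MODEL_KEY = "model.txt"  # assumed file name; the real constant lives in Ray AIR


def from_model_sketch(booster: lgb.Booster) -> str:
    """Hypothetical path-free from_model: the caller no longer supplies a path.

    The checkpoint owns an ephemeral directory; the booster is written into it
    and the directory is returned (Ray AIR would wrap it in a Checkpoint).
    """
    tmpdir = tempfile.mkdtemp()
    booster.save_model(os.path.join(tmpdir, MODEL_KEY))
    return tmpdir


# Minimal usage: train a tiny model and checkpoint it without supplying a path.
X, y = np.random.rand(100, 4), np.random.randint(0, 2, size=100)
booster = lgb.train({"objective": "binary", "verbosity": -1}, lgb.Dataset(X, label=y))
restored = lgb.Booster(model_file=os.path.join(from_model_sketch(booster), MODEL_KEY))
```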
with tempfile.TemporaryDirectory() as tmpdir:
    checkpoint = LightGBMCheckpoint.from_model(booster=model, path=tmpdir)
    predictor = LightGBMPredictor.from_checkpoint(checkpoint)
checkpoint = LightGBMCheckpoint.from_model(booster=model)
I think if we need to test the checkpoint, creating a tmpdir in the unit test is better than surfacing it at the checkpoint class implementation level.
Is there a run attached to this PR where we verify that exceptions will be surfaced and fail the command in the nightly release tests?
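For context, a hedged sketch of what "catch test failures properly" means in a release-test setting: any exception or non-zero exit from the workload should propagate and fail the nightly command instead of being swallowed. The wrapper below is made up and is not the actual release-test script.

```python
import subprocess
import sys


def run_workload(cmd: list) -> None:
    # Hypothetical wrapper: propagate any failure as a non-zero exit code so
    # the nightly release job is marked as failed instead of silently passing.
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(result.returncode)


if __name__ == "__main__":
    run_workload([sys.executable, "-c", "print('workload placeholder')"])
```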
Let me rebase on current master since Cheng should have a revert that fixes the training issue.
@jiaodong I think the reason for such a conversion is that for xgboost/lightgbm trainers, the preprocessor is saved to a directory path, whereas for tensorflow/torch trainers, the preprocessor is added as an entry of a dict. Instead of what is being done in this PR (since it is not very efficient), an alternative would be ... @krfricke @jiaodong, any preferences?

Update: ...
Discussed offline. It seems we have a lower-level issue about Checkpoint and framework checkpoints, essentially that a ...
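To make the directory-vs-dict distinction above concrete, here is a rough sketch with made-up keys and file names (not the actual Ray AIR layout):

```python
import os
import pickle
import tempfile


def save_directory_checkpoint(model_bytes: bytes, preprocessor) -> str:
    # xgboost/lightgbm style: the model and the preprocessor are separate files
    # inside a checkpoint directory.
    path = tempfile.mkdtemp()
    with open(os.path.join(path, "model.json"), "wb") as f:
        f.write(model_bytes)
    with open(os.path.join(path, "preprocessor.pkl"), "wb") as f:
        pickle.dump(preprocessor, f)
    return path


def save_dict_checkpoint(model_state: dict, preprocessor) -> dict:
    # tensorflow/torch style: the preprocessor is just another entry in a dict
    # checkpoint, so no filesystem path is involved at save time.
    return {"model": model_state, "preprocessor": preprocessor}
```

Converting between these two forms is where the extra ser/de cost mentioned above comes from.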
Looks great, thanks
Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Why are these changes needed?
Related issue number
#26612
Checks
I've run `scripts/format.sh` to lint the changes in this PR.