
[air] update xgboost test (catch test failures properly). #27023

Merged: 2 commits into ray-project:master on Jul 27, 2022

Conversation

@xwjiang2010 (Contributor) commented on Jul 26, 2022

  • Update xgboost test (catch test failures properly)
  • Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 [email protected]

Why are these changes needed?

Related issue number

#26612

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: xwjiang2010 <[email protected]>
```diff
@@ -54,14 +53,16 @@ def from_model(
         >>>
         >>> predictor = LightGBMPredictor.from_checkpoint(checkpoint) # doctest: +SKIP # noqa: #501
         """
-        booster.save_model(os.path.join(path, MODEL_KEY))
```
@jiaodong (Member) commented on the diff:

Wouldn't this lead to saving the full checkpoint to a new temp directory rather than a user-given path? The current behavior seems better in that the user has control over where they want to save the preprocessor, and this PR changes that to an ephemeral path we create.

A Contributor replied:

In this proposal, the user doesn't supply a path anymore.

This is a simple API that removes a lot of headaches (e.g., users managing temporary directories). If users need more efficiency by specifying their own non-ephemeral path, we can add that if it is requested. Most xgboost models are small; the biggest ones I've seen in production are about 50 MB, so ser/de should be relatively fast.
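As an editorial aside, here is a minimal sketch of what a path-less `from_model` could look like, assuming the checkpoint serializes the model through a self-managed temp directory and returns an in-memory dict; `MODEL_KEY`, `PREPROCESSOR_KEY`, and the dict layout are assumptions for this sketch, not Ray AIR's actual internals:

```python
import os
import pickle
import tempfile

import xgboost  # the same idea applies to lightgbm

MODEL_KEY = "model"                # hypothetical key for the serialized booster
PREPROCESSOR_KEY = "preprocessor"  # hypothetical key for the pickled preprocessor


def from_model(booster: xgboost.Booster, *, preprocessor=None) -> dict:
    """Pack a trained booster into an in-memory checkpoint dict.

    The temporary directory is created and cleaned up internally, so the
    caller no longer supplies (or manages) a path.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        model_path = os.path.join(tmpdir, MODEL_KEY)
        booster.save_model(model_path)
        with open(model_path, "rb") as f:
            data = {MODEL_KEY: f.read()}
    if preprocessor is not None:
        data[PREPROCESSOR_KEY] = pickle.dumps(preprocessor)
    return data
```

Because the directory is ephemeral and the bytes end up in the dict, the ser/de cost scales with model size, which is the point of the 50 MB observation above.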

```diff
-    with tempfile.TemporaryDirectory() as tmpdir:
-        checkpoint = LightGBMCheckpoint.from_model(booster=model, path=tmpdir)
-        predictor = LightGBMPredictor.from_checkpoint(checkpoint)
+    checkpoint = LightGBMCheckpoint.from_model(booster=model)
+    predictor = LightGBMPredictor.from_checkpoint(checkpoint)
```
@jiaodong (Member) commented on the diff:

I think if we need a tmpdir to test the checkpoint, creating it in the unit test is better than surfacing it at the Checkpoint class implementation level.

@jiaodong (Member)

Is there a run attached to this PR verifying that exceptions will be surfaced and fail the command in the nightly release tests?

@xwjiang2010 (Contributor, Author)

> Is there a run attached to this PR verifying that exceptions will be surfaced and fail the command in the nightly release tests?

Let me rebase on current master since Cheng should have a revert that fixes the training issue.

@xwjiang2010 (Contributor, Author) commented on Jul 26, 2022

@jiaodong I think the reason for this conversion is that for the xgboost/lightgbm trainers, the preprocessor is saved to a directory path, whereas for the tensorflow/torch trainers, the preprocessor is added as an entry in a dict.
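To make that asymmetry concrete, here is an illustrative contrast of the two checkpoint shapes (keys, file names, and layout are assumptions for this sketch, not Ray AIR's exact format):

```python
# xgboost/lightgbm trainers: directory-backed checkpoint; the model and the
# preprocessor live as files side by side, e.g.:
#   checkpoint_000001/
#   |-- model              # output of booster.save_model(...)
#   |-- preprocessor.pkl   # pickled preprocessor
#
# tensorflow/torch trainers: dict-backed checkpoint; the preprocessor is
# just another key next to the model weights.
dict_style_checkpoint = {
    "model": b"<serialized weights>",
    "preprocessor": b"<pickled preprocessor>",
}
```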

Instead of what is being done in this PR (since it is not very efficient), an alternative would be `from_model(model_path, preprocessor)`, with the assumption that Ray AIR can just write the preprocessor into the model directory and then do `Checkpoint.from_directory(model_dir)` (see the sketch at the end of this comment).

@krfricke @jiaodong any preferences?

Update:
Actually, thinking about it more, this is less optimal, as we may end up pickling a lot more files than needed if the user has more than just the model file under the directory path.
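For concreteness, a minimal sketch of that `from_model(model_path, preprocessor)` alternative, assuming `Checkpoint.from_directory` wraps the user's existing model directory (the import path and the `PREPROCESSOR_FILE` name are assumptions here); the caveat from the update shows up in the last line:

```python
import os
import pickle

from ray.air.checkpoint import Checkpoint  # assumed import path

PREPROCESSOR_FILE = "preprocessor.pkl"  # hypothetical file name


def from_model_dir(model_dir: str, preprocessor=None) -> Checkpoint:
    """Write the preprocessor next to the user's model, then wrap the dir."""
    if preprocessor is not None:
        with open(os.path.join(model_dir, PREPROCESSOR_FILE), "wb") as f:
            pickle.dump(preprocessor, f)
    # Caveat from the update above: this captures the *whole* directory, so
    # any extra files under model_dir get pickled into the checkpoint too.
    return Checkpoint.from_directory(model_dir)
```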

@jiaodong (Member)

Discussed offline. It seems we have a lower-level issue with Checkpoint and the framework checkpoints: essentially, a `set_preprocessor`-equivalent operation needs to go through more hops than necessary :/ Let me read the past checkpoint design docs a bit and try to synthesize all the context first.

@xwjiang2010 xwjiang2010 added this to the Ray AIR milestone Jul 27, 2022
@krfricke (Contributor) left a comment:

Looks great, thanks

@krfricke krfricke merged commit 4c30325 into ray-project:master Jul 27, 2022
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
[air] update xgboost test (catch test failures properly). (ray-project#27023)

- Update xgboost test (catch test failures properly)
- Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: Rohan138 <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
[air] update xgboost test (catch test failures properly). (ray-project#27023)

- Update xgboost test (catch test failures properly)
- Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
@xwjiang2010 xwjiang2010 deleted the xgboost_followup branch July 26, 2023 19:56