
[air] update xgboost test (catch test failures properly). #27023

Merged: 2 commits into ray-project:master on Jul 27, 2022

Conversation

@xwjiang2010 (Contributor) commented on Jul 26, 2022

  • Update xgboost test (catch test failures properly)
  • Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 [email protected]

Why are these changes needed?

Related issue number

#26612

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: xwjiang2010 <[email protected]>
```diff
@@ -54,14 +53,16 @@ def from_model(
         >>>
         >>> predictor = LightGBMPredictor.from_checkpoint(checkpoint) # doctest: +SKIP # noqa: #501
         """
-        booster.save_model(os.path.join(path, MODEL_KEY))
```
@jiaodong (Member) commented on the diff:

Wouldn't this lead to saving the full checkpoint to a new temp directory rather than a user-given path? The current behavior seems better in that the user has control over where they want to save the preprocessor, and this PR changes that to an ephemeral path we create.

A Contributor replied:

In this proposal, the user doesn't supply a path anymore.

This is a simple API that removes a lot of headaches (e.g., users managing temporary directories). If users need more efficiency by specifying their own non-ephemeral path, we can add that if it is requested. Most xgboost models are small; the biggest ones I've seen in production are about 50 MB, so ser/de should be relatively fast.
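As an editorial aside, here is a minimal sketch of what a path-less `from_model` could look like, assuming the checkpoint serializes the model through a self-managed temp directory and returns an in-memory dict; `MODEL_KEY`, `PREPROCESSOR_KEY`, and the dict layout are assumptions for this sketch, not Ray AIR's actual internals:

```python
import os
import pickle
import tempfile

import xgboost  # the same idea applies to lightgbm

MODEL_KEY = "model"                # hypothetical key for the serialized booster
PREPROCESSOR_KEY = "preprocessor"  # hypothetical key for the pickled preprocessor


def from_model(booster: xgboost.Booster, *, preprocessor=None) -> dict:
    """Pack a trained booster into an in-memory checkpoint dict.

    The temporary directory is created and cleaned up internally, so the
    caller no longer supplies (or manages) a path.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        model_path = os.path.join(tmpdir, MODEL_KEY)
        booster.save_model(model_path)
        with open(model_path, "rb") as f:
            data = {MODEL_KEY: f.read()}
    if preprocessor is not None:
        data[PREPROCESSOR_KEY] = pickle.dumps(preprocessor)
    return data
```

Because the directory is ephemeral and the bytes end up in the dict, the ser/de cost scales with model size, which is the point of the 50 MB observation above.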

```diff
-    with tempfile.TemporaryDirectory() as tmpdir:
-        checkpoint = LightGBMCheckpoint.from_model(booster=model, path=tmpdir)
-        predictor = LightGBMPredictor.from_checkpoint(checkpoint)
+    checkpoint = LightGBMCheckpoint.from_model(booster=model)
+    predictor = LightGBMPredictor.from_checkpoint(checkpoint)
```
@jiaodong (Member) commented on the diff:

I think if we need a tmpdir to test the checkpoint, creating it in the unit test is better than surfacing it at the Checkpoint class implementation level.

@jiaodong (Member)

Is there a run attached to this PR verifying that exceptions will be surfaced and fail the command in the nightly release tests?

@xwjiang2010 (Contributor, Author)

> Is there a run attached to this PR verifying that exceptions will be surfaced and fail the command in the nightly release tests?

Let me rebase on current master since Cheng should have a revert that fixes the training issue.

@xwjiang2010 (Contributor, Author) commented on Jul 26, 2022

@jiaodong I think the reason for this conversion is that for the xgboost/lightgbm trainers, the preprocessor is saved to a directory path, whereas for the tensorflow/torch trainers, the preprocessor is added as an entry in a dict.
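To make that asymmetry concrete, here is an illustrative contrast of the two checkpoint shapes (keys, file names, and layout are assumptions for this sketch, not Ray AIR's exact format):

```python
# xgboost/lightgbm trainers: directory-backed checkpoint; the model and the
# preprocessor live as files side by side, e.g.:
#   checkpoint_000001/
#   |-- model              # output of booster.save_model(...)
#   |-- preprocessor.pkl   # pickled preprocessor
#
# tensorflow/torch trainers: dict-backed checkpoint; the preprocessor is
# just another key next to the model weights.
dict_style_checkpoint = {
    "model": b"<serialized weights>",
    "preprocessor": b"<pickled preprocessor>",
}
```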

Instead of what is being done in this PR (since it is not very efficient), an alternative would be `from_model(model_path, preprocessor)`, with the assumption that Ray AIR can just write the preprocessor into the model directory and then do `Checkpoint.from_directory(model_dir)` (see the sketch at the end of this comment).

@krfricke @jiaodong any preferences?

Update:
Actually, thinking about it more, this is less optimal, as we may end up pickling a lot more files than needed if the user has more than just the model file under the directory path.
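For concreteness, a minimal sketch of that `from_model(model_path, preprocessor)` alternative, assuming `Checkpoint.from_directory` wraps the user's existing model directory (the import path and the `PREPROCESSOR_FILE` name are assumptions here); the caveat from the update shows up in the last line:

```python
import os
import pickle

from ray.air.checkpoint import Checkpoint  # assumed import path

PREPROCESSOR_FILE = "preprocessor.pkl"  # hypothetical file name


def from_model_dir(model_dir: str, preprocessor=None) -> Checkpoint:
    """Write the preprocessor next to the user's model, then wrap the dir."""
    if preprocessor is not None:
        with open(os.path.join(model_dir, PREPROCESSOR_FILE), "wb") as f:
            pickle.dump(preprocessor, f)
    # Caveat from the update above: this captures the *whole* directory, so
    # any extra files under model_dir get pickled into the checkpoint too.
    return Checkpoint.from_directory(model_dir)
```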

@jiaodong (Member)

Discussed offline. It seems we have a lower-level issue with Checkpoint and the framework checkpoints: essentially, a `set_preprocessor`-equivalent operation needs to go through more hops than necessary :/ Let me read the past checkpoint design docs a bit and try to synthesize all the context first.

@xwjiang2010 xwjiang2010 added this to the Ray AIR milestone Jul 27, 2022
@krfricke (Contributor) left a comment:

Looks great, thanks

@krfricke krfricke merged commit 4c30325 into ray-project:master Jul 27, 2022
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
[air] update xgboost test (catch test failures properly). (ray-project#27023)

- Update xgboost test (catch test failures properly)
- Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: Rohan138 <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
[air] update xgboost test (catch test failures properly). (ray-project#27023)

- Update xgboost test (catch test failures properly)
- Remove `path` from `from_model` for XGBoostCheckpoint and LightGbmCheckpoint.

Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>
@xwjiang2010 xwjiang2010 deleted the xgboost_followup branch July 26, 2023 19:56