[train] Simplify `ray.train.xgboost/lightgbm` (1/n): Align frequency-based and `checkpoint_at_end` checkpoint formats #42111

Conversation
Signed-off-by: Justin Yu <[email protected]>
```diff
@@ -23,12 +23,17 @@ def from_model(
         booster: lightgbm.Booster,
         *,
         preprocessor: Optional["Preprocessor"] = None,
+        path: Optional[str] = None,
```
Do we still need these changes if we're centralizing on the Callbacks?
Nope, I can get rid of it. If anybody does use this, though, specifying your own temp dir might be useful if you want it to be cleaned up afterward.
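The cleanup idea mentioned here can be illustrated with the standard library alone. This is a sketch of the general pattern, not the Ray API: the "model file" is a plain text stand-in, and the write is where something like `booster.save_model(...)` would go.

```python
import os
import tempfile

# Sketch: write a model file into a caller-owned temp dir so that it is
# cleaned up automatically when the context manager exits.
with tempfile.TemporaryDirectory() as tmpdir:
    model_path = os.path.join(tmpdir, "model.txt")
    with open(model_path, "w") as f:
        f.write("fake model bytes")  # stand-in for booster.save_model(...)
    assert os.path.exists(model_path)

# The directory and everything inside it are gone at this point.
print(os.path.exists(tmpdir))  # -> False
```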
```python
from ray.train.lightgbm import RayTrainReportCallback

# Get a `Checkpoint` object that is saved by the callback during training.
result = trainer.fit()
```
nit: For consistency with this, should we update the training example to use the `LightGBMTrainer`? Same for xgboost.
I want to add the `*Trainer` examples once I add in a v2 xgboost/lightgbm trainer, since then it'll actually show the callback usage in the training func. Right now the user doesn't need to create the callback themselves.
```diff
@@ -38,6 +38,7 @@ class RayTrainReportCallback:
     independent xgboost trials (without data parallelism within a trial).

     .. testcode::
+        :skipif: True
```
Are we going to add them back later?
This used to be a code-block that didn't run 😅 I just wanted to show a mock `xgboost.train` call with the callback inside, without needing to specify the dataset and everything.
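For reference, the Sphinx doctest pattern being discussed looks roughly like this (a sketch, not the exact docstring; `params` and `dtrain` are placeholder names that a real example would have to define):

```rst
.. testcode::
    :skipif: True

    import xgboost

    # Mock call: shows where the callback plugs in; never executed
    # because of :skipif: True.
    bst = xgboost.train(
        params,
        dtrain,
        callbacks=[RayTrainReportCallback()],
    )
```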
```python
@PublicAPI(stability="beta")
class RayTrainReportCallback:
```
`TuneCallback` for lgbm was originally an empty class that wasn't referenced anywhere else, so I just removed it.
## Why are these changes needed?

This PR fixes `XGBoostTrainer` and `LightGBMTrainer` checkpointing:

- Establishes `ray.train.xgboost/lightgbm.RayTrainReportCallback` as the standard utilities to define the checkpoint save/load format. Previously, this logic was spread across (1) `XGBoostCheckpoint`, (2) `ray.tune.integration.xgboost.TuneReportCheckpointCallback`, and (3) `XGBoostTrainer._save_model`. These shared the `XGBoostCheckpoint.MODEL_FILENAME` constant in some places, but re-implemented the `from_model` and `get_model` logic for some reason.
- `CheckpointConfig(checkpoint_frequency)` now goes through a single callback (`ray.train.*.RayTrainReportCallback`) that handles both `checkpoint_frequency` and `checkpoint_at_end`. This codepath standardizes on the framework-specific implementation of checkpoint saving.
- Deprecates `TuneReportCallback`. The migration is simple: `TuneReportCallback() -> TuneReportCheckpointCallback(frequency=0)`.
- Untangles the circular dependency between `ray.tune` and `xgboost_ray`/`lightgbm_ray` (`xgboost_ray -> ray.tune.* -> ray.train.* -> ray.train.xgboost -> xgboost_ray`), which raised an `ImportError` that `xgboost_ray` incorrectly used to determine whether Ray Train/Tune were installed.
- A follow-up will remove the `xgboost_ray` and `lightgbm_ray` dependencies by re-implementing simple versions of these trainers as `DataParallelTrainer`s. See: [train] Simplify `ray.train.xgboost/lightgbm` (2/n): Re-implement `XGBoostTrainer` as a lightweight `DataParallelTrainer` #42767.

## API Change Summary

- `ray.train.xgboost.RayTrainReportCallback`, mirroring `ray.train.lightning.RayTrainReportCallback`. This will be exposed to users if they have full control over the training loop in the new simplified `XGBoostTrainer`.
- `ray.train.xgboost.RayTrainReportCallback.get_model(filename)`, which will replace `XGBoostTrainer.get_model` in the future.
- `ray.tune.integration.xgboost.TuneReportCheckpointCallback`.

The same APIs are introduced for the lightgbm counterparts.
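The "single callback handles both `checkpoint_frequency` and `checkpoint_at_end`" behavior described above can be sketched in plain Python. This is my own stand-in, not Ray's implementation; the `frequency` and `checkpoint_at_end` parameters mirror the config options by name only.

```python
# Sketch of one callback serving both frequency-based checkpointing and
# checkpoint-at-end, without ever saving the same iteration twice.
class ReportCallbackSketch:
    def __init__(self, frequency=0, checkpoint_at_end=True):
        self.frequency = frequency
        self.checkpoint_at_end = checkpoint_at_end
        self.saved_iters = []

    def after_iteration(self, iteration, is_last):
        # Save every `frequency` iterations (0 disables frequency-based saves).
        should_ckpt = self.frequency > 0 and (iteration + 1) % self.frequency == 0
        # Always save at the end if configured, even if off-cadence.
        if is_last and self.checkpoint_at_end:
            should_ckpt = True
        if should_ckpt and iteration not in self.saved_iters:
            self.saved_iters.append(iteration)


cb = ReportCallbackSketch(frequency=2, checkpoint_at_end=True)
for i in range(5):
    cb.after_iteration(i, is_last=(i == 4))
print(cb.saved_iters)  # -> [1, 3, 4]: every 2nd iteration, plus the last one
```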
## TODOs left for followups

- … `xgboost_ray` right now.
- Revisit the `checkpoint_at_end` vs. `checkpoint_frequency` overlap logic for the test case with a TODO in `test_xgboost_trainer`, after switching to the simplified xgboost trainer.

## Related issue number
Closes #41608
## Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- … method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.