[AIR/train] Use new Train API #25735
Merged
Changes from 87 commits
89 commits
b39a864
Use new Train API for examples
Yard1 b31399e
Fix FailureConfig not being a dataclass
Yard1 5cc9229
Fix errors
Yard1 baaf8a5
Merge branch 'master' into use_new_train_api
Yard1 5230218
Fix
Yard1 ef4a3fc
Fix link
Yard1 f5cfe62
Fix simple example
Yard1 468f7e8
train loop utils
Yard1 4ef6302
Remove tensorboard example
Yard1 5db3c14
PBT test update
Yard1 cb805f2
WIP
Yard1 2f69e37
Do not use pipeline
Yard1 0d8eeb4
Remove callback test
Yard1 4a3103e
Examples tests
Yard1 f7f3ea8
Move tests
Yard1 50ca40b
Fixture fix
Yard1 1872f73
Merge branch 'master' into use_new_train_api
Yard1 10d88d3
Merge branch 'master' into use_new_train_api
Yard1 20b7075
CI fixes
Yard1 c3b7d42
Fix
Yard1 33f8fd1
Merge branch 'master' into use_new_train_api
Yard1 37b8182
Apply suggestions from code review
Yard1 6f8d7e0
Fix tracked checkpoint error
Yard1 85cb1a7
CI fixes
Yard1 86a71d6
Add checkpoint configuration to `RunConfig`
Yard1 41eb780
Add `best_checkpoint` and `dataframe` to `Result`
Yard1 eb2eb67
Tests, fixes
Yard1 024932e
Result grid tweaks
Yard1 abf2cdc
Extend
Yard1 1f1d28b
Merge branch 'ray-project:master' into more_checkpoint_configurability
Yard1 563bc33
Update result_grid.py
Yard1 d0261be
Fix
Yard1 56df493
Lint
Yard1 ef0c75a
Lint
Yard1 3464c93
WIP
Yard1 ee87c12
Renaming
Yard1 fe9d68e
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 b10fe1e
Improve test coverage
Yard1 4dbccca
Simplify
Yard1 27e531c
Docstring tweak
Yard1 7d1abfe
Remove docstring
Yard1 b0dd3ba
Fix
Yard1 1c2e4b1
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 5b226ab
Tweak docstring
Yard1 65ce1d3
Fix
Yard1 555f705
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 1e1fbea
Use CheckpointStrategy
Yard1 3aa277d
Merge branch 'master' into more_checkpoint_configurability
Yard1 e19d40f
Fix
Yard1 5cbb15f
Merge branch 'master' into more_checkpoint_configurability
Yard1 fd96174
dataframe -> metrics_dataframe
Yard1 8d5f1b3
CheckpointStrategy -> CheckpointConfig
Yard1 0482bce
Missed this
Yard1 207d8d1
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 0cb579d
Update test_result_grid.py
Yard1 7ade7e4
Fix
Yard1 0937dc8
Apply feedback from code review
Yard1 49ffb18
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 b993627
Fix lint
Yard1 9244b8e
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 ed870bd
Update python/ray/train/__init__.py
Yard1 ad90782
Merge branch 'master' into more_checkpoint_configurability
Yard1 c777bb5
Merge branch 'master' into use_new_train_api
Yard1 77305b2
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 a4fd532
Fix CI
Yard1 d0ae2ba
Use warnings.warn
Yard1 d44f750
Make method private
Yard1 c9d3380
Update python/ray/util/ml_utils/checkpoint_manager.py
Yard1 5c0a753
Update checkpoint_manager.py
Yard1 19108f4
Merge branch 'more_checkpoint_configurability' into use_new_train_api
Yard1 44f62e0
Merge branch 'master' into use_new_train_api
Yard1 c7b783b
Fix test
Yard1 2e9ec66
Rename files
Yard1 2bf89d2
Use keras callback
Yard1 375790e
Revert docstring changes
Yard1 de5103e
Merge branch 'master' into use_new_train_api
Yard1 baaaf47
Rename example files in docs
Yard1 d931a50
Merge branch 'master' into use_new_train_api
Yard1 691ce99
Add legacy tests
Yard1 b407873
Merge branch 'master' into use_new_train_api
Yard1 2c7611c
Merge branch 'ray-project:master' into use_new_train_api
Yard1 587ad56
Add todo
Yard1 0b05727
Merge branch 'master' into use_new_train_api
Yard1 139f44d
Use `trial_logdir` instead
Yard1 3a4d3f3
Fix
Yard1 a064f96
Merge branch 'ray-project:master' into use_new_train_api
Yard1 302d336
Merge branch 'ray-project:master' into use_new_train_api
Yard1 2ea93d7
Only print metrics
Yard1 f0d3beb
Merge branch 'master' into use_new_train_api
Yard1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
:orphan: | ||
|
||
torch_fashion_mnist_example | ||
=========================== | ||
|
||
.. literalinclude:: /../../python/ray/train/examples/torch_fashion_mnist_example.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
:orphan: | ||
|
||
torch_linear_dataset_example | ||
============================ | ||
|
||
.. literalinclude:: /../../python/ray/train/examples/torch_linear_dataset_example.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
:orphan: | ||
|
||
torch_linear_example | ||
==================== | ||
|
||
.. literalinclude:: /../../python/ray/train/examples/torch_linear_example.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
:orphan: | ||
|
||
tune_cifar_torch_pbt_example | ||
============================ | ||
|
||
.. literalinclude:: /../../python/ray/train/examples/tune_cifar_torch_pbt_example.py |
6 changes: 6 additions & 0 deletions
doc/source/train/examples/tune_torch_linear_dataset_example.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
:orphan: | ||
|
||
tune_torch_linear_dataset_example | ||
================================= | ||
|
||
.. literalinclude:: /../../python/ray/air/examples/pytorch/tune_torch_linear_dataset_example.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,40 +1,53 @@ | ||
 from ray import train | ||
-from ray.train import Trainer | ||
-from ray.train.callbacks import MLflowLoggerCallback, TBXLoggerCallback | ||
+from ray.air import RunConfig | ||
+from ray.train.torch import TorchTrainer | ||
+from ray.tune.integration.mlflow import MLflowLoggerCallback | ||
+from ray.tune.logger import TBXLoggerCallback | ||
|
||
|
||
 def train_func(): | ||
     for i in range(3): | ||
         train.report(epoch=i) | ||
|
||
|
||
-trainer = Trainer(backend="torch", num_workers=2) | ||
-trainer.start() | ||
+trainer = TorchTrainer( | ||
+    train_func, | ||
+    scaling_config={"num_workers": 2}, | ||
+    run_config=RunConfig( | ||
+        callbacks=[ | ||
+            MLflowLoggerCallback(experiment_name="train_experiment"), | ||
+            TBXLoggerCallback(), | ||
+        ], | ||
+    ), | ||
+) | ||
|
||
 # Run the training function, logging all the intermediate results | ||
 # to MLflow and Tensorboard. | ||
-result = trainer.run( | ||
-    train_func, | ||
-    callbacks=[ | ||
-        MLflowLoggerCallback(experiment_name="train_experiment"), | ||
-        TBXLoggerCallback(), | ||
-    ], | ||
-) | ||
+result = trainer.fit() | ||
|
||
-# Print the latest run directory and keep note of it. | ||
-# For example: /home/ray_results/train_2021-09-01_12-00-00/run_001 | ||
-print("Run directory:", trainer.latest_run_dir) | ||
+# For MLFLow logs: | ||
|
||
+# MLFlow logs will by default be saved in an `mlflow` directory | ||
+# in the current working directory. | ||
|
||
-trainer.shutdown() | ||
+# $ cd mlflow | ||
+# # View the MLflow UI. | ||
+# $ mlflow ui | ||
|
||
+# You can change the directory by setting the `tracking_uri` argument | ||
+# in `MLflowLoggerCallback`. | ||
|
||
+# For TensorBoard logs: | ||
|
||
+# Print the latest run directory and keep note of it. | ||
+# For example: /home/ubuntu/ray_results/TorchTrainer_2022-06-13_20-31-06 | ||
+print("Run directory:", result.log_dir.parent)  # TensorBoard is saved in parent dir | ||
|
||
 # How to visualize the logs | ||
|
||
 # Navigate to the run directory of the trainer. | ||
-# For example `cd /home/ray_results/train_2021-09-01_12-00-00/run_001` | ||
+# For example `cd /home/ubuntu/ray_results/TorchTrainer_2022-06-13_20-31-06` | ||
 # $ cd <TRAINER_RUN_DIR> | ||
 # | ||
-# # View the MLflow UI. | ||
-# $ mlflow ui | ||
-# | ||
 # # View the tensorboard UI. | ||
 # $ tensorboard --logdir . |
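The new example prints `result.log_dir.parent` rather than `trainer.latest_run_dir`, with the comment that TensorBoard output is saved in the parent directory. Below is a minimal, standard-library-only sketch of that path step. It assumes `log_dir` is path-like and points at a per-trial directory inside the run directory. The helper `tensorboard_logdir` and the trial sub-directory name `trial_0` are hypothetical and do not come from the PR. `PurePosixPath` is used only so the sketch behaves the same on every platform.

```python
from pathlib import PurePosixPath


def tensorboard_logdir(trial_log_dir):
    """Return the directory one level above a trial's log directory.

    Hypothetical helper: mirrors the `result.log_dir.parent` expression
    in the diff, assuming `log_dir` is a path-like object.
    """
    return PurePosixPath(trial_log_dir).parent


# Assumed layout: <run directory>/<trial directory>. The run directory
# is the example path from the diff; "trial_0" is a made-up name.
trial_dir = "/home/ubuntu/ray_results/TorchTrainer_2022-06-13_20-31-06/trial_0"
run_dir = tensorboard_logdir(trial_dir)

print("Run directory:", run_dir)
print(f"$ tensorboard --logdir {run_dir}")
```

Under these assumptions, the printed run directory is the one the diff's comments tell the reader to `cd` into before running `tensorboard --logdir .`.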
nice