
[train][docs] update docstrings/quickstarts to work when use_gpu=True #31692

Merged 4 commits into ray-project:master on Jan 26, 2023

Conversation

matthewdeng (Contributor)

Signed-off-by: Matthew Deng [email protected]

Fixes Trainer docstrings and quickstarts to work when use_gpu=True.

Why are these changes needed?

  1. Updated iter_torch_batches to move batches to the proper device (see the sketch after this list).
    1. This is needed for both the Torch and Horovod quickstarts, which use iter_torch_batches.
    2. This is not needed for the TensorFlow and HuggingFace quickstarts, which handle device transfer natively.
  2. Extracted use_gpu to the top of each code snippet so readers can toggle it in one place.
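
For concreteness, here is a minimal sketch of the resulting pattern, using the Torch quickstart as an example and assuming the AIR-era APIs these snippets are built on (session.get_dataset_shard, iter_torch_batches with a device argument, and ray.train.torch.get_device). The toy dataset and batch size are made up for illustration; the actual snippets are in the diff.

import ray.data
from ray.air import session
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer, get_device

# If using GPUs, set this to True.
use_gpu = False

def train_loop_per_worker():
    # Each worker iterates over its shard of the dataset. Passing device=
    # moves every batch onto this worker's device (CPU, or GPU when use_gpu=True).
    dataset_shard = session.get_dataset_shard("train")
    for batch in dataset_shard.iter_torch_batches(batch_size=32, device=get_device()):
        pass  # training step on `batch` goes here

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=use_gpu),
    datasets={"train": ray.data.from_items([{"x": float(i), "y": 2.0 * i} for i in range(64)])},
)
result = trainer.fit()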

Related issue number

Closes #31684

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@matthewdeng changed the title from "Train quickstart gpu" to "[train][docs] update docstrings/quickstarts to work when use_gpu=True" on Jan 16, 2023
@Yard1 (Member) left a comment

Orthogonal to this, but I think that if we detect we are in a train session, iter_torch_batches should automatically use the default device unless specified otherwise.
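
Purely to illustrate that idea (this is not Ray's implementation; the two helper stubs below are made-up placeholders for the real session plumbing), the resolution order could look something like:

from typing import Optional, Union
import torch

def _in_train_session() -> bool:
    # Placeholder: "are we running inside a Ray Train worker?"
    return False

def _session_default_device() -> torch.device:
    # Placeholder: the device Ray Train assigned to this worker.
    return torch.device("cuda", 0)

def resolve_device(device: Optional[Union[str, torch.device]] = None) -> torch.device:
    # Decide where iter_torch_batches should put batches.
    if device is not None:
        return torch.device(device)       # an explicit caller choice always wins
    if _in_train_session():
        return _session_default_device()  # otherwise default to the worker's device
    return torch.device("cpu")            # outside a train session, stay on CPU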

@@ -12,6 +12,10 @@
from ray.train.huggingface import HuggingFaceTrainer
from ray.air.config import ScalingConfig


# If using GPUs, set this to True.
use_gpu = False
Contributor

A dumb yet effective trick for cases where you want to show users that they can use GPUs, but want CPUs on CI:

use_gpu = True  # include in docs

use_gpu = False # exclude

<code using use_gpu>
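
As a sketch of how that trick could be wired up in a doc-code file (the file name and marker comments below are hypothetical, not the markers the Ray docs actually use): the docs render only the marked regions, while CI executes the whole file, so the second assignment wins at test time.

# torch_quickstart.py (hypothetical doc-code file run by CI)

# __quickstart_begin__
# If using GPUs, set this to True.
use_gpu = True
# __quickstart_end__
use_gpu = False  # executed in CI, but outside the region rendered in the docs

# __quickstart_continued_begin__
# ... rest of the example (which reads `use_gpu`), inside its own rendered region ...
# __quickstart_continued_end__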

@matthewdeng (Contributor, Author)

Orthogonal to this, but I think that if we detect we are in a train session, iter_torch_batches should automatically use the default device unless specified otherwise.

@Yard1 yeah I was thinking the same thing, cc @stephanie-wang this could be a nice extension to the DatasetIterator!

@Yard1 (Member)

Yard1 commented Jan 18, 2023

I've got a PR here, comments appreciated - #31745

@matthewdeng (Contributor, Author)

matthewdeng commented Jan 26, 2023

@amogkam let me know how you want to coordinate this with #31753. If you're able to get your changes in by 2.3 I can update this PR to not set device.

@amogkam (Contributor)

amogkam commented Jan 26, 2023

Ah, sorry - let's get this in ASAP. Let me know when it's ready to merge.

I can update this documentation in that PR.

@matthewdeng (Contributor, Author)

@amogkam I think it's good to merge as is!

@amogkam amogkam merged commit da79ae9 into ray-project:master Jan 26, 2023

Successfully merging this pull request may close these issues.

[Train] Ray Train PyTorch documentation example does not work out of the box with GPUs