[Doc][AIR] Improve visibility of Trainer restore and stateful callback restoration #34350

Merged: 18 commits, Apr 20, 2023

Conversation

Contributor

@justinvyu commented Apr 13, 2023

Why are these changes needed?

This PR moves the FAQ section on Train experiment restoration into the DL and GBDT user guides, and makes the code snippets in that section tested. It also improves the docstrings of the methods to implement for saving and restoring stateful Callbacks.
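
For readers unfamiliar with stateful callback restoration, here is a minimal sketch of the save/restore pattern the docstrings describe. It is plain Python and does not import Ray. The class `CountingCallback`, the helpers `save_callback` and `restore_callback`, and the method names `get_state` and `set_state` are assumptions made for illustration, not a copy of the documented API.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class CountingCallback:
    """Hypothetical stateful callback that counts the results it has seen."""

    def __init__(self) -> None:
        self.num_results = 0

    def on_result(self, result: Dict) -> None:
        # Internal state that would be lost if the experiment were interrupted.
        self.num_results += 1

    def get_state(self) -> Optional[Dict]:
        # Return everything needed to rebuild this callback after a restart.
        return {"num_results": self.num_results}

    def set_state(self, state: Dict) -> None:
        # Re-apply previously saved state to a freshly constructed callback.
        self.num_results = state["num_results"]


def save_callback(callback: CountingCallback, path: Path) -> None:
    # Hypothetical helper: persist the callback state as JSON.
    path.write_text(json.dumps(callback.get_state()))


def restore_callback(callback: CountingCallback, path: Path) -> None:
    # Hypothetical helper: load the saved state back into the callback.
    callback.set_state(json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "callback_state.json"

    original = CountingCallback()
    for step in range(3):
        original.on_result({"step": step})
    save_callback(original, state_file)

    # Simulate a restart: a new callback instance picks up the saved count.
    restored = CountingCallback()
    restore_callback(restored, state_file)
    restored_count = restored.num_results
```

The point of the pattern is that a freshly constructed callback carries no memory of the interrupted run, so any counters or buffers it relies on have to be returned explicitly by the save method and re-applied by the restore method.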

Links for reviewers

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor

@matthewdeng left a comment

Very nice!

doc/source/train/config_guide.rst (resolved)
doc/source/train/doc_code/key_concepts.py (outdated, resolved)
doc/source/train/dl_guide.rst (outdated, resolved)
doc/source/train/doc_code/dl_guide.py (outdated, resolved)
Comment on lines +55 to +56
Once checkpointing is enabled, you can follow :ref:`this guide <train-fault-tolerance>`
to enable fault tolerance.
Contributor

Probably don't want to link this since this is in the Deep Learning guide.

Contributor Author

Yeah, I was trying to share some content between the two. I linked it to a section that doesn't really depend on DL trainers. I think it's okay for now, and we can rethink the docs/user guide structure as part of the docs side of the layering project. wdyt?

@justinvyu added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) Apr 19, 2023
@richardliaw merged commit 3d94498 into ray-project:master Apr 20, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request May 16, 2023
3 participants