-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tune] [Doc] Tune checkpointing and Tuner restore docfix #29411
[Tune] [Doc] Tune checkpointing and Tuner restore docfix #29411
Conversation
…inks Signed-off-by: Justin Yu <[email protected]>
…r_restore_docfix
Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of nits, but this looks really good!
doc/source/tune/api_docs/checkpointing/function-checkpointing.rst
Outdated
Show resolved
Hide resolved
Signed-off-by: Justin Yu <[email protected]>
Really love this edition. The modularized sections look good to me. |
Signed-off-by: Justin Yu <[email protected]>
…r_restore_docfix Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
|
||
Tune also may copy or move checkpoints during the course of tuning. For this purpose, | ||
it is important not to depend on absolute paths in the implementation of ``save``. | ||
.. include:: checkpointing/function-checkpointing.rst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why break this out into a separate file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reuse this section in the Tune "working with checkpoints" user guide, since that's where I would intuitively look for an example of how to actually checkpoint. I've commented where that is below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead link it, instead of re-rendering it in two places?
Otherwise for example, the search results are going to get cluttered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So concretely, don't break it out into a separate file, put it in the "working with checkpoints" part, and use a relative reference here linking to that "working with checkpoints" section
See below for examples of saving and loading trial-level checkpoints. | ||
|
||
|
||
How do I save and load trial checkpoints? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the examples again within the Tune user guide.
Signed-off-by: Justin Yu <[email protected]>
…r_restore_docfix
Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
…r_restore_docfix
Signed-off-by: Justin Yu <[email protected]>
…#29411) Signed-off-by: Weichen Xu <[email protected]>
Why are these changes needed?
Tuner
will try to resume from the last checkpoint if an experiment of the samename
is found at<local_dir>/<exp_name>
. This was the case in the oldtune.run
API example since we specifiedresume="AUTO"
, but the newTuner
API always starts a new experiment. See attached issue for more details.CheckpointConfig
andSyncConfig
.Related issue number
Closes #28722
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.