-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tune] Make ResultGrid return cloud checkpoints #31437
[Tune] Make ResultGrid return cloud checkpoints #31437
Conversation
Signed-off-by: Antoni Baum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I have a few suggested changes:
Signed-off-by: Antoni Baum <[email protected]>
@justinvyu different approach, PTAL! |
Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Antoni Baum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, some nits:
@@ -185,6 +185,15 @@ def get_checkpoints_paths(logdir): | |||
) | |||
return chkpt_df | |||
|
|||
@staticmethod | |||
def get_remote_storage_path( | |||
local_path: str, logdir: str, remote_checkpoint_dir: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename remote_checkpoint_dir
to remote_logdir
? Seems like two different concepts with the current naming but one is just the cloud version of the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to use the same names as in Trial
.
Co-authored-by: Justin Yu <[email protected]> Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Antoni Baum <[email protected]>
Signed-off-by: Antoni Baum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
`Checkpoint`s returned by `ResultGrid` will never point to a cloud URI even if cloud syncing is enabled. This PR fixes this by saving information necessary to turn a local path into a remote path in `_TrackedCheckpoint`, and applying it during conversion to an AIR checkpoint. An alternate approach would be to save `remote_upload_dir` and `logdir` in `_TrackedCheckpoint` instead of a function, or for `Trainable.save` to return a tuple of `(local_path, remote_path)` if saved to cloud. Signed-off-by: Antoni Baum <[email protected]> Co-authored-by: Justin Yu <[email protected]> Signed-off-by: Andrea Pisoni <[email protected]>
Signed-off-by: Antoni Baum [email protected]
Why are these changes needed?
Checkpoint
s returned byResultGrid
will never point to a cloud URI even if cloud syncing is enabled. This PR fixes this by saving information necessary to turn a local path into a remote path in_TrackedCheckpoint
, and applying it during conversion to an AIR checkpoint.An alternate approach would be to save
remote_upload_dir
andlogdir
in_TrackedCheckpoint
instead of a function, or forTrainable.save
to return a tuple of(local_path, remote_path)
if saved to cloud.Related issue number
Closes #31001
Closes #28492
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.