-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tune] Fix storage client creation when sync function tpl is not provided (#26714) #26717
[Tune] Fix storage client creation when sync function tpl is not provided (#26714) #26717
Conversation
…ided (ray-project#26714) Signed-off-by: Rohit Annigeri <[email protected]>
d76ac1a
to
d0015a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rohit-annigeri - actually that part is correct. Your change will use the legacy sync client which is deprecated and even removed on master.
The correct fix here is to not try to sync the directory if it already exists locally. This regression actually exists in master as well. Here is a draft PR fixing it: #26725
I'll add tests to that PR to make sure this doesn't come up again and that everything works.
Could you apply that patch to 1.13.0 and see if it fixes your problem?
Thanks!
|
I mean if you apply the changes from #26725 to The current fix in this PR disables syncing via pyarrow fs completely (as The correct location to add the check would be at https://github.com/ray-project/ray/blob/releases/1.13.0/python/ray/tune/trainable.py#L536 to not trigger syncing if the directory already exists. I can also update this PR if you prefer.
Thanks! |
Signed-off-by: Rohit Annigeri <[email protected]>
I think I follow. I've updated the PR to reflect the change. Let me know if it makes sense. Not really sure if this is the best way to handle the exception |
Signed-off-by: Rohit Annigeri <[email protected]>
Signed-off-by: Rohit Annigeri <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. We will merge this once we commence work on 1.13.1. I may update this PR with test cases when updating #26725
Signed-off-by: Kai Fricke <[email protected]>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
hey @krfricke is this PR still good to go ? |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
Generally yes, but there is currently no timeline for 1.13.1 |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
Why are these changes needed?
It solves issue of restore from object failing to work during ray tune with sync config and PBT scheduler.
Related issue number
Closes #26714
Checks
scripts/format.sh
to lint the changes in this PR.