[serve] Add Google Cloud Storage as a backend #20104
This is great! Can you also add "gcs" URI support to https://github.com/ray-project/ray/blob/master/python/ray/serve/storage/checkpoint_path.py#L20
and to the doc at
https://github.com/ray-project/ray/blob/master/doc/source/serve/deployment.rst#failure-recovery
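A minimal sketch of the URI dispatch requested above, assuming checkpoint paths of the form `gs://<bucket>/<prefix>` (the comment only says "gcs", so the exact scheme string is an assumption) and a hypothetical helper name `parse_checkpoint_path`; the real code in `checkpoint_path.py` may be organized differently. It splits the URI with `urllib.parse.urlparse` and strips the leading `/` so the prefix can be used directly as an object-name prefix.

```python
from typing import Tuple
from urllib.parse import urlparse

# Schemes accepted by this sketch. "gs" is the assumed Google Cloud Storage
# scheme; file:// paths and the default internal store are left out here.
SUPPORTED_SCHEMES = {"s3", "gs"}


def parse_checkpoint_path(checkpoint_path: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split a checkpoint URI into (scheme, bucket, prefix)."""
    parsed = urlparse(checkpoint_path)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(
            f"Unsupported checkpoint path {checkpoint_path!r}; expected one of "
            f"{sorted(s + '://...' for s in SUPPORTED_SCHEMES)}"
        )
    # The bucket is the network location. Object names must not start with
    # "/", e.g. gs://bucket/folder/ckpt -> prefix "folder/ckpt".
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
```

For example, `parse_checkpoint_path("gs://my-bucket/serve/checkpoint")` yields `("gs", "my-bucket", "serve/checkpoint")`, while an unknown scheme raises `ValueError` instead of silently falling back to another backend.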
Thank you @simon-mo, I've adjusted the code and the doc :)
Thanks for adding this support! Can you also update the test plan in this PR with an example of running Serve tests with the GCS backend for checkpoints? You can find one such example in https://sourcegraph.com/github.com/ray-project/ray/-/blob/release/serve_tests/workloads/serve_cluster_fault_tolerance.py?L54:59, where we kill the controller after the initial deployment is done and then resume from the checkpoint in the external source. This test can be run on a local Ray cluster.
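The fault-tolerance scenario described above (deploy, kill the controller, then recover from the external checkpoint) can be modelled in a few lines. This is a toy, self-contained sketch, not Ray's API: `InMemoryKVStore` stands in for the S3 or GCS bucket and `ToyController` for the Serve controller, both hypothetical. What it illustrates is that everything needed for recovery must live in the external store rather than in the controller process.

```python
import json
from typing import Dict, Optional


class InMemoryKVStore:
    """Stand-in for an external checkpoint store (e.g. a GCS bucket)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class ToyController:
    """Hypothetical controller that checkpoints its deployments after each change."""

    CHECKPOINT_KEY = "serve-controller-checkpoint"

    def __init__(self, store: InMemoryKVStore) -> None:
        self._store = store
        # On startup, recover from the external checkpoint if one exists.
        raw = store.get(self.CHECKPOINT_KEY)
        self.deployments: Dict[str, int] = json.loads(raw) if raw else {}

    def deploy(self, name: str, num_replicas: int) -> None:
        self.deployments[name] = num_replicas
        # Write the full state out after every change.
        self._store.put(self.CHECKPOINT_KEY, json.dumps(self.deployments).encode())


store = InMemoryKVStore()
controller = ToyController(store)
controller.deploy("echo", num_replicas=2)

# "Kill" the controller: drop the object, keeping only the external store.
del controller

recovered = ToyController(store)
print(recovered.deployments)  # {'echo': 2}
```

In the real test the store would be the GCS bucket and the kill would target the controller actor, but the pass criterion is the same: the restarted controller sees the deployments that existed before the kill.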
LGTM pending @jiaodong's comments. This is awesome!
@jiaodong I could, but only if I know the name of a GCS bucket that you own and that the executor for that test has write access to (so for example,
@tkaymak the test I sent you runs on our internal platform specifically, so don't worry too much about writing to it :) What I was suggesting is a sanity e2e check using your own bucket on your laptop; we can make the internal changes needed to cover this case in CI afterwards.
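For the local sanity check, one quick way to confirm that your own bucket is writable before running the end-to-end test is to write and then delete a small probe object. A minimal sketch using the `google-cloud-storage` client calls `Client.bucket`, `Bucket.blob`, `Blob.upload_from_string` and `Blob.delete`; the helper name `bucket_is_writable`, the probe object name, and passing the client in explicitly are assumptions made for illustration.

```python
def bucket_is_writable(client, bucket_name: str,
                       probe_name: str = "serve-checkpoint-probe") -> bool:
    """Hypothetical helper: return True if a probe object can be written and removed.

    `client` is expected to behave like google.cloud.storage.Client, e.g.
    `from google.cloud import storage; client = storage.Client()`.
    """
    blob = client.bucket(bucket_name).blob(probe_name)
    try:
        blob.upload_from_string(b"ok")
        blob.delete()
    except Exception:
        # The client reports failures such as permission denied or a missing
        # bucket as exceptions; any of them means the bucket is not usable.
        return False
    return True
```

Catching a broad `Exception` keeps the probe independent of the client library's exception hierarchy; a stricter version could catch only the library's API error types.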
@jiaodong, what do you think about this?
Looks great, thanks for adding this! I will follow up by changing the bucket name afterwards.
Why are these changes needed?
Ray Serve currently supports restoring from a checkpoint stored in S3. This PR adds the option to use Google Cloud Storage (GCS) as a checkpoint backend as well.
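A minimal sketch of what a GCS-backed checkpoint store could look like, assuming a simple `put`/`get`/`delete` key-value interface like the one used for the S3 backend (an assumption; the PR's actual class name and signatures may differ) and the `google-cloud-storage` client library. The class name `GcsKVStore` and the `<prefix>/<namespace>/<key>` object layout are hypothetical. The client is injectable so the class can be exercised without credentials.

```python
from typing import Optional


class GcsKVStore:
    """Hypothetical key-value store that keeps each key as one object in a GCS bucket."""

    def __init__(self, namespace: str, bucket: str, prefix: str = "", client=None):
        if client is None:
            # Requires the google-cloud-storage package and ambient credentials.
            from google.cloud import storage
            client = storage.Client()
        self._bucket = client.bucket(bucket)
        self._namespace = namespace
        self._prefix = prefix

    def _blob_name(self, key: str) -> str:
        # Object layout: <prefix>/<namespace>/<key>, skipping empty parts.
        return "/".join(p for p in (self._prefix, self._namespace, key) if p)

    def put(self, key: str, val: bytes) -> None:
        self._bucket.blob(self._blob_name(key)).upload_from_string(val)

    def get(self, key: str) -> Optional[bytes]:
        blob = self._bucket.blob(self._blob_name(key))
        if not blob.exists():
            return None
        return blob.download_as_bytes()

    def delete(self, key: str) -> None:
        blob = self._bucket.blob(self._blob_name(key))
        if blob.exists():
            blob.delete()
```

With application default credentials configured, `GcsKVStore(namespace="serve", bucket="my-bucket", prefix="checkpoints")` would store key `k` as the object `checkpoints/serve/k`. Returning `None` for a missing key lets the caller treat "no checkpoint yet" as a fresh start rather than an error.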
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.