Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serve] Add Google Cloud Storage as a backend #20104

Merged
merged 12 commits into from
Nov 11, 2021

Conversation

tkaymak
Copy link
Contributor

@tkaymak tkaymak commented Nov 5, 2021

Why are these changes needed?

Ray serve currently provides the ability to restore from a checkpoint in S3, this PR adds the possibility to use Google Cloud Storage (GCS) as a backend

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests

Copy link
Contributor

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simon-mo simon-mo added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Nov 6, 2021
@tkaymak
Copy link
Contributor Author

tkaymak commented Nov 8, 2021

Thank you @simon-mo - I've adjusted the code and the doc :)

@jiaodong
Copy link
Member

jiaodong commented Nov 8, 2021

Thanks for adding this support ! Can you also update your test plan in this PR with an example of running serve tests using GCS backend for checkpoint ? You can find one such example in https://sourcegraph.com/github.com/ray-project/ray/-/blob/release/serve_tests/workloads/serve_cluster_fault_tolerance.py?L54:59 that we kill the controller after initial deployment is done, then resume from checkpoint in external source. This test can be done using local ray cluster.

Copy link
Contributor

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending @jiaodong's comments. This is awesome!

@tkaymak
Copy link
Contributor Author

tkaymak commented Nov 8, 2021

@jiaodong I could, but only when I know a bucket name that you own and that the executor for that test has write access to that GCS bucket (so for example, GOOGLE_APPLICATION_CREDENTIALS is set for the test executor) - can you give me a bucket name?

@jiaodong
Copy link
Member

jiaodong commented Nov 9, 2021

@tkaymak that test i sent you is running on internal platform in particular so don't worry too much about writing to it :) What i was suggesting is a sanity e2e check using your own bucket on your laptop, then we can make internal changes needed to cover this case on CI afterwards.

@tkaymak
Copy link
Contributor Author

tkaymak commented Nov 10, 2021

@jiaodong - what do you think about this?

Copy link
Member

@jiaodong jiaodong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, thanks for adding this ! I will follow up with changing bucket name afterwards.

@simon-mo simon-mo merged commit 893f575 into ray-project:master Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants