[Serve][Doc] Add Failure Recovery Doc #19166
@@ -207,6 +207,46 @@ Please refer to the Kubernetes documentation for more information.
.. _`NodePort`: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types

Failure Recovery
================
Ray Serve is resilient to any component failures within the Ray cluster out of the box.
Can you please specify how process and worker node failures are handled?
doc/source/serve/deployment.rst
However, if the Ray cluster itself goes down, you need to recover the state by creating a new Ray cluster and redeploying all Serve deployments into it.
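The recovery procedure above can be illustrated with a minimal, Ray-free sketch: record each deployment's configuration somewhere outside the cluster, then replay those records against a fresh cluster. ``DeploymentSpec``, ``Cluster``, ``checkpoint`` and ``recover`` are hypothetical names invented for this illustration, not Ray Serve APIs, and a real Serve deployment carries far more state than the three fields shown.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical stand-in for a deployment's configuration (not a Ray Serve class).
@dataclass
class DeploymentSpec:
    name: str
    num_replicas: int
    route_prefix: str


class Cluster:
    """Hypothetical stand-in for a Ray cluster running Serve deployments."""

    def __init__(self):
        self.deployments = {}

    def deploy(self, spec: DeploymentSpec) -> None:
        # Deploying under an existing name replaces the previous spec.
        self.deployments[spec.name] = spec


def checkpoint(cluster: Cluster, path: str) -> None:
    # Persist every deployment's config outside the cluster so it survives a crash.
    with open(path, "w") as f:
        json.dump([asdict(s) for s in cluster.deployments.values()], f)


def recover(path: str) -> Cluster:
    # Start a fresh cluster and redeploy everything recorded in the checkpoint.
    new_cluster = Cluster()
    with open(path) as f:
        for record in json.load(f):
            new_cluster.deploy(DeploymentSpec(**record))
    return new_cluster
```

Because the checkpoint lives outside the cluster (here, a local JSON file), losing the cluster does not lose the record of what was deployed; durable storage such as disk or S3 plays that role in practice.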
Specify that this means the head node specifically.
And that Ray is not currently HA, but it's on the long-term roadmap, so this is a "temporary limitation".
doc/source/serve/deployment.rst
While we have native support for on-disk and AWS S3 storage, there is no reason we cannot support more.
You can plug in your own implementation by using the ``custom://`` path and subclassing the `KVStoreBase`_ class.
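As a sketch of what a custom backend might look like, the class below keeps checkpoint data in a SQLite file behind a put/get/delete interface. ``KVStoreBaseSketch`` is a hypothetical local stand-in for ``KVStoreBase``, assuming it exposes roughly these three operations; check the real abstract methods in the Ray source before subclassing it. How the ``custom://`` path selects and configures the class is not shown here.

```python
import abc
import sqlite3
from typing import Optional


class KVStoreBaseSketch(abc.ABC):
    """Hypothetical stand-in for Serve's KVStoreBase (assumed interface)."""

    @abc.abstractmethod
    def put(self, key: str, val: bytes) -> None: ...

    @abc.abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abc.abstractmethod
    def delete(self, key: str) -> None: ...


class SQLiteKVStore(KVStoreBaseSketch):
    """Example backend that stores checkpoint bytes in a SQLite database file."""

    def __init__(self, db_path: str, namespace: str = "serve"):
        self._conn = sqlite3.connect(db_path)
        self._namespace = namespace
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, val BLOB)"
        )
        self._conn.commit()

    def _full_key(self, key: str) -> str:
        # Prefix keys so several Serve instances can share one database.
        return f"{self._namespace}/{key}"

    def put(self, key: str, val: bytes) -> None:
        # Upsert, then commit so the write is durable if the process dies.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, val) VALUES (?, ?)",
            (self._full_key(key), val),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT val FROM kv WHERE key = ?", (self._full_key(key),)
        ).fetchone()
        return None if row is None else row[0]

    def delete(self, key: str) -> None:
        self._conn.execute("DELETE FROM kv WHERE key = ?", (self._full_key(key),))
        self._conn.commit()
```

SQLite is chosen only because it ships with Python and makes the sketch runnable; a networked store would follow the same shape.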
Add a call to action asking people to open a GitHub issue and/or contribute a backend for this.
Why are these changes needed?
Port over the gist and highlight its experimental status.