[Kuberay] Ray Autoscaler integration with Kuberay (MVP) #21086
Conversation
@@ -0,0 +1,26 @@
import os
I think this would be helpful when a user runs this in a non-container environment. If the user runs this in Kubernetes, I feel we can leverage pod restarts to achieve the same goal.
Yes, I will try that before merging the PR :)
Repeating the sentiment that we should avoid crash-looping when possible.
Btw, Dmitri, do you remember where the crash-looping was coming from? We might be able to avoid it by doing retries at a lower level (e.g., when connecting to Redis); that might lead to a more robust solution overall.
Right, the retries are for connecting to the head node. Waiting for the Ray cluster to be ready in a more targeted / principled way sounds better.
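The retry approach discussed here could be sketched as a small bounded-retry helper. This is only an illustration of the idea, not code from the PR; `wait_for_head` and its parameters are hypothetical names.

```python
import time


def wait_for_head(connect, max_attempts=12, delay_s=5.0):
    """Retry `connect` until the Ray head node is reachable.

    `connect` is any zero-argument callable that raises on failure
    (e.g. a wrapper around the head-node connection); hypothetical helper,
    not part of the PR.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception:
            # Give up after the last attempt; otherwise wait and retry.
            if attempt == max_attempts:
                raise
            time.sleep(delay_s)
```

A more targeted readiness check (e.g. waiting on a specific service endpoint) would replace the blanket `except Exception` in a production version.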
Looks super, assorted minor comments.
lgtm!!
- newName: kuberay/operator
- newTag: nightly
+ newName: rayproject/kuberay-operator
+ newTag: latest
How do we plan to maintain the image in the future?
Ideally, we would have versioned releases of the image in kuberay/operator, either with version numbers or, if we want to keep the overhead low, just by git commit hash. Then we can pin them here and update them whenever there is a new release :)
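One way to pin a versioned release, along the lines suggested above, is kustomize's `images` override; the tag below is a hypothetical placeholder, not an actual release.

```yaml
# kustomization.yaml fragment (illustrative): pin the operator image
# to a specific release tag instead of a floating "latest"/"nightly" tag.
images:
  - name: kuberay/operator
    newName: rayproject/kuberay-operator
    newTag: v0.1.0   # hypothetical version; substitute the actual release tag
```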
Thanks @pcmoritz for the great work! The revised version looks good to me now!
Actually, let me submit a follow-up PR for the minor comments.
@@ -13,6 +13,7 @@ Ray with Cluster Managers
:maxdepth: 2

kubernetes.rst
kuberay.md
@pcmoritz Actually any reason we are using .md for this one vs. .rst?
maybe because .rst is awful :)
For consistency, we should probably switch back to .rst as we solidify the integration and its documentation.
[Kuberay](https://github.com/ray-project/kuberay) is a set of tools for running Ray on Kubernetes.
It has been used by some larger corporations to deploy Ray on their infrastructure.
Going forward, we would like to make this way of deployment accessible and seamless for
all Ray users and standardize Ray deployment on Kubernetes around Kuberay's operator.
"Kuberay" -> "KubeRay"
It has been used by some larger corporations to deploy Ray on their infrastructure.
Going forward, we would like to make this way of deployment accessible and seamless for
all Ray users and standardize Ray deployment on Kubernetes around Kuberay's operator.
Presently you should consider this integration a minimal viable product that is not polished
I think we should label this integration as "Experimental" or "Alpha" to be consistent about setting expectations for other Ray features
The header does say "Experimental." Wouldn't hurt to repeat that word in this sentence, though.
all Ray users and standardize Ray deployment on Kubernetes around Kuberay's operator.
Presently you should consider this integration a minimal viable product that is not polished
enough for general use and prefer the [Kubernetes integration](kubernetes.rst) for running
Ray on Kubernetes. If you are brave enough to try the Kuberay integration out, this documentation
"Kuberay" -> "KubeRay"
enough for general use and prefer the [Kubernetes integration](kubernetes.rst) for running
Ray on Kubernetes. If you are brave enough to try the Kuberay integration out, this documentation
is for you! We would love your feedback as a [Github issue](https://github.com/ray-project/ray/issues)
including `[Kuberay]` in the title.
"Kuberay" -> "KubeRay"
including `[Kuberay]` in the title.
Here we describe how you can deploy a Ray cluster on Kuberay. The following instructions are for
"Kuberay" -> "KubeRay"
Why are these changes needed?
This is a minimum viable product for Ray Autoscaler integration with Kuberay. It is not ready for prime time/general use, but should be enough for interested parties to get started (see the documentation in kuberay.md).
Related issue number
Checks
- I've run `scripts/format.sh` to lint the changes in this PR.