[Feature] Add liveness and readiness probes for Ray worker pods. #308
Comments
@wuisawesome @iycheng @mwtian come to think of it, is there a good way to health-check a Ray worker node?
cc @akanso @Jeffwan @wilsonwang371
Oh whoops, I missed the sync, but we should implement it as a flag. There's a question of whether we should attempt to ping the raylet directly, or ask the GCS / get the status from the NodeTable. And who would be responsible for probing? The k8s deployment controller, or the KubeRay operator?
The kubelet of the node on which the Ray pod is scheduled.
Maybe both? Especially if there's a possibility that these could fail independently.
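To make the GCS route discussed above concrete, here is a hedged sketch: decide a worker's health from GCS node-table entries in the shape that `ray.nodes()` returns them. The helper name and the canned data are hypothetical; this only illustrates the "ask GCS" option, not an actual KubeRay mechanism.

```python
# Sketch: judge worker health from GCS NodeTable-style entries, in the
# dict format that ray.nodes() returns. node_is_healthy is a hypothetical
# helper, not KubeRay or Ray code.

def node_is_healthy(nodes, node_ip):
    """Return True if the node at node_ip is registered and alive in GCS."""
    for node in nodes:
        # ray.nodes() entries carry "NodeManagerAddress" and "Alive" fields.
        if node.get("NodeManagerAddress") == node_ip:
            return bool(node.get("Alive"))
    return False  # not registered with GCS at all -> treat as unhealthy

# Example with canned GCS output (no running Ray cluster needed):
nodes = [
    {"NodeManagerAddress": "10.0.0.5", "Alive": True},
    {"NodeManagerAddress": "10.0.0.6", "Alive": False},
]
print(node_is_healthy(nodes, "10.0.0.5"))  # True
print(node_is_healthy(nodes, "10.0.0.7"))  # False
```

The raylet-ping alternative would skip GCS entirely and probe the local process, which trades cluster-level consistency for a faster, more local signal.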
This is being covered in the context of Ray HA work. Closing to deduplicate.
Search before asking
Description
The KubeRay operator should inject liveness and readiness probes for Ray pods.
Ray already provides a `ray health-check` command that should work for this purpose.
Readiness information should be carried into the RayCluster CR's status field.
The Ray autoscaler's resource-heartbeat health check should be turned off once this is implemented (a flag will need to be added to the Ray autoscaler code to disable that check).
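The injection described above could look roughly like the following sketch, which builds exec-based liveness and readiness probes around `ray health-check` and patches them into a container spec. The timing values and the helper names are illustrative assumptions, not KubeRay defaults.

```python
# Sketch: build kubelet exec probes around `ray health-check` for injection
# into a Ray pod's container spec. Timings are illustrative, not KubeRay
# defaults; make_ray_probes/inject_probes are hypothetical helpers.

def make_ray_probes():
    exec_probe = {
        "exec": {"command": ["ray", "health-check"]},
        "initialDelaySeconds": 30,  # give the raylet time to start
        "periodSeconds": 5,
        "failureThreshold": 3,
    }
    # The same command serves both probes here; a real implementation might
    # differentiate readiness (able to take work) from liveness (process up).
    return {"livenessProbe": dict(exec_probe), "readinessProbe": dict(exec_probe)}

def inject_probes(container_spec):
    """Return a copy of a container spec with both probes added."""
    return {**container_spec, **make_ray_probes()}

worker = {"name": "ray-worker", "image": "rayproject/ray"}
patched = inject_probes(worker)
print(patched["livenessProbe"]["exec"]["command"])  # ['ray', 'health-check']
```

With probes injected, the kubelet restarts a pod that fails liveness and withholds traffic from one that fails readiness, and the operator can surface the readiness results into the RayCluster status field.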
Use case
Better management of Ray worker pods!
Related issues
Exposing status info in the CRD
#223
Are you willing to submit a PR?