Generate RayCluster Hash on KubeRay Version Change #2320
Conversation
Signed-off-by: Ryan O'Leary <[email protected]>
cc: @kevin85421, @andrewsykim
@kevin85421 is the desired behavior here to immediately start a zero-downtime upgrade (…)?
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
No, if the RayService CR's …
After these updates, we can use the new …
```go
// and new KubeRay version, but do not restart the RayCluster.
activeKubeRayVersion := activeRayCluster.ObjectMeta.Annotations[utils.KubeRayVersion]
if activeKubeRayVersion != utils.KUBERAY_VERSION {
	activeRayCluster.ObjectMeta.Annotations[utils.HashWithoutReplicasAndWorkersToDeleteKey] = goalClusterHash
}
```
I prefer not to update the input argument (`activeRayCluster`) inside this function. How about removing the logic for updating annotations here and directly returning `Update` instead?
Done in 5b5fe07. We now just log and call `Update`, and the cluster hash and KubeRay version are updated in `constructRayClusterForRayService`.
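For readers following along, here is a minimal, self-contained sketch of the shape agreed on in this thread: the decision function only inspects the active cluster's annotations and returns an action, while the fresh hash and KubeRay version are stamped when the cluster object is reconstructed. The annotation keys, version constant, and helper names below are simplified stand-ins for `utils.KubeRayVersion`, `utils.HashWithoutReplicasAndWorkersToDeleteKey`, `utils.KUBERAY_VERSION`, and `constructRayClusterForRayService`, not KubeRay's actual API.

```go
package main

import "fmt"

// Stand-ins for the real annotation keys and version constant in utils.
const (
	kubeRayVersionKey = "ray.io/kuberay-version"
	clusterHashKey    = "ray.io/hash-without-replicas-and-workers-to-delete"
	kubeRayVersion    = "v1.2.0"
)

type clusterAction string

const (
	doNothing  clusterAction = "DoNothing"
	update     clusterAction = "Update"
	rolloutNew clusterAction = "RolloutNew"
)

// decideAction mirrors the discussion above: it never mutates the active
// cluster's annotations; on a hash mismatch explained by a KubeRay version
// change it just logs and asks for an Update instead of a rollout.
func decideAction(active map[string]string, goalHash string) clusterAction {
	if active[clusterHashKey] == goalHash {
		return doNothing
	}
	if active[kubeRayVersionKey] != kubeRayVersion {
		fmt.Println("hash mismatch caused by KubeRay upgrade; updating annotations without a rollout")
		return update
	}
	return rolloutNew
}

// constructCluster plays the role of constructRayClusterForRayService: the
// single place where the new hash and current KubeRay version are written.
func constructCluster(goalHash string) map[string]string {
	return map[string]string{
		clusterHashKey:    goalHash,
		kubeRayVersionKey: kubeRayVersion,
	}
}

func main() {
	active := map[string]string{clusterHashKey: "abc123", kubeRayVersionKey: "v1.1.0"}
	fmt.Println(decideAction(active, "def456")) // Update, not RolloutNew
	fmt.Println(constructCluster("def456"))     // annotations rewritten here
}
```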
Signed-off-by: Ryan O'Leary <[email protected]>
Thanks! I will manually test it.
Signed-off-by: ryanaoleary <[email protected]>
Would you mind adding a unit test in `rayservice_controller_unit_test.go`? Maybe in `TestReconcileRayCluster`, or you can add a new test if needed.
Signed-off-by: Ryan O'Leary <[email protected]>
Added a unit test case in f9c61e7
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: ryanaoleary <[email protected]>
Others look good to me! I will manually test it after the comments are addressed.
```go
	runtimeObjects = append(runtimeObjects, tc.activeCluster.DeepCopy())
}
fakeClient := clientFake.NewClientBuilder().WithScheme(newScheme).WithRuntimeObjects(runtimeObjects...).Build()
r := RayServiceReconciler{
	Client: fakeClient,
	Scheme: newScheme,
}
```
Why do we need this?
Without setting the Scheme, the test returns a nil pointer error when creating a new RayCluster in `constructRayClusterForRayService`, which gets called when updating the RayCluster.
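To make the nil-pointer point concrete, here is a hedged sketch of the test wiring, assuming the field names from the snippet above; the scheme registration calls (`clientgoscheme.AddToScheme`, `rayv1.AddToScheme`) are the standard controller-runtime pattern rather than the PR's exact code.

```go
package ray

import (
	"testing"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	clientFake "sigs.k8s.io/controller-runtime/pkg/client/fake"
)

// newTestReconciler builds a RayServiceReconciler whose fake client and Scheme
// field share one scheme with the Ray types registered. If the Ray types (or
// the Scheme field itself) are missing, object construction inside the
// reconcile path cannot resolve a GroupVersionKind and fails.
func newTestReconciler(t *testing.T, objs ...runtime.Object) RayServiceReconciler {
	t.Helper()
	newScheme := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(newScheme)
	_ = rayv1.AddToScheme(newScheme)
	fakeClient := clientFake.NewClientBuilder().
		WithScheme(newScheme).
		WithRuntimeObjects(objs...).
		Build()
	return RayServiceReconciler{Client: fakeClient, Scheme: newScheme}
}
```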
Signed-off-by: Ryan O'Leary <[email protected]>
LGTM

LGTM
I manually tested this PR by following this issue: #2315 (comment).
- Zero downtime upgrade will not be triggered accidentally after upgrading.
- The annotation `ray.io/kuberay-version` is added to the RayCluster CR correctly.
- Then, I updated `rayVersion` from `2.34.0` to `2.100.0` to trigger zero downtime upgrade. It is triggered as expected.
* Re-generate hash when KubeRay version changes
* Change logic to DoNothing on KubeRay version mismatch
* Add KubeRay version annotation to test
* Move update logic
* Update rayservice_controller.go
* Add unit test
* Add period
* Go vet changes
* Update rayservice_controller_unit_test.go
* Address test comments

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: ryanaoleary <[email protected]>
Why are these changes needed?
This PR adds a new `ray.io/kuberay-version` annotation to RayClusters created with KubeRay. This annotation is used to check for a KubeRay version change, which can erroneously cause a RayService to restart and create a new RayCluster due to a hash mismatch. If the KubeRay version annotation differs when checking `shouldPrepareNewRayCluster` and the RayCluster hashes do not match, the controller updates the `utils.HashWithoutReplicasAndWorkersToDeleteKey` annotation and the KubeRay version annotation, and returns `DoNothing` to avoid restarting the RayService on a version change. This unblocks the version upgrade to KubeRay `v1.2.0`.
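To illustrate why an operator upgrade alone can flip the hash, here is a hedged sketch of the idea behind the hash annotation (not KubeRay's actual implementation; the real types and hashing details differ): the spec is serialized with the autoscaler-volatile fields cleared and the result is hashed, so anything that changes how the controller serializes or defaults the spec, including a KubeRay version bump, changes the hash even when the user's spec did not change.

```go
package main

import (
	"crypto/sha1"
	"encoding/json"
	"fmt"
)

// Trimmed stand-ins for the RayCluster spec; the real hash lives in KubeRay's
// utils package and covers the full spec.
type workerGroup struct {
	Name            string   `json:"name"`
	Replicas        *int32   `json:"replicas,omitempty"`
	WorkersToDelete []string `json:"workersToDelete,omitempty"`
	Image           string   `json:"image"`
}

type clusterSpec struct {
	RayVersion   string        `json:"rayVersion"`
	WorkerGroups []workerGroup `json:"workerGroups"`
}

// hashWithoutReplicasAndWorkersToDelete clears the fields the autoscaler
// mutates at runtime, then hashes the remaining serialized spec, so scaling
// activity alone never changes the hash.
func hashWithoutReplicasAndWorkersToDelete(spec clusterSpec) (string, error) {
	groups := make([]workerGroup, len(spec.WorkerGroups))
	copy(groups, spec.WorkerGroups)
	for i := range groups {
		groups[i].Replicas = nil
		groups[i].WorkersToDelete = nil
	}
	spec.WorkerGroups = groups
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha1.Sum(b)), nil
}

func main() {
	two := int32(2)
	spec := clusterSpec{
		RayVersion:   "2.34.0",
		WorkerGroups: []workerGroup{{Name: "group", Replicas: &two, Image: "rayproject/ray:2.34.0"}},
	}
	h, _ := hashWithoutReplicasAndWorkersToDelete(spec)
	fmt.Println(h) // stable across replica changes; shifts if serialization changes
}
```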
This PR was tested by following the reproduction script in the linked issue and verifying that the log message `Active RayCluster config doesn't match goal config. RayService operator should prepare a new Ray cluster` does not occur.
Related issue number
Closes #2315
Checks