-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rayservice] Add support for getting multi-app status #1136
Conversation
1c75391
to
755c41c
Compare
d2372cd
to
18195e6
Compare
@kevin85421 The tests should be passing, please take a look when you get the chance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others: Could you ensure that all variables in the tests are valid and avoid hardcoding the values? For example, both Status: "unhealthy"
and deploymentStatus.Status != "HEALTHY"
are present in the codebase.
@@ -648,68 +672,99 @@ func (r *RayServiceReconciler) updateServeDeployment(ctx context.Context, raySer | |||
// updates health timestamps, and checks if the RayCluster is overall healthy. | |||
// It's return values should be interpreted as | |||
// (Serve app healthy?, Serve app ready?, error if any) | |||
func (r *RayServiceReconciler) getAndCheckServeStatus(ctx context.Context, dashboardClient utils.RayDashboardClientInterface, rayServiceServeStatus *rayv1alpha1.RayServiceStatus, unhealthySecondThreshold *int32) (bool, bool, error) { | |||
func (r *RayServiceReconciler) getAndCheckServeStatus(ctx context.Context, dashboardClient utils.RayDashboardClientInterface, rayServiceServeStatus *rayv1alpha1.RayServiceStatus, serveConfigType utils.RayServeConfigType, unhealthySecondThreshold *int32) (bool, bool, error) { | |||
serviceUnhealthySecondThreshold := ServiceUnhealthySecondThreshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the variable names are self-explanatory, it would be helpful to have comments for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments for serveConfigType
above the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I am talking about ServiceUnhealthySecondThreshold
and serveConfigTypeForTesting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see! Added comments now for them too :)
} | ||
|
||
// Check app status | ||
if app.Status != rayv1alpha1.ApplicationStatusEnum.RUNNING { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the definition of RUNNING
for a Serve application?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUNNING = all deployments are HEALTHY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that isHealthy
must be true if app.Status
is not RUNNING
?
Signed-off-by: Cindy Zhang <[email protected]>
Signed-off-by: cindyz <[email protected]>
@kevin85421 I've changed all of the hardcoded strings to the enums. |
Signed-off-by: cindyz <[email protected]>
Signed-off-by: cindyz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave some minor comments. By the way, I have not reviewed changes in apiserver
and proto
. We can either ping some KubeRay API Server folks to review them or revert the changes and do it in a follow-up PR, or ask KubeRay API Server folks for help.
// If the user has set a value for `ServeConfigV2`, the config type is MULTI_APP | ||
// Otherwise, the user should have set a value for `ServeConfig`, in which case the config type is SINGLE_APP | ||
func (r *RayServiceReconciler) determineServeConfigType(ctx context.Context, rayServiceInstance *rayv1alpha1.RayService) utils.RayServeConfigType { | ||
if rayServiceInstance.Spec.ServeConfigV2 == (rayv1alpha1.ServeConfigV2{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to use ==
to compare two structs or should we use DeepEqual
instead?
@@ -648,68 +672,99 @@ func (r *RayServiceReconciler) updateServeDeployment(ctx context.Context, raySer | |||
// updates health timestamps, and checks if the RayCluster is overall healthy. | |||
// It's return values should be interpreted as | |||
// (Serve app healthy?, Serve app ready?, error if any) | |||
func (r *RayServiceReconciler) getAndCheckServeStatus(ctx context.Context, dashboardClient utils.RayDashboardClientInterface, rayServiceServeStatus *rayv1alpha1.RayServiceStatus, unhealthySecondThreshold *int32) (bool, bool, error) { | |||
func (r *RayServiceReconciler) getAndCheckServeStatus(ctx context.Context, dashboardClient utils.RayDashboardClientInterface, rayServiceServeStatus *rayv1alpha1.RayServiceStatus, serveConfigType utils.RayServeConfigType, unhealthySecondThreshold *int32) (bool, bool, error) { | |||
serviceUnhealthySecondThreshold := ServiceUnhealthySecondThreshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the comments?
} | ||
|
||
// Check app status | ||
if app.Status != rayv1alpha1.ApplicationStatusEnum.RUNNING { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that isHealthy
must be true if app.Status
is not RUNNING
?
HealthLastUpdateTime: &timeNow, | ||
} | ||
|
||
if deployment.Status != rayv1alpha1.DeploymentStatusEnum.HEALTHY { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a replica is not healthy, it is unable to serve any requests. Is it correct?
ray-operator/controllers/ray/rayservice_controller_unit_test.go
Outdated
Show resolved
Hide resolved
ray-operator/controllers/ray/rayservice_controller_unit_test.go
Outdated
Show resolved
Hide resolved
ray-operator/controllers/ray/rayservice_controller_unit_test.go
Outdated
Show resolved
Hide resolved
ray-operator/controllers/ray/rayservice_controller_unit_test.go
Outdated
Show resolved
Hide resolved
Hmm, I don't think it's possible to revert the changes (correct me if I'm wrong), because the code in |
Signed-off-by: cindyz <[email protected]>
Signed-off-by: cindyz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (for changes not under apiserver/
and proto/
)
cc @Jeffwan @scarlet25151 @blublinsky would you mind taking a look at the changes for apiserver/
and proto/
? Thanks!
Tomorrow, I will review the changes made under the |
@kevin85421 Resolved the conflicts! As for the changes under apiserver and proto, I only made the bare minimum changes to make the code compile because the structs changed in |
Changes to |
proto/ and apiserver/ parts LGTM. |
Thank @blublinsky and @scarlet25151 for the review! |
CC: @architkulkarni, this pull request should not be included in the minor release v0.5.2. Is it acceptable to merge this pull request at this time? |
@kevin85421 Sounds good, fine with me. |
Add support for getting multi-app status
Why are these changes needed?
Add a field
serveConfigV2
in the RayService CRD which will be filled in in a follow-up PR.serveConfigV2
corresponds to the Serve multi-application schema. If the user specifies the serve config usingserveConfigV2
, we should read the status from the multi-app endpoint (GET /api/serve/applications/).This PR only adds the support for pulling the status from the multi-app endpoint. A follow up PR will add support for submitting the serve config to the multi-app endpoint.
Example Service Status:
Other changes:
generateServeStatus
, since it is (1) unrealistic because in reality the dashboard client would never fill in those values and (2) ineffective because the values are overwritten in the reconcile loop of the rayservice controller anyways.Related issue number
Checks