Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add API for pipeline status check. Fixes #407. #599

Merged
merged 13 commits into from
Mar 15, 2023

Conversation

dpadhiar
Copy link
Contributor

This PR adds an API for a pipeline status check.

The information returned currently is if the pipeline is "healthy" or not and a message accompanying this.

Pipeline health here is determined by checking the vertices of the pipeline and comparing their pending message counts to their processing rates. If the processing rate of a vertex of 0 but there are multiple pending messages left, this indicates there may be an issue with the vertex that should be checked.

@@ -47,6 +47,12 @@ message VertexMetrics {
map<string, int64> pendings = 4;
}

// PipelineStatus
message PipelineStatus {
required bool health = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should go with a string type status, which has more meanings other than only true/false. For example, OK/Unknown/Error/...

vertexReq.Vertex = &vertex.Name
vertexResp, err := ps.GetVertexMetrics(ctx, vertexReq)
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If err is not nil, should not return an error, it probably just because autoscaling down to 0 and then metrics are not available. In this case, we could just return the status as Unknown.

return nil, err
}
// check default pending msg and processing rates
if vertexResp.VertexMetrics[0].Pendings["default"] > 0 && vertexResp.VertexMetrics[0].ProcessingRates["default"] == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not only check [0], reduce vertex has more items in the array.

Based on the GetVertexMetrics() implementation, default is not always available in the map, so also need to check if it's existing.

if vertexResp.VertexMetrics[0].Pendings["default"] > 0 && vertexResp.VertexMetrics[0].ProcessingRates["default"] == 0 {
resp.Status = &daemon.PipelineStatus{
Health: pointer.Bool(false),
Message: pointer.String("Pipeline may have issue."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More information in the message, such as which vertex has what issue.

@dpadhiar dpadhiar requested a review from whynowy March 13, 2023 21:13
@dpadhiar dpadhiar marked this pull request as ready for review March 13, 2023 21:52
@dpadhiar dpadhiar requested a review from vigith as a code owner March 13, 2023 21:52
@@ -47,6 +47,12 @@ message VertexMetrics {
map<string, int64> pendings = 4;
}

// PipelineStatus
message PipelineStatus {
required string health = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use status?

for _, vertexMetrics := range vertexResp.VertexMetrics {
if pending, ok := vertexMetrics.Pendings["default"]; ok {
if processingRate, ok := vertexMetrics.ProcessingRates["default"]; ok {
if pending > 0 && processingRate == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments, that we need to revisit it later, it might also be caused by the processing is very slow, that the rate is 0.

if err != nil {
resp.Status = &daemon.PipelineStatus{
Health: pointer.String(metrics.PipelineStatusUnknown),
Message: pointer.String("Pipeline health is unknown."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Pipeline status is unknown."

@@ -46,6 +46,10 @@ const (

VertexProcessingRate = "vertex_processing_rate"
VertexPendingMessages = "vertex_pending_messages"

PipelineStatusOK = "OK"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move them to pipeline_metrics_query.go for now?

Signed-off-by: Dillen Padhiar <[email protected]>
@dpadhiar dpadhiar requested review from whynowy and removed request for vigith March 14, 2023 22:36
@@ -110,4 +124,9 @@ service DaemonService {
rpc GetPipelineWatermarks (GetPipelineWatermarksRequest) returns (GetPipelineWatermarksResponse) {
option (google.api.http).get = "/api/v1/pipelines/{pipeline}/watermarks";
};

// GetPipelineStatus () returns (PipelineStatus) // status: bool and string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is not up to date.

}
}
}
if vertexMetrics.Pendings["1m"] > 0 && vertexMetrics.ProcessingRates["1m"] == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was, we don't need to check 1m.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, for future reference, what are the cases "default" won't be in the map and why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default will be always in the map if 1m is there. But if you look at the GetVertexMetrics implementation, there's chance the map is empty when the target pod is scaled down to 0 or something is wrong with that pod.

Signed-off-by: Dillen Padhiar <[email protected]>
Signed-off-by: Dillen Padhiar <[email protected]>
return resp, nil
}
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this else.

It is expected sometimes there's no pending information:

  1. Source like HTTP, it does not have pending at all;
  2. Based on the GetVertexMetrics implementation, when scaling down to 0 replica, the pending is also not available.

For these cases, we should consider it's ok.

@whynowy
Copy link
Member

whynowy commented Mar 15, 2023

@dpadhiar - please also test this api with real pipelines to make sure the output is expected (find the pipeline spec in examples).

@dpadhiar
Copy link
Contributor Author

@dpadhiar - please also test this api with real pipelines to make sure the output is expected (find the pipeline spec in examples).

Will do, I will test the API with most of them tomorrow morning.

@dpadhiar dpadhiar requested a review from whynowy March 15, 2023 15:53
@whynowy whynowy merged commit c981513 into numaproj:main Mar 15, 2023
whynowy pushed a commit that referenced this pull request Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants