-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add API for pipeline status check. Fixes #407. #599
Conversation
Signed-off-by: Dillen Padhiar <[email protected]>
Signed-off-by: Dillen Padhiar <[email protected]>
Signed-off-by: Dillen Padhiar <[email protected]>
pkg/apis/proto/daemon/daemon.proto
Outdated
@@ -47,6 +47,12 @@ message VertexMetrics { | |||
map<string, int64> pendings = 4; | |||
} | |||
|
|||
// PipelineStatus | |||
message PipelineStatus { | |||
required bool health = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should go with a string type status, which has more meanings other than only true/false. For example, OK/Unknown/Error/...
vertexReq.Vertex = &vertex.Name | ||
vertexResp, err := ps.GetVertexMetrics(ctx, vertexReq) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If err is not nil, should not return an error, it probably just because autoscaling down to 0 and then metrics are not available. In this case, we could just return the status as Unknown
.
return nil, err | ||
} | ||
// check default pending msg and processing rates | ||
if vertexResp.VertexMetrics[0].Pendings["default"] > 0 && vertexResp.VertexMetrics[0].ProcessingRates["default"] == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not only check [0], reduce vertex has more items in the array.
Based on the GetVertexMetrics()
implementation, default
is not always available in the map, so also need to check if it's existing.
if vertexResp.VertexMetrics[0].Pendings["default"] > 0 && vertexResp.VertexMetrics[0].ProcessingRates["default"] == 0 { | ||
resp.Status = &daemon.PipelineStatus{ | ||
Health: pointer.Bool(false), | ||
Message: pointer.String("Pipeline may have issue."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More information in the message, such as which vertex has what issue.
Signed-off-by: Dillen Padhiar <[email protected]>
pkg/apis/proto/daemon/daemon.proto
Outdated
@@ -47,6 +47,12 @@ message VertexMetrics { | |||
map<string, int64> pendings = 4; | |||
} | |||
|
|||
// PipelineStatus | |||
message PipelineStatus { | |||
required string health = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use status
?
for _, vertexMetrics := range vertexResp.VertexMetrics { | ||
if pending, ok := vertexMetrics.Pendings["default"]; ok { | ||
if processingRate, ok := vertexMetrics.ProcessingRates["default"]; ok { | ||
if pending > 0 && processingRate == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comments, that we need to revisit it later, it might also be caused by the processing is very slow, that the rate is 0.
if err != nil { | ||
resp.Status = &daemon.PipelineStatus{ | ||
Health: pointer.String(metrics.PipelineStatusUnknown), | ||
Message: pointer.String("Pipeline health is unknown."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Pipeline status is unknown."
pkg/metrics/metrics.go
Outdated
@@ -46,6 +46,10 @@ const ( | |||
|
|||
VertexProcessingRate = "vertex_processing_rate" | |||
VertexPendingMessages = "vertex_pending_messages" | |||
|
|||
PipelineStatusOK = "OK" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move them to pipeline_metrics_query.go for now?
Signed-off-by: Dillen Padhiar <[email protected]>
pkg/apis/proto/daemon/daemon.proto
Outdated
@@ -110,4 +124,9 @@ service DaemonService { | |||
rpc GetPipelineWatermarks (GetPipelineWatermarksRequest) returns (GetPipelineWatermarksResponse) { | |||
option (google.api.http).get = "/api/v1/pipelines/{pipeline}/watermarks"; | |||
}; | |||
|
|||
// GetPipelineStatus () returns (PipelineStatus) // status: bool and string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is not up to date.
} | ||
} | ||
} | ||
if vertexMetrics.Pendings["1m"] > 0 && vertexMetrics.ProcessingRates["1m"] == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was, we don't need to check 1m
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, for future reference, what are the cases "default" won't be in the map and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default
will be always in the map if 1m
is there. But if you look at the GetVertexMetrics
implementation, there's chance the map is empty when the target pod is scaled down to 0 or something is wrong with that pod.
Signed-off-by: Dillen Padhiar <[email protected]>
Signed-off-by: Dillen Padhiar <[email protected]>
return resp, nil | ||
} | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this else
.
It is expected sometimes there's no pending
information:
- Source like
HTTP
, it does not havepending
at all; - Based on the
GetVertexMetrics
implementation, when scaling down to 0 replica, thepending
is also not available.
For these cases, we should consider it's ok.
@dpadhiar - please also test this api with real pipelines to make sure the output is expected (find the pipeline spec in examples). |
Will do, I will test the API with most of them tomorrow morning. |
Signed-off-by: Dillen Padhiar <[email protected]>
Signed-off-by: Dillen Padhiar <[email protected]>
This PR adds an API for a pipeline status check.
The information returned currently is if the pipeline is "healthy" or not and a message accompanying this.
Pipeline health here is determined by checking the vertices of the pipeline and comparing their pending message counts to their processing rates. If the processing rate of a vertex of 0 but there are multiple pending messages left, this indicates there may be an issue with the vertex that should be checked.