Degraded mode correctness metrics #2049
Hi @sawsa307. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @swetharepakula
/assign @bowei
/ok-to-test
pkg/neg/metrics/metrics.go
Outdated
Subsystem: negControllerSubsystem,
Name: degradedModeCorrectnessKey,
Help: "Number of endpoints differed between current endpoint calculation and degraded mode calculation",
// custom buckets - [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, +Inf]
Let's make this more fine-grained. Can we raise the last bucket before +Inf to 20k endpoints and set the number of buckets to 20?
Updated. Thanks!
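For readers following the bucket discussion, here is a minimal sketch of how the two layouts compare. The helper `exponentialBuckets` is a local, stdlib-only stand-in written for this example; it mirrors the shape of `prometheus.ExponentialBuckets(start, factor, count)` but is not the PR's code. The geometric 20-bucket layout ending near 20k is only one possible reading of the request, and the layout actually adopted in the PR is not shown in this conversation.

```go
package main

import (
	"fmt"
	"math"
)

// exponentialBuckets returns count upper bounds, starting at start and
// multiplying by factor each step. Hypothetical helper for illustration.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

// fineGrainedBuckets spreads count buckets geometrically from 1 up to
// (approximately) upper. This layout is an assumption, not the PR's choice.
func fineGrainedBuckets(upper float64, count int) []float64 {
	factor := math.Pow(upper, 1.0/float64(count-1))
	return exponentialBuckets(1, factor, count)
}

func main() {
	// The original custom buckets from the diff: 1, 2, 4, ..., 8192.
	original := exponentialBuckets(1, 2, 14)
	fmt.Println("original buckets:", len(original), "last:", original[len(original)-1])

	// One possible 20-bucket layout whose last finite bound is about 20k.
	fine := fineGrainedBuckets(20000, 20)
	fmt.Println("fine-grained buckets:", len(fine))
	for _, b := range fine {
		fmt.Printf("%.0f ", b)
	}
	fmt.Println()
}
```

In both layouts the +Inf bucket is implicit: Prometheus histograms always add it after the last finite bound.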
pkg/neg/syncers/transaction.go
Outdated
@@ -280,8 +280,9 @@ func (s *transactionSyncer) syncInternalImpl() error {
        if len(notInDegraded) == 0 && len(onlyInDegraded) == 0 {
            s.resetErrorState()
        }
    } else {
        computeDegradedModeCorrectness(notInDegraded, onlyInDegraded, string(s.NegSyncerKey.NegType))
Should we always publish, instead of publishing only when the difference is > 0?
As a histogram, what does each data point correspond to? What are the units and the denominator of the metric?
> Should we always publish instead of only publish when > 0?
We want to collect the metric only when the normal calculation returns without error and yields a non-empty result, so that the comparison is meaningful. The two results may be identical or they may differ. What we want to avoid is comparing against a "meaningless" empty result, which is what the normal calculation returns when it runs into an error.
We reach this line with both counts equal to 0 when we are not in error state, the normal calculation succeeds and produces endpoints, and its result happens to match the degraded mode result.
One condition is missing: the case where we are in error state and the normal calculation doesn't run into an error. I'll change the condition here to cover it.
> As a histogram, each data point corresponds to ? What is the units/denominator of the metric?
Each data point corresponds to the difference between the normal and degraded mode endpoint calculations, recorded only when the normal calculation doesn't run into an error. The unit is number of endpoints, and the denominator would be the total count of endpoints. We are not recording the denominator here because all we care about is whether the difference is 0.
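A minimal sketch of the emit rule described above, under stated assumptions: `endpointSet`, `diffCount`, `correctnessSample`, and `errNormalCalc` are hypothetical names invented for this example, endpoints are reduced to plain strings, and the sample is taken to be the size of the symmetric difference between the two results. The real syncer types and the body of `computeDegradedModeCorrectness` are not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
)

// endpointSet is a hypothetical stand-in for the endpoint maps being compared.
type endpointSet map[string]struct{}

// errNormalCalc is a placeholder error for a failed normal calculation.
var errNormalCalc = errors.New("normal endpoint calculation failed")

// diffCount returns how many endpoints appear in exactly one of the two sets.
func diffCount(normal, degraded endpointSet) int {
	count := 0
	for ep := range normal {
		if _, ok := degraded[ep]; !ok {
			count++ // present only in the normal result
		}
	}
	for ep := range degraded {
		if _, ok := normal[ep]; !ok {
			count++ // present only in the degraded mode result
		}
	}
	return count
}

// correctnessSample returns the histogram sample and whether to record it.
// Nothing is recorded when the normal calculation failed or returned an
// empty result, because the comparison would not be meaningful.
func correctnessSample(normal, degraded endpointSet, normalErr error) (int, bool) {
	if normalErr != nil || len(normal) == 0 {
		return 0, false
	}
	return diffCount(normal, degraded), true
}

func main() {
	normal := endpointSet{"10.0.0.1:80": {}, "10.0.0.2:80": {}}
	degraded := endpointSet{"10.0.0.2:80": {}, "10.0.0.3:80": {}}

	sample, record := correctnessSample(normal, degraded, nil)
	fmt.Println("results differ:", sample, record)

	sample, record = correctnessSample(normal, normal, nil)
	fmt.Println("results match:", sample, record)

	sample, record = correctnessSample(endpointSet{}, degraded, errNormalCalc)
	fmt.Println("normal calculation failed:", sample, record)
}
```

Note that a matching result still produces a sample of 0, which is what lets the histogram show how often the two calculations agree.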
Add a metric that reports the number of endpoints that differ between the current endpoint calculation and the degraded mode calculation. It is not emitted when the current endpoint calculation returns any error, because the returned map will be empty, whereas the degraded mode calculation handles errors and returns a non-empty map.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bowei, sawsa307
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Add a metric that reports the number of endpoints that differ between the current endpoint calculation and the degraded mode calculation. It is not emitted when the current endpoint calculation returns any error, because the returned map will be empty, whereas the degraded mode calculation handles errors and returns a non-empty map.