-
Notifications
You must be signed in to change notification settings - Fork 39.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve apiserver SLI metric name #112679
Conversation
/assign @logicalhan |
/sig instrumentation |
Even though this is alpha, I'm hesitant to rename it because I actually suspect people have already started using it. |
Yes, there should already be quite a huge amount of users since the whole ecosystem that imports https://github.com/kubernetes-monitoring/kubernetes-mixin is already using this metric. My main reason for changing the name is that metrics are hard to discover by default since we don't have a solid documentation upstream. So the principal way to discover metrics is to look for certain patterns in the query platform. Let's say a user is looking for SLI metrics, then by searching for the As for actually breaking the users, I think it should be fine since the metric is still ALPHA and I labeled this PR as an API change so it will appear as a breaking-change in the changelog. |
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
/triage accepted |
Absolutely not, I am not in favor of just breaking users. Instead, we should introduce a second metric and deprecate this one and give people an actual chance to migrate. |
7cd5884
to
57cf30d
Compare
/kind cleanup |
/assign @wojtek-t |
@logicalhan makes sense, I addressed your comment. @wojtek-t can you please also take a look? My main concern now that we have added a deprecation period is that we will have to live with 3 SLO metrics in 1.26 which will increase the number of timeseries exposed by Kubernetes by quite a lot since these metrics are already the most expensive one. We can expect to have users complaining about the increase, but since it is only for one release, I guess it should be fine. |
Add new kube-apiserver SLI metric better reflecting that the metric is an SLI and not an SLO and deprecate the existing apiserver_request_slo_duration_seconds in 1.27. Although the metric is still in alpha, we prefer deprecating it for one release since it is a critical metric used for SLOs and to make sure that users that are using it have time to make the transition. Going forward we prefer going with SLI specific metrics, we will use _sli_ instead of _slo_ so for consistency purposes. Signed-off-by: Damien Grisonnet <[email protected]>
57cf30d
to
1493da9
Compare
lgtm but i also share cardinality concerns. @wojtek-t what do you think? |
Just to double check I didn't miss anything - it's effectively the exact same metric - just we're naming it differently? So I'm fine with that, but I would like to get an ACK from at least one of folks who use that extensively internally: |
Yes this is just a renaming |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgrisonnet, logicalhan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The older `apiserver_request_sli_.*` metrics have been name changed starting in Kubernetes v1.26. kubernetes/kubernetes#112679
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Add new kube-apiserver SLI metric better reflecting that the metric is
an SLI and not an SLO and deprecate the existing
apiserver_request_slo_duration_seconds in 1.27. Although the metric is
still in alpha, we prefer deprecating it for one release since it is a
critical metric used for SLOs and to make sure that users that are using
it have time to make the transition.
Going forward we prefer going with SLI-specific metrics, we will use
sli instead of slo so for consistency purposes.
Does this PR introduce a user-facing change?