Improve apiserver SLI metric name #112679

Merged
merged 1 commit into from
Nov 8, 2022

Conversation

dgrisonnet
Member

@dgrisonnet dgrisonnet commented Sep 22, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Add a new kube-apiserver metric, apiserver_request_sli_duration_seconds,
whose name better reflects that it is an SLI and not an SLO, and deprecate
the existing apiserver_request_slo_duration_seconds in 1.27. Although the
metric is still in alpha, we prefer to keep it deprecated for one release
because it is a critical metric used for SLOs and users who rely on it need
time to make the transition.

Going forward, we prefer SLI-specific metric names, so we will use
sli instead of slo for consistency.

Does this PR introduce a user-facing change?

Deprecate the apiserver_request_slo_duration_seconds metric for v1.27 in favor of apiserver_request_sli_duration_seconds for naming consistency purposes with other SLI-specific metrics and to avoid any confusion between SLOs and SLIs.
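
For anyone updating dashboards or alerting rules for this change, the rename can be sketched as a text rewrite over PromQL expressions. This is a minimal illustration and not part of the PR: the helper name `migrate_expr` is hypothetical, and it assumes the metric is exposed as a histogram with the usual `_bucket`, `_sum`, and `_count` series.

```python
import re

OLD = "apiserver_request_slo_duration_seconds"
NEW = "apiserver_request_sli_duration_seconds"

# Match the old family name plus an optional histogram suffix. The lookarounds
# stop the pattern from touching longer identifiers (for example recording-rule
# names, which may contain ':') that merely contain the old name.
_OLD_METRIC = re.compile(
    r"(?<![A-Za-z0-9_:])"
    + re.escape(OLD)
    + r"(_bucket|_sum|_count)?"
    + r"(?![A-Za-z0-9_:])"
)


def migrate_expr(expr: str) -> str:
    """Rewrite references to the deprecated metric to the new name (hypothetical helper)."""
    return _OLD_METRIC.sub(lambda m: NEW + (m.group(1) or ""), expr)


if __name__ == "__main__":
    before = 'sum(rate(apiserver_request_slo_duration_seconds_bucket{verb="GET",le="1"}[5m]))'
    print(migrate_expr(before))
```

A plain string replace would work for most rule files too; the regular expression only adds protection against rewriting unrelated names that happen to embed the old one.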

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 22, 2022
@dgrisonnet
Member Author

/assign @logicalhan

@dgrisonnet
Member Author

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 22, 2022
@logicalhan
Member

Even though this is alpha, I'm hesitant to rename it because I actually suspect people have already started using it.

@dgrisonnet
Member Author

Yes, there should already be a large number of users, since the whole ecosystem that imports https://github.com/kubernetes-monitoring/kubernetes-mixin is already using this metric.

My main reason for changing the name is that metrics are hard to discover by default, since we don't have solid documentation upstream. The principal way to discover metrics is therefore to look for certain patterns in the query platform. Say a user is looking for SLI metrics: if our metrics have consistent names, searching for the sli pattern should return all of our SLI-related metrics.
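
The discovery-by-pattern workflow described above can be sketched against a scraped `/metrics` payload in the Prometheus text exposition format. This is an illustration under assumptions, not something from the PR: `find_metric_families` is a hypothetical helper and `SAMPLE` is made-up input.

```python
import re

# Made-up excerpt of a text-exposition payload, for illustration only.
SAMPLE = """\
# HELP apiserver_request_duration_seconds Response latency distribution.
# TYPE apiserver_request_duration_seconds histogram
# TYPE apiserver_request_slo_duration_seconds histogram
# TYPE apiserver_request_sli_duration_seconds histogram
"""


def find_metric_families(exposition: str, pattern: str) -> list:
    """Return the sorted family names declared on '# TYPE' lines that match pattern."""
    regex = re.compile(pattern)
    names = set()
    for line in exposition.splitlines():
        parts = line.split()
        # A '# TYPE <name> <type>' line declares exactly one metric family.
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            if regex.search(parts[2]):
                names.add(parts[2])
    return sorted(names)


if __name__ == "__main__":
    print(find_metric_families(SAMPLE, r"_sli_"))
```

With consistent naming, a single `_sli_` search is enough; with the old name, the same user would also have to know to search for `_slo_`.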

As for actually breaking the users, I think it should be fine since the metric is still ALPHA and I labeled this PR as an API change so it will appear as a breaking-change in the changelog.

@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 27, 2022
@logicalhan
Member

> As for actually breaking the users, I think it should be fine since the metric is still ALPHA and I labeled this PR as an API change so it will appear as a breaking-change in the changelog.

Absolutely not, I am not in favor of just breaking users. Instead, we should introduce a second metric and deprecate this one and give people an actual chance to migrate.
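
One way consumers could use such a migration window is a query that prefers the new metric and falls back to the old one. The sketch below builds that PromQL string; it is an assumption-laden illustration, not something proposed in this thread: `transition_quantile_query` is a hypothetical helper, and the `verb` grouping label and `5m` window are example choices.

```python
OLD_BUCKET = "apiserver_request_slo_duration_seconds_bucket"
NEW_BUCKET = "apiserver_request_sli_duration_seconds_bucket"


def transition_quantile_query(quantile: float, window: str = "5m", by=("verb",)) -> str:
    """Build a PromQL quantile query that works before, during, and after the rename."""
    labels = ", ".join(("le",) + tuple(by))

    def aggregated(metric: str) -> str:
        return f"sum by ({labels}) (rate({metric}[{window}]))"

    # PromQL's 'or' keeps every series from the left-hand side and adds only
    # right-hand series whose label sets are missing on the left, so clusters
    # that expose both names are not double counted.
    return (
        f"histogram_quantile({quantile}, "
        f"{aggregated(NEW_BUCKET)} or {aggregated(OLD_BUCKET)})"
    )


if __name__ == "__main__":
    print(transition_quantile_query(0.99))
```

Once every monitored cluster exposes the new name, the `or` branch can be dropped.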

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 28, 2022
@dgrisonnet
Member Author

/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Sep 28, 2022
@logicalhan
Member

/assign @wojtek-t

@dgrisonnet
Member Author

@logicalhan makes sense, I addressed your comment.

@wojtek-t can you please also take a look?

My main concern now that we have added a deprecation period is that we will have to live with 3 SLO metrics in 1.26, which will increase the number of time series exposed by Kubernetes by quite a lot, since these metrics are already the most expensive ones. We can expect users to complain about the increase, but since it is only for one release, I guess it should be fine.
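
To put a number on that increase for a given cluster, one could count the series each metric family contributes in a scraped `/metrics` payload. The sketch below is illustrative only: `series_per_family` is a hypothetical helper, `SAMPLE` is made-up data, and stripping `_bucket`/`_sum`/`_count` by suffix is a simplification that could mis-group unrelated metrics whose names happen to end that way.

```python
from collections import Counter

HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")

# Made-up excerpt: the same histogram exposed under the old and the new name.
SAMPLE = """\
# TYPE apiserver_request_slo_duration_seconds histogram
apiserver_request_slo_duration_seconds_bucket{verb="GET",le="0.1"} 3
apiserver_request_slo_duration_seconds_bucket{verb="GET",le="+Inf"} 4
apiserver_request_slo_duration_seconds_sum{verb="GET"} 0.7
apiserver_request_slo_duration_seconds_count{verb="GET"} 4
# TYPE apiserver_request_sli_duration_seconds histogram
apiserver_request_sli_duration_seconds_bucket{verb="GET",le="0.1"} 3
apiserver_request_sli_duration_seconds_bucket{verb="GET",le="+Inf"} 4
apiserver_request_sli_duration_seconds_sum{verb="GET"} 0.7
apiserver_request_sli_duration_seconds_count{verb="GET"} 4
"""


def series_per_family(exposition: str) -> Counter:
    """Count sample lines (one per time series) grouped by metric family."""
    counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        for suffix in HISTOGRAM_SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for family, n in sorted(series_per_family(SAMPLE).items()):
        print(family, n)
```

Because the new metric is a straight copy under a different name, it adds exactly as many series as the old one already exposes for as long as both are registered.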

Add new kube-apiserver SLI metric better reflecting that the metric is
an SLI and not an SLO and deprecate the existing
apiserver_request_slo_duration_seconds in 1.27. Although the metric is
still in alpha, we prefer deprecating it for one release since it is a
critical metric used for SLOs and to make sure that users that are using
it have time to make the transition.

Going forward, we prefer SLI-specific metric names, so we will use
_sli_ instead of _slo_ for consistency.

Signed-off-by: Damien Grisonnet <[email protected]>
@logicalhan
Member

lgtm but i also share cardinality concerns.

@wojtek-t what do you think?

@wojtek-t
Member

> lgtm but i also share cardinality concerns.
> @wojtek-t what do you think?

Just to double-check I didn't miss anything: it's effectively the exact same metric, and we're just naming it differently?
Assuming I'm right, I agree with the cardinality concern, but I don't see a better way of doing that (if we really want to rename it, which makes sense to me). Just switching the name sounds too problematic to me.

So I'm fine with that, but I would like to get an ACK from at least one of the folks who use it extensively internally:
@mborsz @marseel

@dgrisonnet
Member Author

Yes, this is just a renaming.

@marseel
Member

marseel commented Sep 29, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 29, 2022
Member

@logicalhan logicalhan left a comment

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, logicalhan


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2022
@k8s-ci-robot k8s-ci-robot merged commit 7752c3a into kubernetes:master Nov 8, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Nov 8, 2022
@dgrisonnet dgrisonnet deleted the fix-apiserver-sli branch November 23, 2022 14:25
metalmatze added a commit to metalmatze/kubernetes-mixin that referenced this pull request Oct 10, 2023
The older `apiserver_request_slo_.*` metrics were renamed to `apiserver_request_sli_.*` starting in Kubernetes v1.26.
kubernetes/kubernetes#112679
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
7 participants