Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: disable labels and annotations metrics when metric-annotations-… #2145

Conversation

opeco17
Copy link
Contributor

@opeco17 opeco17 commented Aug 14, 2023

What this PR does / why we need it:

kube-state-metrics provides a lot of kube_*_annotations and kube_*_labels by default, and it consumes memory significantly.

kube_pod_annotations{instance="...:8080", job="monitoring/kube-state-metrics", namespace="namespace", pod="podname-bc94dfc78-fcf4j", uid="a482250c-6f6e-4533-a596-3eff484d8969"}

By disabling those metrics by default, kube-state-metrics can reduce memory usage.

Those metrics are provided when kube_*_annotations and kube_*_labels are configured.

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

decrease

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1829

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 14, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 14, 2023
…allowlist and metric-labels-allowlist are not provided
@opeco17 opeco17 force-pushed the feature/disable-labels-annotations-metrics-by-default branch from 557d798 to 6de105e Compare August 14, 2023 14:38
Copy link
Member

@dgrisonnet dgrisonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really cool change @opeco17, thank you!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 22, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, opeco17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2023
@k8s-ci-robot k8s-ci-robot merged commit e8272ce into kubernetes:main Aug 22, 2023
10 checks passed
@mrueg
Copy link
Member

mrueg commented Aug 22, 2023

@dgrisonnet @opeco17 Is this a breaking change to the previous behaviour?

@dgrisonnet
Copy link
Member

It is indeed a breaking change. I was a bit too quick with merging this one.

Although this is a breaking change, I wonder if users really rely on these empty metrics since the information inside of them are already present in other metrics such as the more commonly used _info.

That said, I looked at the options' doc and was expecting that we are saying that we are not exposing labels/annotations metrics when the allowlist are empty. But in the end we say:

By default the metric contains only name and namespace labels.

@dgrisonnet
Copy link
Member

These metrics being EXPERIMENTAL, I would personally move ahead with this change and update the doc accordingly.

What is your opinion on that @mrueg @rexagod?

@mrueg
Copy link
Member

mrueg commented Aug 22, 2023

This works for me, we should mention it in the changelogs though as a breaking change.

@rexagod
Copy link
Member

rexagod commented Aug 22, 2023

#2145 (comment) makes sense. Even if this is a breaking change, we aren't doing anything that goes against the laid out commitments. +1.

@opeco17 opeco17 deleted the feature/disable-labels-annotations-metrics-by-default branch August 23, 2023 00:47
Sohamdg081992 added a commit to Azure/prometheus-collector that referenced this pull request May 24, 2024
[comment]: # (Note that your PR title should follow the conventional
commit format: https://conventionalcommits.org/en/v1.0.0/#summary)
# PR Description
Upgrade ksm to 2.12.0
I have upgraded chart using backdoor deployment and found no issues ,
this is build link :
https://github-private.visualstudio.com/azure/_build/results?buildId=80951&view=logs&s=6884a131-87da-5381-61f3-d7acc3b91d76.

I have compared between old and new KSM chart versions of helm charts
for the charts we use(deployment,role,service,etc) and found no major
changes between the old and new. Mainly the changes are in the
parameterizing some properties based on some properties in values.yaml,
but we do not have those properties in our values-template.yaml in our
addon chart , so we are good there. Here are the comparison of K-S-M
[charts](prometheus-community/helm-charts@kube-state-metrics-5.10.1...kube-state-metrics-5.19.0#diff-964657bf9c31e2d1338046dc10aff7a7d28dc34813c6cd09d84228512e966132L25).

Screenshots of metrics flowing after upgrade:
<img width="1876" alt="ksmupgrade"
src="https://github.com/Azure/prometheus-collector/assets/31517098/7aed7ae8-4a7b-4453-86b2-062293b81344">

<img width="1901" alt="ksmupgrade1"
src="https://github.com/Azure/prometheus-collector/assets/31517098/19771142-1991-4b92-9131-83e957162be5">

This
[change](kubernetes/kube-state-metrics#2145) was
tested using following mechanism.

The annotations are flowing before and after the upgrade(below
screenshot). The metrics are flowing fine and upgrade was successful.

Change:
If the annotation or label has no configured allowed values
(--metric-annotations-allowlist, --metric-labels-allowlist) no
object_annotations or object_labels metrics should be created.

For our testing, no change was made in either flags
(--metric-annotations-allowlist, --metric-labels-allowlist) and upgraded
chart was deployed. The default labels and annotations stopped flowing
after the upgrade.

Before upgrade:
<img width="1738" alt="before"
src="https://github.com/Azure/prometheus-collector/assets/31517098/b4ec863f-3d40-4506-bdaa-ba46f1c64dad">


After upgrade:
<img width="1892" alt="after"
src="https://github.com/Azure/prometheus-collector/assets/31517098/95fe91a5-0f5b-4273-96cd-8d51a6d393ab">





[comment]: # (The below checklist is for PRs adding new features. If a
box is not checked, add a reason why it's not needed.)
# New Feature Checklist

- [ ] List telemetry added about the feature.
- [ ] Link to the one-pager about the feature.
- [ ] List any tasks necessary for release (3P docs, AKS RP chart
changes, etc.) after merging the PR.
- [ ] Attach results of scale and perf testing.

[comment]: # (The below checklist is for code changes. Not all boxes
necessarily need to be checked. Build, doc, and template changes do not
need to fill out the checklist.)
# Tests Checklist

- [ ] Have end-to-end Ginkgo tests been run on your cluster and passed?
To bootstrap your cluster to run the tests, follow [these
instructions](/otelcollector/test/README.md#bootstrap-a-dev-cluster-to-run-ginkgo-tests).
  - Labels used when running the tests on your cluster:
    - [ ] `operator`
    - [ ] `windows`
    - [ ] `arm64`
    - [ ] `arc-extension`
    - [ ] `fips`
- [ ] Have new tests been added? For features, have tests been added for
this feature? For fixes, is there a test that could have caught this
issue and could validate that the fix works?
  - [ ] Is a new scrape job needed?
- [ ] The scrape job was added to the folder
[test-cluster-yamls](/otelcollector/test/test-cluster-yamls/) in the
correct configmap or as a CR.
  - [ ] Was a new test label added?
- [ ] A string constant for the label was added to
[constants.go](/otelcollector/test/utils/constants.go).
- [ ] The label and description was added to the [test
README](/otelcollector/test/README.md).
- [ ] The label was added to this [PR
checklist](/.github/pull_request_template).
- [ ] The label was added as needed to
[testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
  - [ ] Are additional API server permissions needed for the new tests?
- [ ] These permissions have been added to
[api-server-permissions.yaml](/otelcollector/test/testkube/api-server-permissions.yaml).
  - [ ] Was a new test suite (a new folder under `/tests`) added?
- [ ] The new test suite is included in
[testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
mandre added a commit to shiftstack/cluster-monitoring-operator that referenced this pull request Aug 2, 2024
Since kubernetes/kube-state-metrics#2145,
kube-state-metrics does not collect `kube_*_annotations` metrics by
default. It's no longer necessary to add them to the metrics' deny list.

Removing `kube_*_annotations` from the deny list allows us to enable
scrapping of annotation metrics via the `--metric-annotations-allowlist`
option.

Additionally, we were missing a comma, which might have been a problem
in the comma-separated list of arguments.
mandre added a commit to shiftstack/cluster-monitoring-operator that referenced this pull request Aug 2, 2024
Since kubernetes/kube-state-metrics#2145,
kube-state-metrics does not collect `kube_*_annotations` metrics by
default. It's no longer necessary to add them to the metrics' deny list.

Removing `kube_*_annotations` from the deny list allows us to enable
scrapping of annotation metrics via the `--metric-annotations-allowlist`
option.

Additionally, we were missing a comma, which might have been a problem
in the comma-separated list of arguments.
mandre added a commit to shiftstack/cluster-monitoring-operator that referenced this pull request Aug 2, 2024
Since kubernetes/kube-state-metrics#2145,
kube-state-metrics does not collect `kube_*_annotations` or
`kube_*_labels` metrics by default. It's no longer necessary to add them
to the metrics' deny list.

Removing `kube_*_annotations` from the deny list allows us to enable
scrapping of annotation metrics via the `--metric-annotations-allowlist`
option.

Additionally, we were missing a comma, which might have been a problem
in the comma-separated list of arguments.
@cykl
Copy link

cykl commented Oct 31, 2024

@dgrisonnet

Although this is a breaking change, I wonder if users really rely on these empty metrics

I will put an end to the suspense. At least one user really relied on these metrics, me. 😁

since the information inside of them are already present in other metrics such as the more commonly used _info.

That doesn't sound right to me. AFAIK The only way to retrieve labels associated with a kubernetes object is the kube_*_labels metrics. In my case, some labels are required to aggregate some metrics or route some alerts.

It seems a very common need, and I expect that others have been impacted too (for example search for "kubernetes prometheus cronjob" and you will find dozen of examples relying on them).

While release notes mention it, I'm a bit surprised that kube-state-metrics introduced this breaking changes with such lightness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

do not create annotation and label metrics if none are allowed
6 participants