Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kube-prometheus-stack] additionalRuleLabels shouldn't apply to recording rules #3396

Closed
defenestration opened this issue May 17, 2023 · 21 comments · Fixed by #3400
Closed
Labels
bug Something isn't working

Comments

@defenestration
Copy link

Describe the bug a clear and concise description of what the bug is.

It looks like a recent #3351 for kube-prometheus-stack-45.28.1 applies additionalRuleLabels to recording rules. Which is a new change. We have a additionalRuleLabels label defined, to help us route alert rules based on a namespace annotation. (see #1231 (comment) for background on that.)

In our helm chart we use something like:

additionalRuleLabels: 
  tsc_owner: '{{ with printf `kube_namespace_annotations{namespace="%s"}` .Labels.namespace | query }}{{ with (. | first).Labels.annotation_tsc_owner }}{{ . }}{{else}}none{{end}}{{else}}none{{end}}'

With the recent release, this label also gets applied to recording rules.
This caused an error in for us. The built-in alert PrometheusRuleFailures fires and going to http://prometheus/rules page shows an error with this recording rule in kube-apiserver-availability.rules

record: code_verb:apiserver_request_total:increase30d
expr: avg_over_time(code_verb:apiserver_request_total:increase1h[30d]) * 24 * 30

ERR	
vector contains metrics with the same labelset after applying rule labels

It seems like it might be good to specify different, additionalRuleLabels to apply to alerts and recording rules separately, ex; add additionalAlertRuleLabels additionalRecordingRuleLabels parameters to the helm chart possibly.

What's your helm version?

version.BuildInfo{Version:"v3.11.2", GitCommit:"912ebc1cd10d38d340f048efaf0abda047c3468e", GitTreeState:"clean", GoVersion:"go1.18.10"}

What's your kubectl version?

1.24

Which chart?

kube-prometheus-stack

What's the chart version?

45.28.1

What happened?

The built-in alert PrometheusRuleFailures fires and going to http://prometheus/rules page shows an error with this recording rule in kube-apiserver-availability.rules

record: code_verb:apiserver_request_total:increase30d
expr: avg_over_time(code_verb:apiserver_request_total:increase1h[30d]) * 24 * 30

ERR	
vector contains metrics with the same labelset after applying rule labels

What you expected to happen?

no error

How to reproduce it?

Try to add a similar additionalRuleLabels

defaultRules:
  additionalRuleLabels: 
    tsc_owner: '{{ with printf `kube_namespace_annotations{namespace="%s"}` .Labels.namespace | query }}{{ with (. | first).Labels.annotation_tsc_owner }}{{ . }}{{else}}none{{end}}{{else}}none{{end}}'

This will render correct on alert rules

the metric code_verb:apiserver_request_total:increase1h contains the label text verbatim as well, which is undesired.

Enter the changed values of values.yaml?

None from previous version of chart.

Enter the command that you execute and failing/misfunctioning.

N/A - flux auto upgrades to latest version of helm chart.

Anything else we need to know?

No response

@defenestration defenestration added the bug Something isn't working label May 17, 2023
@defenestration
Copy link
Author

I rolled back to 45.28.0 and this issue stopped, so definitely caused by 45.28.1

@defenestration defenestration changed the title [kube-prometheus-stack] additionalRuleLabels shoudln't apply to recording rules [kube-prometheus-stack] additionalRuleLabels shouldn't apply to recording rules May 17, 2023
@defenestration
Copy link
Author

I see the original issue #3340 for the problem PR suggests what i think should be the desired solution: separate additionalRuleLabels for alert & recording. Probably additionalRuleLabels could be used to apply to both, as long as we can still separate out alert and recording rules.

@buroa
Copy link

buroa commented May 17, 2023

There is something wrong with this version which I needed to roll back as well. Trying to install the Helm release gives errors like this:

failed to diff release against cluster resources: [PrometheusRule/monitoring/kube-prometheus-stack-k8s.rules dry-run failed, reason: Invalid, error: PrometheusRule.monitoring.coreos.com "kube-prometheus-stack-k8s.rules" is invalid: [spec.groups[0].rules[0].labels: Invalid value: "null"

@buroa
Copy link

buroa commented May 18, 2023

@scott-grimes Any idea?

@scott-grimes
Copy link
Contributor

looks like we should split this like @defenestration has suggested, allow additionalRuleLabels to be applied to both, add an additional two config options to target labels specifically for alerts and recording rules separately

@buroa
Copy link

buroa commented May 18, 2023

looks like we should split this like @defenestration has suggested, allow additionalRuleLabels to be applied to both, add an additional two config options to target labels specifically for alerts and recording rules separately

The biggest problem this PR has introduced is the required labels key, even if additionalRuleLabels is not specified. To get around this bug, I had to do something like buroa/k8s-gitops@ede393d just to get it to deploy correctly.

@scott-grimes
Copy link
Contributor

scott-grimes commented May 18, 2023

@buroa submitted pr to fix issue labels:always being required. PR above checks to see if there are any labels that need to be applied, and adds the labels: only when necessary

@buroa
Copy link

buroa commented May 18, 2023

@scott-grimes Thanks! Looks great!

@Jellyfrog
Copy link
Contributor

The PR that closes this doesn't fix the issue from the title tho "additionalRuleLabels shouldn't apply to recording rules".
This should be reopened

@darkobas
Copy link

yes, the issue still persists

@defenestration
Copy link
Author

Agreed, @buroa 's issue was fixed, but the original issue still exists. Unsure if i can reopen this issue myself though.

@defenestration
Copy link
Author

defenestration commented May 25, 2023

Well, looking at a diff between 45.28.0 and the current 46.4.1 I see additionalRuleGroupLabels was added, so for my issue, I can probably work with that, but will have to figure out which ones are recording rules and which not.
Could be trouble if recording/alerting rules are in the same group.

https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml#L79

Current plan is to look at the files changed in the initial pr for 45.28.1 and not add my relabel rules to those groups.

@defenestration
Copy link
Author

So the rule groups so far have divisions between alert and recording rules. Here are the current additionalRuleGroupLabels that only have recording rules:

k8s
kubeApiserverAvailability
kubeApiserverBurnrate
kubeApiserverHistogram
kubelet
kubePrometheusGeneral
kubePrometheusNodeRecording
kubeSchedulerRecording
node
nodeExporterRecording

So i just added my label rule to every group but those.

      additionalRuleGroupLabels:
        alertmanager: 
          owner: {{ stuff }} 
       ...

So, it seems the issue can stay closed after all. Its not as fine of a line as initially imagined but so be it.

@dmitriy-lukyanchikov
Copy link

i believe that shouldn't work like that because in docs its saying that its label only for alert but its not. and you cant define label for just all alerts its pretty annoying issue that break previous functionality i dont think it should be closed until someone fixes docs or broken functionality

@speer
Copy link

speer commented Jun 12, 2023

i believe that shouldn't work like that because in docs its saying that its label only for alert but its not. and you cant define label for just all alerts its pretty annoying issue that break previous functionality i dont think it should be closed until someone fixes docs or broken functionality

I agree! We are hitting the same issue after upgrading.

@nickyfoster
Copy link

Please reopen as we're still experiencing the same issue.

@zen
Copy link

zen commented Jan 18, 2024

We are on the most recent chart version (55.11.0) and also hitting the same issue. We are using very simple config :

defaultRules:
  additionalRuleLabels:
    team: devops

@sbrodriguez
Copy link

sbrodriguez commented Jan 26, 2024

Same problem in Pometheus 56.1.0 version. We have to label individually excluding kubeApiserverAvailability.

defaultRules:
  additionalRuleGroupLabels:
...
    k8sPodOwner:
      group_dest: infra
    kubeApiserverAvailability: {}
    kubeApiserverBurnrate:
      group_dest: infra
    kubeApiserverHistogram:
...

If we add the label there or in the global config additionalRuleLabels we allways get the error in this rule:

Vector contains metrics with the same labelset after applying rule labels.

@sfrolich
Copy link

sfrolich commented Feb 7, 2024

Same here on 56.6.2

@Aaron-ML
Copy link

Still seeing this even on 57.0.2. Does not work as documented.

@ashuec90
Copy link

We also seeing this error in 62.7.0 chart version as well for kube-apiserver-availability.rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet