
Split rule group "node" into "node" (only alerts) and "node-records" (only records) #1175

Closed

Conversation

@dimaslv (Contributor) commented Jul 29, 2024

After upgrading, I am unable to set different labels for recording rules and alerting rules.
We write metrics from multiple clusters into a federated vm-stack. That stack ships recording VMRules which use the scraped metrics as input and write the results back with the defaultRules labels, overwriting the cluster label that came with the scraped metrics. As a result, metrics from multiple input clusters end up recorded with the labels of the target cluster, which makes such a setup unusable (it used to work before ae208d5).
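
To illustrate the effect (the series below are hypothetical, chosen only for this example; instance:node_cpu_utilisation:rate5m stands in for any upstream recording rule):

# Series scraped from a source cluster and federated into vm-stack:
node_cpu_seconds_total{cluster="staging", mode="idle", instance="node-1"}

# Output of a recording rule after the defaultRules labels are applied:
# the original cluster label is overwritten by the target cluster's.
instance:node_cpu_utilisation:rate5m{cluster="development", instance="node-1"}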

There is a workaround through per-group overrides, so I made this not-very-pretty but working values.yaml:

defaultRules:
  rule:
    spec:
      labels:
        cluster: development
  groups:
    k8sContainerCpuUsageSecondsTotal:
      rules:
        spec:
          labels:
            cluster: null
    k8sContainerMemoryCache:
      rules:
        spec:
          labels:
            cluster: null
    k8sContainerMemoryRss:
      rules:
        spec:
          labels:
            cluster: null
    k8sContainerMemorySwap:
      rules:
        spec:
          labels:
            cluster: null
    k8sContainerMemoryWorkingSetBytes:
      rules:
        spec:
          labels:
            cluster: null
    k8sContainerResource:
      rules:
        spec:
          labels:
            cluster: null
    k8sPodOwner:
      rules:
        spec:
          labels:
            cluster: null
    kubePrometheusGeneral:
      rules:
        spec:
          labels:
            cluster: null
    kubePrometheusNodeRecording:
      rules:
        spec:
          labels:
            cluster: null
    kubelet:
      rules:
        spec:
          labels:
            cluster: null

(Note that "labels: null" cannot be used here, as some rules have their own labels upstream and those must survive.)
But I cannot create the same config for rule group "node".

There are three relevant rule groups in upstream: node-exporter, node-exporter.rules, and node.rules.

After running sync_rules.py, those groups land in separate files under files/rules/generated without being renamed.
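
For reference, the resulting layout (filenames assumed to follow the group names; illustrative only):

files/rules/generated/
  node-exporter.yaml
  node-exporter.rules.yaml
  node.rules.yaml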
The next step is VMRule generation in templates/rules/rule.yaml, which uses a "sanitized group name":

{{- /*
Create sanitized group name retrieved from file
*/}}
{{- $groupName := include "victoria-metrics-k8s-stack.rulegroup.key" $ctx -}}

This sanitized name is generated in templates/_helpers.tpl:

{{/*
VMRule key
*/}}
{{- define "victoria-metrics-k8s-stack.rulegroup.key" -}}
{{ without (regexSplit "[-_.]" .name -1) "exporter" "rules" | join "-" | camelcase | untitle }}
{{- end -}}
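
Worked through by hand (a sketch of the pipeline's effect; the arrows are mine, not part of the chart), the three upstream names collapse into one key, while other groups stay distinct:

{{- /*
"node-exporter"       -> ["node","exporter"]         -> "node"
"node-exporter.rules" -> ["node","exporter","rules"] -> "node"
"node.rules"          -> ["node","rules"]            -> "node"
"kubelet.rules"       -> ["kubelet","rules"]         -> "kubelet"
*/ -}}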

So here the words "exporter" and "rules" are dropped from the group name, and the three groups node-exporter, node-exporter.rules, and node.rules all end up in the combined group "node", which makes it impossible to set labels for the alerts without also setting them for the records.
There are multiple possible solutions:

  1. Leave all group names intact.
  2. Leave node-exporter and node-exporter.rules group names intact.
  3. Rename groups node-exporter and node-exporter.rules to node-records.
  4. Use separate defaultRules for recording rules and for alerting rules.

In this PR I offer an implementation of solution 3, but I would be happy with any working solution you consider more appropriate.
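
A minimal sketch of the idea behind solution 3 (hypothetical: the actual PR may implement it differently, and the group list and the "nodeRecords" key are assumptions for illustration):

{{- define "victoria-metrics-k8s-stack.rulegroup.key" -}}
{{- /* Hypothetical: map the groups from solution 3 to a dedicated key */ -}}
{{- if has .name (list "node-exporter" "node-exporter.rules") -}}
nodeRecords
{{- else -}}
{{ without (regexSplit "[-_.]" .name -1) "exporter" "rules" | join "-" | camelcase | untitle }}
{{- end -}}
{{- end -}}

With a key like that, the cluster override from the values.yaml above could target groups.nodeRecords while the "node" group keeps its alert labels.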

P.S. As a side note, the same problem was encountered in the kube-prometheus-stack Helm chart (prometheus-community/helm-charts#3396), which can serve as evidence that multi-cluster setups with separate labels are a popular pattern.

@dimaslv dimaslv marked this pull request as draft July 29, 2024 09:41
@dimaslv dimaslv marked this pull request as ready for review July 29, 2024 11:45
@Haleygo (Contributor) commented Jul 30, 2024

cc @AndrewChubatiuk

@AndrewChubatiuk (Contributor) commented

Hey @dimaslv,
What do you think about this solution instead?

@dimaslv (Contributor, Author) commented Jul 30, 2024

@AndrewChubatiuk, looks perfect to me!

@Haleygo (Contributor) commented Jul 31, 2024

Closing this in favor of #1177.

@Haleygo Haleygo closed this Jul 31, 2024