
ruler: add metric thanos alert sent by alert name #5368

Conversation

fgouteroux
Contributor

@fgouteroux fgouteroux commented May 16, 2022

Hello thanos team,

It would be great to have the total number of alerts sent, broken down by Alertmanager and by alert name. It would be useful for statistics about how often an alert is triggered.

I'm not sure whether it's relevant to add another metric, and I'm not sure about the name either.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Adding metric: Total number of alerts sent by alertmanager and alert name.

Verification

Trigger an alert
Check the metric thanos_alert_sender_alerts_sent_by_alertname_total on the Thanos Ruler metrics endpoint
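To make the proposed metric's semantics concrete, here is a minimal sketch of what "total alerts sent, keyed by Alertmanager and alertname" would count. This is not the actual Thanos implementation (which would use a Prometheus CounterVec in Go); the class name and structure are illustrative assumptions, with a plain dict standing in for the counter.

```python
# Hypothetical sketch of the proposed counter's semantics:
# one increment per sent alert, keyed by (alertmanager, alertname).
from collections import Counter

class AlertSendCounter:
    """Counts alerts sent, keyed by (alertmanager URL, alertname)."""

    def __init__(self):
        self.sent = Counter()

    def record(self, alertmanager, alerts):
        # Each alert carries an "alertname" label; tally per name.
        for alert in alerts:
            name = alert.get("alertname", "")
            self.sent[(alertmanager, name)] += 1

c = AlertSendCounter()
c.record("http://am:9093", [
    {"alertname": "HighLatency"},
    {"alertname": "HighLatency"},
    {"alertname": "DiskFull"},
])
print(c.sent[("http://am:9093", "HighLatency")])  # -> 2
```

Querying `c.sent` here corresponds to scraping the proposed counter: a dashboard could then rank alertnames by how often they fire per tenant's Ruler.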

Contributor

@yeya24 yeya24 left a comment


I love this idea and I think this could be a nice addition to have.
@fgouteroux Can you please rebase your PR on main? This should fix the CI issue.

@fgouteroux fgouteroux force-pushed the add_metric_thanos_alert_sent_by_name branch from 8de0141 to 0edf958 Compare May 22, 2022 20:19
@fgouteroux fgouteroux force-pushed the add_metric_thanos_alert_sent_by_name branch from 0edf958 to 8481b12 Compare May 23, 2022 12:49
Member

@GiedriusS GiedriusS left a comment


Thanks for your work! However, what's the use-case here? If it's about tracing where specific alerts came from then I believe Alertmanager itself should be a better place to solve this. There were lots of discussions about the identity of an alert during Prometheus Day @ KubeCon last week. I don't think the alertname is enough to identify an alert. And I believe that if we'd give users such a metric, the next question would be: what about other labels? Hence, I don't think that this is a good idea

@fgouteroux
Contributor Author

Thanks for your work! However, what's the use-case here? If it's about tracing where specific alerts came from then I believe Alertmanager itself should be a better place to solve this. There were lots of discussions about the identity of an alert during Prometheus Day @ KubeCon last week. I don't think the alertname is enough to identify an alert. And I believe that if we'd give users such a metric, the next question would be: what about other labels? Hence, I don't think that this is a good idea

It's not about tracing; it could be useful to see which alertname is triggered most often. In my case I'm running one Ruler per tenant with a shared Alertmanager, so each tenant manages its own alert rules but has no access to the Alertmanager. We only expose the Alertmanager API through a custom proxy with authentication. Because of that, the metrics provided by the Alertmanager are not very helpful, as they only show the total number of alerts sent to an integration like PagerDuty. This is why Ruler metrics fit our design and let us provide a Grafana dashboard with all Ruler metrics related to the tenant.

@yeya24
Contributor

yeya24 commented May 30, 2022

I don't think the alertname is enough to identify an alert.

Good point. I think alertname is only unique within one rule group, but it can be duplicated across groups.
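As a concrete illustration of the point above, here is a hypothetical Prometheus-style rule file in which two rule groups each define an alert with the same alertname but different expressions, so the alertname alone cannot identify which rule fired (group and alert names here are made up):

```yaml
groups:
  - name: api-latency
    rules:
      - alert: HighErrorRate            # same alertname...
        expr: rate(api_errors_total[5m]) > 0.05
  - name: batch-jobs
    rules:
      - alert: HighErrorRate            # ...defined again in another group
        expr: rate(job_failures_total[5m]) > 0.01
```

A counter labeled only by alertname would merge both alerts into one series, which is the identity problem raised in the review.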

@fgouteroux
Contributor Author

Understood, closing this as useless.

@fgouteroux fgouteroux closed this Jul 21, 2022