Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inhibition rules evaluation performance on many target alerts #3932

Open
Nuckal777 opened this issue Aug 1, 2024 · 5 comments
Open

Inhibition rules evaluation performance on many target alerts #3932

Nuckal777 opened this issue Aug 1, 2024 · 5 comments

Comments

@Nuckal777
Copy link

Context

Requesting /alerts from my organizations alertmanager returns a reply in 4-6 seconds.
The alertmanager has consistently

  • between 10k-13k alerts and
  • 86 inhibition rules.

Given this scale, I was a bit surprised that fetching the alerts takes multiple seconds.
Taking a trace via pprof revealed that at least 4 seconds are spent calculating inhibited alerts.
Furthermore, a few hundred milliseconds are spent serializing the json response, which is expected due to the size of ~20 MiB in our case.
The structure of my organizations inhibition rules can lead to ~10k alerts matching a target matcher and ~1k alerts matching the source matcher.
Currently the benchmarks only consider cases with a single alert satisfying a target matcher.

Calling /alerts results basically in the following algorithm being executed at the moment:

Memoize source cache target match evaluations

The Prevent two-sided match step evaluates r.TargetMatchers.Matches(a.Labels), where a is from the cached source alerts.
It's invoked over and over again for each alert that is a valid target (potentially evaluating regular expressions), but it is not dependent on a specific target alert.
Assuming I did not miss a side effect, the result of r.TargetMatchers.Matches(a.Labels) is only dependent on a specific inhibition rule and that rule's source alert cache.
By memoizing that expression, a performance gain can be achieved when many alerts are valid targets.

Constructing indices

Checking for label equality between the set of valid source alerts and valid target alerts runs in quadratic time.
By maintaining an index per label, which maps that label's values to an "index into a slice of all alerts", alerts that have a label with a specific value can found by a map lookup.
For multiple labels that need to be checked for equality, multiple slices of indices can be intersected in linear time.
This would be a larger change though.

@Nuckal777
Copy link
Author

Nuckal777 commented Aug 1, 2024

I implemented the memoization in #3933, which also includes a benchmark.

@grobinson-grafana
Copy link
Contributor

Hi! 👋 I thought GET /alerts reads the data in the marker, rather than re-calculate inhibitions, so unless I've misunderstood the code, I'm surprised that GET /alerts is slow because of inhibitor.Mutes? Could you share a pprof perhaps?

@Nuckal777
Copy link
Author

Hey, alertFilter calls setAlertStatus, which is set via Update here. The anonymous function for setAlertStatus calculates inhibitions and silences.

@grobinson-grafana
Copy link
Contributor

Ah. So weird – I don't understand why Alertmanager needs to re-calculate inhibitions and silences for GET requests.

Anyway, thank you for your contribution. I'll make sure to take a closer look later today or tomorrow.

@Nuckal777
Copy link
Author

Friendly reminder, that I would still appreciate some feedback. Likely this got lost in between other things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants