Inhibition rules evaluation performance on many target alerts #3932

Nuckal777 · 2024-08-01T14:52:03Z

Context

Requesting /alerts from my organizations alertmanager returns a reply in 4-6 seconds.
The alertmanager has consistently

between 10k-13k alerts and
86 inhibition rules.

Given this scale, I was a bit surprised that fetching the alerts takes multiple seconds.
Taking a trace via pprof revealed that at least 4 seconds are spent calculating inhibited alerts.
Furthermore, a few hundred milliseconds are spent serializing the json response, which is expected due to the size of ~20 MiB in our case.
The structure of my organizations inhibition rules can lead to ~10k alerts matching a target matcher and ~1k alerts matching the source matcher.
Currently the benchmarks only consider cases with a single alert satisfying a target matcher.

Calling /alerts results basically in the following algorithm being executed at the moment:

For each T in all alerts (calls into Mutes here):
- For each R in all inhibition rules:
  - Continue if T is not valid target
  - For each S in source alerts cached by R:
    - Check for label equality
    - Prevent two-sided match

Memoize source cache target match evaluations

The Prevent two-sided match step evaluates r.TargetMatchers.Matches(a.Labels), where a is from the cached source alerts.
It's invoked over and over again for each alert that is a valid target (potentially evaluating regular expressions), but it is not dependent on a specific target alert.
Assuming I did not miss a side effect, the result of r.TargetMatchers.Matches(a.Labels) is only dependent on a specific inhibition rule and that rule's source alert cache.
By memoizing that expression, a performance gain can be achieved when many alerts are valid targets.

Constructing indices

Checking for label equality between the set of valid source alerts and valid target alerts runs in quadratic time.
By maintaining an index per label, which maps that label's values to an "index into a slice of all alerts", alerts that have a label with a specific value can found by a map lookup.
For multiple labels that need to be checked for equality, multiple slices of indices can be intersected in linear time.
This would be a larger change though.

The text was updated successfully, but these errors were encountered:

Nuckal777 · 2024-08-01T14:53:56Z

I implemented the memoization in #3933, which also includes a benchmark.

grobinson-grafana · 2024-08-05T08:25:22Z

Hi! 👋 I thought GET /alerts reads the data in the marker, rather than re-calculate inhibitions, so unless I've misunderstood the code, I'm surprised that GET /alerts is slow because of inhibitor.Mutes? Could you share a pprof perhaps?

Nuckal777 · 2024-08-05T09:31:53Z

Hey, alertFilter calls setAlertStatus, which is set via Update here. The anonymous function for setAlertStatus calculates inhibitions and silences.

grobinson-grafana · 2024-08-05T09:39:09Z

Ah. So weird – I don't understand why Alertmanager needs to re-calculate inhibitions and silences for GET requests.

Anyway, thank you for your contribution. I'll make sure to take a closer look later today or tomorrow.

Nuckal777 · 2024-08-19T08:11:44Z

Friendly reminder, that I would still appreciate some feedback. Likely this got lost in between other things.

Nuckal777 mentioned this issue Aug 1, 2024

Add Inhibitor.MutesAll() #3933

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inhibition rules evaluation performance on many target alerts #3932

Inhibition rules evaluation performance on many target alerts #3932

Nuckal777 commented Aug 1, 2024

Nuckal777 commented Aug 1, 2024 •

edited

Loading

grobinson-grafana commented Aug 5, 2024

Nuckal777 commented Aug 5, 2024

grobinson-grafana commented Aug 5, 2024

Nuckal777 commented Aug 19, 2024

Inhibition rules evaluation performance on many target alerts #3932

Inhibition rules evaluation performance on many target alerts #3932

Comments

Nuckal777 commented Aug 1, 2024

Context

Memoize source cache target match evaluations

Constructing indices

Nuckal777 commented Aug 1, 2024 • edited Loading

grobinson-grafana commented Aug 5, 2024

Nuckal777 commented Aug 5, 2024

grobinson-grafana commented Aug 5, 2024

Nuckal777 commented Aug 19, 2024

Nuckal777 commented Aug 1, 2024 •

edited

Loading