Add Inhibitor.MutesAll() #3933

Nuckal777 · 2024-08-01T14:53:27Z

For the context please refer to #3932.

I added benchmarks, which consider more than one target alert.
The following benchstat output compares:

invoking Inhibitor.Mutes() for each target alert (old.txt)
with the Inhibitor.MutesAll() implementation (new.txt).
Inhibitor.MutesAll() increase the speed in certain cases at the cost of some memory.

goos: darwin
goarch: arm64
pkg: github.com/prometheus/alertmanager/inhibit
                                                                        │    old.txt    │               new.txt                 │
                                                                        │    sec/op     │    sec/op     vs base                 │
Mutes/1_inhibition_rule,_1_inhibiting_alert,_1_target_alert-10              768.5n ± 3%    842.5n ± 5%    +9.62% (p=0.001 n=10)
Mutes/10_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10            769.5n ± 2%    834.4n ± 0%    +8.44% (p=0.000 n=10)
Mutes/100_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10           779.3n ± 1%    841.4n ± 1%    +7.96% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10          839.5n ± 1%    914.7n ± 4%    +8.96% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_100_inhibiting_alert,_100_target_alert-10     157.39µ ± 1%    54.41µ ± 1%   -65.43% (p=0.000 n=10)
Mutes/10000_inhibition_rules,_1_inhibiting_alert,_1_target_alert-10         897.5n ± 3%    962.6n ± 2%    +7.25% (p=0.000 n=10)
Mutes/10000_inhibition_rules,_100_inhibiting_alert,_100_target_alert-10     63.07µ ± 6%    57.59µ ± 3%    -8.68% (p=0.000 n=10)
Mutes/1_inhibition_rule,_10_inhibiting_alerts,_1_target_alert-10            886.9n ± 1%   1087.0n ± 1%   +22.57% (p=0.000 n=10)
Mutes/1_inhibition_rule,_100_inhibiting_alerts,_1_target_alert-10           1.929µ ± 2%    3.083µ ± 1%   +59.84% (p=0.000 n=10)
Mutes/1_inhibition_rule,_100_inhibiting_alerts,_100_target_alert-10        174.18µ ± 1%    60.44µ ± 1%   -65.30% (p=0.000 n=10)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts,_1_target_alert-10          12.67µ ± 1%    25.82µ ± 1%  +103.87% (p=0.000 n=10)
Mutes/1_inhibition_rule,_1000_inhibiting_alerts,_1000_target_alert-10     12425.7µ ± 1%    582.3µ ± 1%   -95.31% (p=0.000 n=10)
Mutes/1_inhibition_rule,_10000_inhibiting_alerts,_1_target_alert-10         104.4µ ± 1%    278.8µ ± 1%  +167.13% (p=0.000 n=10)
Mutes/100_inhibition_rules,_1000_inhibiting_alerts,_1_target_alert-10       11.89µ ± 2%    24.69µ ± 0%  +107.65% (p=0.000 n=10)
Mutes/10_inhibition_rules,_last_rule_matches,_1_target_alert-10             1.012µ ± 3%    1.125µ ± 1%   +11.22% (p=0.000 n=10)
Mutes/100_inhibition_rules,_last_rule_matches,_1_target_alert-10            3.178µ ± 1%    3.872µ ± 1%   +21.86% (p=0.000 n=10)
Mutes/100_inhibition_rules,_last_rule_matches,_100_target_alerts-10         299.9µ ± 1%    227.9µ ± 1%   -24.00% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_last_rule_matches,_1_target_alert-10           24.62µ ± 3%    31.57µ ± 1%   +28.24% (p=0.000 n=10)
Mutes/1000_inhibition_rules,_last_rule_matches,_1000_target_alert-10        24.29m ± 0%    18.27m ± 0%   -24.77% (p=0.000 n=10)
Mutes/10000_inhibition_rules,_last_rule_matches,_1_target_alert-10          239.6µ ± 6%    306.5µ ± 0%   +27.94% (p=0.000 n=10)
geomean                                                                     18.71µ         17.34µ         -7.36%

In cases where a low count of alerts is involved Inhibitor.MutesAll() is 2-3 times slower, due to the additional memory allocations.
This should still be sufficiently fast given the small scale.
When the count of target alerts is increased Inhibitor.MutesAll() becomes faster.
Invoking it as part of GET /alerts still needs to be done.

Signed-off-by: Nuckal777 <[email protected]>

grobinson-grafana

I have a question about how you intend to use MutesAll. I see you have updated the tests and the benchmarks to use the new MutesAll function, but you don't call it anywhere other than tests? Do you intend to call it from MuteStage as well as APIv2? The PR seems incomplete without it?

Nuckal777 · 2024-08-22T07:29:46Z

Do you intend to call it from MuteStage as well as APIv2?

Yes.

The PR seems incomplete without it?

I thought I check the general interest for the topic before implementing everything. I will add it. Might take a few days.

Nuckal777 · 2024-08-27T07:29:14Z

I added MutesAll to the api and MuteStage code paths. The CI needs a re-run, network foo during setup.

grobinson-grafana

Hi! 👋 Thanks for making these changes! However, I'm afraid it confirms what I suspected. This change has some weird effects on existing code that I don't think should be accepted. For example:

GroupFunc now returns a list of booleans because alertFilter no longer operates on individual alerts, but slices of alerts:

GroupFunc func(func(*dispatch.Route) bool, func([]*types.Alert, time.Time) []bool) (dispatch.AlertGroups, map[model.Fingerprint][]string)

setAlertStatus now accepts a list of label sets, which doesn't make sense as the function sets the status for an individual alert.

setAlertStatusFn func(...prometheus_model.LabelSet)

You then have to build slices of label sets and index into the returned slices in notify/notify.go.

Is it possible to optimize inhibition rules without changing the Mutes interface?

grobinson-grafana · 2024-09-04T11:57:25Z

types/types.go

@@ -472,6 +472,7 @@ func (a *Alert) Merge(o *Alert) *Alert {
 // Mutes.
 type Muter interface {
 	Mutes(model.LabelSet) bool


Do we still need Mutes(model.LabelSet) bool? Can it be deleted?

MuteStage.Exec() currently depends on Inhibitor implementing Muter.

Oof! Yeah, OK. I would recommend exploring other optimizations (only if you want and have the time to do so) which don't involve changing this interface, as changing the interface is quite an intrusive change and causes a number of problems elsewhere in the code.

Nuckal777 · 2024-09-04T15:28:52Z

Is it possible to optimize inhibition rules without changing the Mutes interface?

I implemented that. In the current state the algorithmic improvement can only be realised when all alerts are passed to MutesAll(). By moving setAlertStatus out of alertFilter the different API handlers can choose how to update it. For getAlertsHandler I used MutesAll() here.

Signed-off-by: Nuckal777 <[email protected]>

grobinson-grafana · 2024-09-04T16:04:38Z

Is it possible to optimize inhibition rules without changing the Mutes interface?

I implemented that. In the current state the algorithmic improvement can only be realised when all alerts are passed to MutesAll().

I looked again at the code, but I'm afraid I don't see where the optimization is implemented for Mutes()? Like you said, the performance optimization only works if you pass all alerts to MutesAll(), and there is no performance improvement if you only call Mutes()? Did I misunderstand?

My suggestion was to see if we can optimize the performance of Mutes without having to pass all alerts to MutesAll(). For example, what if the cache is persistent, like matcherCache for silences?

Nuckal777 · 2024-09-06T08:35:44Z

there is no performance improvement if you only call Mutes()?

Exactly. The whole speed improvement comes from caching the source cache evaluation of InhibitRules across multiple alerts. In the context of inhibitions, that's the whole set of potential target alerts. For MutesAll() it roughly boils down to:

for _, r := range ih.rules {
    var alerts []*types.Alert // <- reused in the inner loop
    var scacheEval []bool // <- reused in the inner loop
    for i, lset := range lsets {
        // ...
        if inhibitedByFP, eq := r.hasEqualCached(lset, r.SourceMatchers.Matches(lset), alerts, scacheEval); eq {
            ih.marker.SetInhibited(fingerprints[i], inhibitedByFP.String())
        }
        // ...
    }
}

Just using Mutes() flips the two loops. That looks roughly like:

func someOtherFunc() {
    // ...
    for _, a := range alerts {
        inhibitor.Mutes(a.Labels) // <- evaluates the source cache for each a
    }
}

func (ih *Inhibitor) Mutes(lset model.LabelSet) bool {
    // ...
    for _, r := range ih.rules {
        // ...
        if inhibitedByFP, eq := r.hasEqual(lset, r.SourceMatchers.Matches(lset)); eq { // <- always evaluates the source cache matches
            ih.marker.SetInhibited(fp, inhibitedByFP.String())
            return true
        }
    }
}

Caching the source cache evaluations "requires" to pass a list of alerts. If the signature change/addition is undesirable, we can close this PR. It doesn't work without. Technically, the state MutesAll() caches can be externalized into a struct, which implements the Muter interface, e.g.:

type CachingInhibitor {
    base Inhibitor
    var alerts [][]*types.Alert // one cache for each inhibition rule
    var scacheEval [][]bool // one cache for each inhibition rule
}

Externalizing the state would cost more memory, likely requires locking and requires explicitly dropping these caches at some point (likely after calling Mutes() a few times).

My suggestion was to see if we can optimize the performance of Mutes without having to pass all alerts to MutesAll(). For example, what if the cache is persistent, like matcherCache for silences?

I expect something in that direction is possible as well, but it's something for different PR.

Nuckal777 mentioned this pull request Aug 1, 2024

Inhibition rules evaluation performance on many target alerts #3932

Open

Nuckal777 added 7 commits August 1, 2024 17:03

Add inhibt benchmarks for more target alerts

5d990c1

Signed-off-by: Nuckal777 <[email protected]>

Add benchmark for many inhibition rules and alerts

4b14fbb

Signed-off-by: Nuckal777 <[email protected]>

Reduce inhibit count in benchmarks

d096502

Signed-off-by: Nuckal777 <[email protected]>

Add Inhibitor.MutesAll

447fa41

Signed-off-by: Nuckal777 <[email protected]>

Memoize scache evaluations

19c994e

Signed-off-by: Nuckal777 <[email protected]>

Cache fingerprints in Inhibitor.MutesAll

c3f3c97

Signed-off-by: Nuckal777 <[email protected]>

Return early in Inhibitor.MutesAll when all alerts are muted

2bc2738

Signed-off-by: Nuckal777 <[email protected]>

Nuckal777 force-pushed the inhibit-mutes-all branch from 4cdbffc to 2bc2738 Compare August 1, 2024 15:03

grobinson-grafana reviewed Aug 20, 2024

View reviewed changes

Nuckal777 force-pushed the inhibit-mutes-all branch from e389e08 to 9a4ddc0 Compare August 26, 2024 18:27

grobinson-grafana reviewed Sep 4, 2024

View reviewed changes

Nuckal777 force-pushed the inhibit-mutes-all branch from 9a4ddc0 to f901906 Compare September 4, 2024 14:50

Nuckal777 force-pushed the inhibit-mutes-all branch from f901906 to b68d3cb Compare September 4, 2024 15:29

Use inhibitor.MutesAll() in GET alerts handler

8b5fbfa

Signed-off-by: Nuckal777 <[email protected]>

Nuckal777 force-pushed the inhibit-mutes-all branch from b68d3cb to 8b5fbfa Compare September 4, 2024 15:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Inhibitor.MutesAll() #3933

Add Inhibitor.MutesAll() #3933

Nuckal777 commented Aug 1, 2024

grobinson-grafana left a comment

Nuckal777 commented Aug 22, 2024

Nuckal777 commented Aug 27, 2024

grobinson-grafana left a comment

grobinson-grafana Sep 4, 2024

Nuckal777 Sep 4, 2024

grobinson-grafana Sep 4, 2024

Nuckal777 commented Sep 4, 2024

grobinson-grafana commented Sep 4, 2024

Nuckal777 commented Sep 6, 2024 •

edited

Loading

Add Inhibitor.MutesAll() #3933

Are you sure you want to change the base?

Add Inhibitor.MutesAll() #3933

Conversation

Nuckal777 commented Aug 1, 2024

grobinson-grafana left a comment

Choose a reason for hiding this comment

Nuckal777 commented Aug 22, 2024

Nuckal777 commented Aug 27, 2024

grobinson-grafana left a comment

Choose a reason for hiding this comment

grobinson-grafana Sep 4, 2024

Choose a reason for hiding this comment

Nuckal777 Sep 4, 2024

Choose a reason for hiding this comment

grobinson-grafana Sep 4, 2024

Choose a reason for hiding this comment

Nuckal777 commented Sep 4, 2024

grobinson-grafana commented Sep 4, 2024

Nuckal777 commented Sep 6, 2024 • edited Loading

Nuckal777 commented Sep 6, 2024 •

edited

Loading