Add Inhibitor.MutesAll() #3933
Signed-off-by: Nuckal777 <[email protected]>
4cdbffc to 2bc2738
I have a question about how you intend to use `MutesAll`. I see you have updated the tests and the benchmarks to use the new `MutesAll` function, but you don't call it anywhere other than tests? Do you intend to call it from `MuteStage` as well as APIv2? The PR seems incomplete without it?
Yes.
I thought I would check the general interest in the topic before implementing everything. I will add it. It might take a few days.
e389e08 to 9a4ddc0
I added |
Hi! 👋 Thanks for making these changes! However, I'm afraid it confirms what I suspected. This change has some weird effects on existing code that I don't think should be accepted. For example:

`GroupFunc` now returns a list of booleans because `alertFilter` no longer operates on individual alerts, but on slices of alerts:

```go
GroupFunc func(func(*dispatch.Route) bool, func([]*types.Alert, time.Time) []bool) (dispatch.AlertGroups, map[model.Fingerprint][]string)
```

`setAlertStatus` now accepts a list of label sets, which doesn't make sense as the function sets the status for an individual alert.

```go
setAlertStatusFn func(...prometheus_model.LabelSet)
```

You then have to build slices of label sets and index into the returned slices in `notify/notify.go`.

Is it possible to optimize inhibition rules without changing the `Mutes` interface?
types/types.go

```diff
@@ -472,6 +472,7 @@ func (a *Alert) Merge(o *Alert) *Alert {
 // Mutes.
 type Muter interface {
 	Mutes(model.LabelSet) bool
```
Do we still need `Mutes(model.LabelSet) bool`? Can it be deleted?
`MuteStage.Exec()` currently depends on `Inhibitor` implementing `Muter`.
Oof! Yeah, OK. I would recommend exploring other optimizations (only if you want and have the time to do so) which don't involve changing this interface, as changing the interface is quite an intrusive change and causes a number of problems elsewhere in the code.
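One way to picture a batch method that leaves the single-method `Muter` interface untouched is an optional second interface discovered by type assertion. The sketch below is only an illustration of that Go idiom: `BatchMuter`, `muteEach`, `plainMuter`, and `batchInhibitor` are hypothetical names, `LabelSet` is a simplified stand-in for `model.LabelSet`, and none of this is taken from the Alertmanager code base or from this PR.

```go
package main

import "fmt"

// LabelSet is a simplified stand-in for model.LabelSet (assumption).
type LabelSet map[string]string

// Muter mirrors the single-method interface quoted in the review.
type Muter interface {
	Mutes(LabelSet) bool
}

// BatchMuter is a hypothetical optional interface; it is not part of Alertmanager.
type BatchMuter interface {
	MutesAll([]LabelSet) []bool
}

// muteEach accepts any Muter and uses the batch path only when the
// concrete type also happens to implement BatchMuter.
func muteEach(m Muter, lsets []LabelSet) []bool {
	if bm, ok := m.(BatchMuter); ok {
		return bm.MutesAll(lsets)
	}
	out := make([]bool, len(lsets))
	for i, l := range lsets {
		out[i] = m.Mutes(l)
	}
	return out
}

// plainMuter implements only Muter.
type plainMuter struct{}

func (plainMuter) Mutes(l LabelSet) bool { return l["severity"] == "warning" }

// batchInhibitor implements both Muter and the optional BatchMuter.
type batchInhibitor struct {
	batchCalls int // counts MutesAll invocations, for illustration only
}

func (b *batchInhibitor) Mutes(l LabelSet) bool { return l["severity"] == "warning" }

func (b *batchInhibitor) MutesAll(lsets []LabelSet) []bool {
	b.batchCalls++
	out := make([]bool, len(lsets))
	for i, l := range lsets {
		out[i] = b.Mutes(l)
	}
	return out
}

func main() {
	lsets := []LabelSet{{"severity": "warning"}, {"severity": "critical"}}
	bi := &batchInhibitor{}
	fmt.Println(muteEach(plainMuter{}, lsets))
	fmt.Println(muteEach(bi, lsets), "batch calls:", bi.batchCalls)
}
```

Callers that only know about `Muter` keep working unchanged, while callers that hold a whole slice of label sets can take the batch path when it exists.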
9a4ddc0 to f901906
I implemented that. In the current state the algorithmic improvement can only be realised when all alerts are passed to |
f901906 to b68d3cb
b68d3cb to 8b5fbfa
I looked again at the code, but I'm afraid I don't see where the optimization is implemented for My suggestion was to see if we can optimize the performance of |
Exactly. The whole speed improvement comes from caching the source cache evaluation:

```go
for _, r := range ih.rules {
	var alerts []*types.Alert // <- reused in the inner loop
	var scacheEval []bool     // <- reused in the inner loop
	for i, lset := range lsets {
		// ...
		if inhibitedByFP, eq := r.hasEqualCached(lset, r.SourceMatchers.Matches(lset), alerts, scacheEval); eq {
			ih.marker.SetInhibited(fingerprints[i], inhibitedByFP.String())
		}
		// ...
	}
}
```

Just using `Mutes()` for each alert evaluates the source cache every time:

```go
func someOtherFunc() {
	// ...
	for _, a := range alerts {
		inhibitor.Mutes(a.Labels) // <- evaluates the source cache for each a
	}
}

func (ih *Inhibitor) Mutes(lset model.LabelSet) bool {
	// ...
	for _, r := range ih.rules {
		// ...
		if inhibitedByFP, eq := r.hasEqual(lset, r.SourceMatchers.Matches(lset)); eq { // <- always evaluates the source cache matches
			ih.marker.SetInhibited(fp, inhibitedByFP.String())
			return true
		}
	}
	return false
}
```

Caching the source cache evaluations "requires" passing a list of alerts. If the signature change/addition is undesirable, we can close this PR. It doesn't work without it. Technically, the state could be externalized, for example:

```go
type CachingInhibitor struct {
	base       Inhibitor
	alerts     [][]*types.Alert // one cache for each inhibition rule
	scacheEval [][]bool         // one cache for each inhibition rule
}
```

Externalizing the state would cost more memory, would likely require locking, and would require explicitly dropping these caches at some point (likely after calling
I expect something in that direction is possible as well, but it's something for a different PR.
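To make the caching argument concrete, here is a minimal, self-contained sketch of the idea, not the PR's implementation. It assumes heavily simplified types: `LabelSet`, `rule`, `inhibitor`, `matches`, and `equalOn` are illustrative stand-ins rather than Alertmanager's real types, and the sketch ignores details such as marker updates and two-sided matches. The `sourceEvals` counter exists only to show how often the source matchers are evaluated in each variant.

```go
package main

import "fmt"

// LabelSet is a simplified stand-in for model.LabelSet (assumption).
type LabelSet map[string]string

// rule is a simplified inhibition rule: a source alert matching `source`
// inhibits a target matching `target` if both agree on all `equal` labels.
type rule struct {
	source, target LabelSet
	equal          []string
}

// matches reports whether lset carries every label/value pair in sel.
func matches(sel, lset LabelSet) bool {
	for k, v := range sel {
		if lset[k] != v {
			return false
		}
	}
	return true
}

// equalOn reports whether a and b agree on all named labels.
func equalOn(names []string, a, b LabelSet) bool {
	for _, n := range names {
		if a[n] != b[n] {
			return false
		}
	}
	return true
}

type inhibitor struct {
	rules       []rule
	sources     []LabelSet // alerts that may act as inhibition sources
	sourceEvals int        // counts source-matcher evaluations (illustration only)
}

// Mutes evaluates the source matchers again on every call.
func (ih *inhibitor) Mutes(lset LabelSet) bool {
	for _, r := range ih.rules {
		if !matches(r.target, lset) {
			continue
		}
		for _, src := range ih.sources {
			ih.sourceEvals++
			if matches(r.source, src) && equalOn(r.equal, src, lset) {
				return true
			}
		}
	}
	return false
}

// MutesAll evaluates the source matchers at most once per rule and reuses
// the result for every target label set.
func (ih *inhibitor) MutesAll(lsets []LabelSet) []bool {
	muted := make([]bool, len(lsets))
	for _, r := range ih.rules {
		var srcEval []bool // filled lazily, then reused in the inner loop
		for i, lset := range lsets {
			if muted[i] || !matches(r.target, lset) {
				continue
			}
			if srcEval == nil {
				srcEval = make([]bool, len(ih.sources))
				for j, src := range ih.sources {
					ih.sourceEvals++
					srcEval[j] = matches(r.source, src)
				}
			}
			for j, src := range ih.sources {
				if srcEval[j] && equalOn(r.equal, src, lset) {
					muted[i] = true
					break
				}
			}
		}
	}
	return muted
}

func main() {
	ih := &inhibitor{
		rules: []rule{{
			source: LabelSet{"severity": "critical"},
			target: LabelSet{"severity": "warning"},
			equal:  []string{"cluster"},
		}},
		sources: []LabelSet{
			{"severity": "critical", "cluster": "a"},
			{"severity": "info", "cluster": "b"},
		},
	}
	targets := []LabelSet{
		{"severity": "warning", "cluster": "a"},
		{"severity": "warning", "cluster": "b"},
	}
	batch := ih.MutesAll(targets)
	batchEvals := ih.sourceEvals
	ih.sourceEvals = 0
	for i, t := range targets {
		fmt.Println(i, ih.Mutes(t), batch[i])
	}
	fmt.Println("source evaluations: per-alert", ih.sourceEvals, "batch", batchEvals)
}
```

The per-rule `srcEval` slice is the extra allocation that the batch variant pays for, which is why it only helps once enough target label sets share it.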
For context, please refer to #3932.
I added benchmarks that consider more than one target alert.
The following benchstat output compares:
- `Inhibitor.Mutes()` called for each target alert (old.txt)
- the `Inhibitor.MutesAll()` implementation (new.txt)

`Inhibitor.MutesAll()` increases the speed in certain cases at the cost of some memory. In cases where only a few alerts are involved, `Inhibitor.MutesAll()` is 2-3 times slower due to the additional memory allocations. This should still be sufficiently fast given the small scale. When the count of target alerts is increased, `Inhibitor.MutesAll()` becomes faster.
Invoking it as part of `GET /alerts` still needs to be done.
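The trade-off described above (extra allocations for small inputs, fewer repeated evaluations for large ones) can be explored with a toy benchmark. The sketch below is an assumption-laden stand-in, not the PR's benchmark: `labels`, `isSource`, `mutesOne`, `mutesBatch`, and `makeAlerts` are hypothetical helpers, and the numbers it prints will not match the benchstat figures from the PR. It uses `testing.Benchmark` so that it runs as an ordinary program.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// labels is a simplified stand-in for an alert's label set (assumption).
type labels map[string]string

// isSource stands in for evaluating a rule's source matchers.
func isSource(l labels) bool { return l["severity"] == "critical" }

// mutesOne re-evaluates the source matchers for every source on each call.
func mutesOne(sources []labels, target labels) bool {
	for _, s := range sources {
		if isSource(s) && s["cluster"] == target["cluster"] {
			return true
		}
	}
	return false
}

// mutesBatch evaluates the source matchers once and reuses the result for
// all targets, at the cost of allocating two slices per call.
func mutesBatch(sources, targets []labels) []bool {
	eval := make([]bool, len(sources))
	for i, s := range sources {
		eval[i] = isSource(s)
	}
	out := make([]bool, len(targets))
	for i, t := range targets {
		for j, s := range sources {
			if eval[j] && s["cluster"] == t["cluster"] {
				out[i] = true
				break
			}
		}
	}
	return out
}

// makeAlerts builds n alerts with the given severity and distinct clusters.
func makeAlerts(n int, severity string) []labels {
	out := make([]labels, n)
	for i := range out {
		out[i] = labels{"severity": severity, "cluster": strconv.Itoa(i)}
	}
	return out
}

func main() {
	sources := makeAlerts(100, "critical")
	for _, n := range []int{1, 10, 1000} {
		targets := makeAlerts(n, "warning")
		perAlert := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				for _, t := range targets {
					mutesOne(sources, t)
				}
			}
		})
		batch := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				mutesBatch(sources, targets)
			}
		})
		fmt.Printf("targets=%d\n  per-alert: %v %s\n  batch:     %v %s\n",
			n, perAlert, perAlert.MemString(), batch, batch.MemString())
	}
}
```

Results depend on the machine and on how expensive the stand-in for the source matchers is, so this is only useful for getting a feel for the shape of the trade-off.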