Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pint check to disallow topk or bottomk in recording rules #820

Closed
wbollock opened this issue Dec 15, 2023 · 3 comments · Fixed by #1157
Closed

Add pint check to disallow topk or bottomk in recording rules #820

wbollock opened this issue Dec 15, 2023 · 3 comments · Fixed by #1157

Comments

@wbollock
Copy link

I believe a new check to warn users about the use of topk or bottomk in recording rules would be useful, as those rules will typically churn quite often and create many new time series.

This is fairly opinionated and could vary a lot of on the nature of the metric being aggregated so interested in discussion and definitely think it fits as a non-default rule

@wbollock
Copy link
Author

cc: @prymitive let me know what you think. happy to contribute this

wbollock added a commit to wbollock/pint that referenced this issue May 24, 2024
This adds a new check to warn against the use of `topk` or `bottomk` in
recording rules. This is an anti-pattern as these operators lead to high
churn as the time series the recording rule generates will change
frequently as the conditions for topk/bottomk adjust.

It is enabled by default with a warning severity. It will only fire for
recording rules, not alerting rules.

Resolves cloudflare#820
@prymitive
Copy link
Collaborator

prymitive commented Jun 4, 2024

As mentioned in #985 (comment) I don't believe that just throwing warnings every time someone uses topk is a valid approach.
Churn in recording rules might be undesired but there's nothing fundamentally wrong with it.
With alerting rules it's different, I agree that if someone has a rule like:

- alert: ...
  expr: topk(10, foo)

then it is likely to cause flapping alerts, since topk() might return different time series on each evaluation. But one can stabilise the results by stripping churning labels, for example by stripping all labels and reporting min or max value: min(topk(10, foo)).
So this would need a bit more advanced logic to find only queries that "leak" labels from topk() into final results with no constrains and don't use for: or keep_firing_for: to avoid churn.

@prymitive
Copy link
Collaborator

I need a function that gives me the source of labels in a query and this check is a good use case for testing it - so expect it to be added in the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants