Add absent alerts #50
@devopsjonas Thanks for the contribution. I was curious to know how you tested your patches?
@anmolsachan Sure, we do test them; this is running in prod for one of our clients. I'll add a screenshot of the Prometheus UI if that works for you. Should I do that for all of my current and future PRs?
@devopsjonas That would be great.
config.libsonnet
Outdated
@@ -3,6 +3,22 @@
// Selectors are inserted between {} in Prometheus queries.
cephExporterSelector: 'job="rook-ceph-mgr"',

// Number of Ceph Managers which are reporting metrics
cephMgrCount: 3,
Are we always expecting 3 MGR?
So the alert checks whether x number of replicas are running. cephMgrCount is a config option where users set how many Ceph MGRs they are running, so users are expected to change it. Should we change the default? If yes, what should the number be? Or should we drop the alert entirely?
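To make the config-option point concrete, here is a minimal sketch of how a user of the mixin might override the value. It assumes the mixin exposes its settings under a hidden `_config` field (the usual monitoring-mixin convention) and that `mixin.libsonnet` is its entry point; neither name is confirmed in this thread.

```jsonnet
// Hypothetical consumer file. 'mixin.libsonnet' and the `_config` field
// are assumptions based on the common monitoring-mixin layout.
local cephMixin = (import 'mixin.libsonnet') + {
  _config+:: {
    // Number of ceph-mgr instances this cluster is expected to run.
    cephMgrCount: 2,
  },
};

// Emit the alert rules with the override applied.
cephMixin.prometheusAlerts
```

With this kind of override the shipped default only matters for users who never touch the setting, which is why the choice of default is the real question here.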
I heard from the Ceph team that ceph-mgr runs on one of the mons?
@leseb does Rook provide an option to set the number of manager pods?
If we want this config to be there, let's make it optional. We can comment it out, and the user can uncomment it during the actual deployment. @shtripat @devopsjonas what do you think?
I feel it certainly should not be the default configuration.
Maybe we should instead just do sum(up{%(cephExporterSelector)s}) < %(cephMgrCount)d and set it to 1 by default. That becomes sum(up{job="ceph-mgr"}) < 1, which means we would get alerted if we are not scraping metrics. This would work for everyone. WDYT?
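The substitution proposed above can be sketched in standalone Jsonnet. The config keys and the selector value come from the diff under review; the alert name is hypothetical, and the real rule in the PR may be structured differently.

```jsonnet
// Standalone sketch: evaluating this file yields a rule object whose
// `expr` field holds the expanded PromQL expression.
local config = {
  cephExporterSelector: 'job="rook-ceph-mgr"',
  cephMgrCount: 1,
};

{
  // Hypothetical alert name, for illustration only.
  alert: 'CephMgrIsMissingReplicas',
  // Jsonnet's % operator fills the named placeholders from the object.
  expr: 'sum(up{%(cephExporterSelector)s}) < %(cephMgrCount)d' % config,
}
```

With the selector from config.libsonnet, the template expands to sum(up{job="rook-ceph-mgr"}) < 1, so raising cephMgrCount only changes the threshold on the right-hand side.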
I've changed the default to 1 and the alert condition to fire when the count is less than that.
@shtripat @anmolsachan please take another look. Thank you!
@devopsjonas Any specific reason why the alerts are in different groups and not in the same alerts group say
nice catch 🥇 fixed
@shtripat @anmolsachan please take another look 🙇‍♂️
LGTM.
Merging this now. Please add another PR for unit tests.
This is part of moving away from https://github.com/devopyio/ceph-monitoring-mixin to this community repo :)
This is how the generated alerting rules look: