Add datasource availability checks so that a datasource is monitored only while it is online #2062
Comments
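If a datasource suddenly becomes reachable again, should monitoring resume automatically?
Are all of your datasources reachable over the same internal network? I am using a datasource in a remote data center and cannot add it at all, because validation is enforced.
They do need to be monitored, so my suggestion is to check datasource availability periodically.
The datasources are spread across several data centers, but all of them can be reached over HTTP.
We run a large number of private cloud clusters, with one Prometheus cluster per cluster.
I hope this request can be supported. We currently receive many alert-recovery notifications that are sent in error because a datasource failed, and it is hard to tell whether a datasource is healthy. To get this today we would have to integrate another tool such as uptime-kuma.
The current code does treat an alert as recovered when the query returns no data, but only if the query request itself did not report an error. Are you sure that in your environment a recovery was still reported when the datasource was unreachable, that is, when the query failed?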
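The distinction drawn in the last comment (an empty result means recovery, a failed query does not) can be sketched as follows. This is only an illustration, not n9e's actual code: the `Outcome` enum, the `evaluate` function, and the `query` callable are all hypothetical names.

```python
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    FIRING = "firing"
    RECOVERED = "recovered"
    UNKNOWN = "unknown"  # query failed, so the previous alert state should be kept


def evaluate(query: Callable[[], List[float]]) -> Outcome:
    """Evaluate one rule.

    `query` is a hypothetical callable that returns the matching series
    values, or raises an exception when the datasource cannot be queried.
    """
    try:
        series = query()
    except Exception:
        # A failed query is not evidence that the alert has recovered.
        return Outcome.UNKNOWN
    # The query succeeded: data means firing, no data means recovered.
    return Outcome.FIRING if series else Outcome.RECOVERED
```

With this shape, a caller would send a recovery notification only on `Outcome.RECOVERED` and leave the alert untouched on `Outcome.UNKNOWN`.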
What would you like to be added:
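When a datasource is unavailable, n9e still evaluates the rules bound to it, which incurs some overhead. Could n9e evaluate rules only for datasources that are online?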
Why is this needed:
We currently manage 200+ Prometheus datasources, many of which disconnect frequently, which floods the n9e logs with Error entries.
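One possible shape for the request, combined with the periodic check suggested in the comments, is a small gate that caches each datasource's health and re-probes it after a fixed interval. This is a minimal sketch under assumptions: `DatasourceGate`, `probe`, `interval`, and `clock` are hypothetical names, and nothing here reflects how n9e is implemented. For Prometheus, the probe could for example issue an HTTP GET against the server's `/-/healthy` endpoint.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Health:
    online: bool
    checked_at: float


class DatasourceGate:
    """Caches datasource health and re-probes after `interval` seconds."""

    def __init__(
        self,
        probe: Callable[[str], bool],  # hypothetical: True if the datasource answers
        interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._interval = interval
        self._clock = clock
        self._health: Dict[str, Health] = {}

    def is_online(self, ds_id: str) -> bool:
        now = self._clock()
        cached = self._health.get(ds_id)
        if cached is None or now - cached.checked_at >= self._interval:
            try:
                online = bool(self._probe(ds_id))
            except Exception:
                # A probe that raises counts as offline.
                online = False
            cached = Health(online, now)
            self._health[ds_id] = cached
        return cached.online
```

A rule scheduler could call `gate.is_online(ds_id)` before each evaluation and skip the rule when it returns `False`. Because the cached result expires, a datasource that comes back is picked up again on the next probe, so monitoring resumes without manual action, and an offline datasource costs one probe per interval instead of one failed query per rule.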