-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add health status endpoint for monitoring #185
Conversation
graphite-clickhouse.go
Outdated
@@ -198,6 +198,10 @@ func main() { | |||
} | |||
w.Write(b) | |||
}) | |||
mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion here it doesn't make much sense. I always used /render?query=some.metrics.definitely.does.not.exist&from=-1min
or similar for health check. It checks the metric exists and does an actual query to storage.
This handler looks more like /alive
, it doesn't check for the database behind or so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that "alive" is preferable option. Will change it. Still think, that standalone point for alive check is needed.
Because "/render?query=some.metrics.definitely.does.not.exist&from=-1min" is functional check and this not the same as "basic health check", usually used for monitoring daemon health itself.
With this check if backend dies, we got two alerts instead of one. If I need overall pipeline check, I will use query to carbonapi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Felixoid, напишу лучше на русском, на английском мне такие тексты сложновато писать. Путаница между мониторингом на живость самого сервиса и функциональных проверок на мой взгляд обычно приводит к тому, что при выходе одного сервиса случается шторм алертов. Вот у нас есть цепочка сервисов
clickhouse->graphite-clichouse->carbonapi. Если мы везде добавляем функциональные проверки вместо простых проверок на живость отдельных сервисов, то при выходе из строя clickhouse у нас придёт три алерта вместо одного. Если у нас более сложные зависимости с большим количеством сервисов, то это превращается в ад.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Оно, в целом, совершенно валидно при прочих равных. Но от того же балансера, к примеру, по этой ручке я бы отстреливать не стал, потому что она проверяет живость сервера, но не его работу. Для балансера нужна функциональная ручка.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Оно, в целом, совершенно валидно при прочих равных. Но от того же балансера, к примеру, по этой ручке я бы отстреливать не стал, потому что она проверяет живость сервера, но не его работу. Для балансера нужна функциональная ручка.
Да, мой основной тезис не про то, что функциональных проверок не должно быть. Для разных сценариев нужны и те и другие. Спасибо за принятый пул-реквест!
@Felixoid, I made it because since version 0.13.2 our previous health check no longer working:
In the logs error: Was found via |
Yep, you can add |
I not sure it is good idea to rely on thing, which can break in any time. |
I don't think one would remove an introduced format. It's one of the default graphite data exchange formats, and graphite-clickhouse must support it in favor of compatibility with graphite-web ¯\_(ツ)_/¯ upd |
Allow other monitoring software(like zabbix and so on) to check graphite-clickhouse health via standalone health check endpoint.
Similiar to https://prometheus.io/docs/prometheus/latest/management_api/#health-check