add health status endpoint for monitoring #185

MalKeshar · 2022-05-17T14:14:01Z

Allow other monitoring software(like zabbix and so on) to check graphite-clickhouse health via standalone health check endpoint.
Similiar to https://prometheus.io/docs/prometheus/latest/management_api/#health-check

Felixoid · 2022-05-17T14:21:30Z

graphite-clickhouse.go

@@ -198,6 +198,10 @@ func main() {
 		}
 		w.Write(b)
 	})
+	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {


In my opinion here it doesn't make much sense. I always used /render?query=some.metrics.definitely.does.not.exist&from=-1min or similar for health check. It checks the metric exists and does an actual query to storage.

This handler looks more like /alive, it doesn't check for the database behind or so

I agree, that "alive" is preferable option. Will change it. Still think, that standalone point for alive check is needed.
Because "/render?query=some.metrics.definitely.does.not.exist&from=-1min" is functional check and this not the same as "basic health check", usually used for monitoring daemon health itself.

With this check if backend dies, we got two alerts instead of one. If I need overall pipeline check, I will use query to carbonapi

@Felixoid, напишу лучше на русском, на английском мне такие тексты сложновато писать. Путаница между мониторингом на живость самого сервиса и функциональных проверок на мой взгляд обычно приводит к тому, что при выходе одного сервиса случается шторм алертов. Вот у нас есть цепочка сервисов
clickhouse->graphite-clichouse->carbonapi. Если мы везде добавляем функциональные проверки вместо простых проверок на живость отдельных сервисов, то при выходе из строя clickhouse у нас придёт три алерта вместо одного. Если у нас более сложные зависимости с большим количеством сервисов, то это превращается в ад.

Оно, в целом, совершенно валидно при прочих равных. Но от того же балансера, к примеру, по этой ручке я бы отстреливать не стал, потому что она проверяет живость сервера, но не его работу. Для балансера нужна функциональная ручка.

Оно, в целом, совершенно валидно при прочих равных. Но от того же балансера, к примеру, по этой ручке я бы отстреливать не стал, потому что она проверяет живость сервера, но не его работу. Для балансера нужна функциональная ручка.

Да, мой основной тезис не про то, что функциональных проверок не должно быть. Для разных сценариев нужны и те и другие. Спасибо за принятый пул-реквест!

MalKeshar · 2022-05-17T14:29:48Z

@Felixoid, I made it because since version 0.13.2 our previous health check no longer working:

# curl -I 'http://xxx:9090/render/?target=test&from=0&until=0'
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
X-Gch-Request-Id: 4d5b40c0b0c4f1abad4932787ea2303a
Date: Tue, 17 May 2022 14:25:32 GMT
Content-Length: 126

In the logs error:
"format is not supported, supported formats: carbonapi_v3_pb, pickle, protobuf (aka carbonapi_v2_pb)

Was found via
#118

Felixoid · 2022-05-17T14:45:20Z

Yep, you can add format=pickle, should work

MalKeshar · 2022-05-17T14:50:33Z

Yep, you can add format=pickle, should work

I not sure it is good idea to rely on thing, which can break in any time.

Felixoid · 2022-05-17T15:32:09Z

I don't think one would remove an introduced format. It's one of the default graphite data exchange formats, and graphite-clickhouse must support it in favor of compatibility with graphite-web ¯\_(ツ)_/¯

upd
the query /render/?target=some.metric&from=0&until=0 is sent to a frontend, so either graphite-web or carbonapi. It works there and checks the whole chain direct to storage. For graphite-clickhouse additional format=X should be used

add health status endpoint for monitoring

910687f

Felixoid reviewed May 17, 2022

View reviewed changes

change name of endpoint to alive

17eaa50

Felixoid merged commit 1bdb4a9 into go-graphite:master May 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add health status endpoint for monitoring #185

add health status endpoint for monitoring #185

MalKeshar commented May 17, 2022

Felixoid May 17, 2022

MalKeshar May 17, 2022

MalKeshar May 17, 2022 •

edited

Loading

Felixoid May 17, 2022

MalKeshar May 17, 2022

MalKeshar commented May 17, 2022 •

edited

Loading

Felixoid commented May 17, 2022

MalKeshar commented May 17, 2022 •

edited

Loading

Felixoid commented May 17, 2022 •

edited

Loading

add health status endpoint for monitoring #185

add health status endpoint for monitoring #185

Conversation

MalKeshar commented May 17, 2022

Felixoid May 17, 2022

Choose a reason for hiding this comment

MalKeshar May 17, 2022

Choose a reason for hiding this comment

MalKeshar May 17, 2022 • edited Loading

Choose a reason for hiding this comment

Felixoid May 17, 2022

Choose a reason for hiding this comment

MalKeshar May 17, 2022

Choose a reason for hiding this comment

MalKeshar commented May 17, 2022 • edited Loading

Felixoid commented May 17, 2022

MalKeshar commented May 17, 2022 • edited Loading

Felixoid commented May 17, 2022 • edited Loading

MalKeshar May 17, 2022 •

edited

Loading

MalKeshar commented May 17, 2022 •

edited

Loading

MalKeshar commented May 17, 2022 •

edited

Loading

Felixoid commented May 17, 2022 •

edited

Loading