From a2d7067f0268eaf312481983732e378a2cb01b7e Mon Sep 17 00:00:00 2001
From: milen <94537774+taratorio@users.noreply.github.com>
Date: Thu, 16 Nov 2023 16:30:37 +0000
Subject: [PATCH] metrics: switch to using prometheus library (#8741)

# Background

Erigon currently uses a combination of Victoria Metrics and the Prometheus client for providing metrics. We want to rationalize this and use only the Prometheus client library, while keeping the simplified Victoria Metrics methods for constructing metrics.

This task is currently partly complete and needs to be finished to the point where we can remove the Victoria Metrics module from the Erigon code base.

## Tests

### Functional

* Make sure that the int->float format change implied by moving from VM to Prometheus does not impact clients (pay particular attention to block numbers).
* Check that the Prometheus/Grafana dashboards defined in cmd/prometheus are functional after the change (see docker-compose.yml for details and https://github.com/ledgerwatch/erigon/tree/devel/cmd/prometheus#readme).
* Confirm that the underlying Go metrics are still generated.
* Confirm that the following flag settings work with the new code: --metrics, --metrics.addr, --metrics.port.
* Confirm that the --metrics and --pprof settings and the handler configuration still allow metrics and pprof to share a port.

#### Float counters - scientific notation test case

![Screenshot_2023-11-07_at_15 57 21](https://github.com/ledgerwatch/erigon/assets/94537774/32f0a6f6-968b-477c-8ec8-bb1812f3e848)
![Screenshot 2023-11-15 at 16 26 56](https://github.com/ledgerwatch/erigon/assets/94537774/3f402b2e-e343-4928-9fbb-18fa4d077485)

#### Float counters - NaN test case

![Screenshot_2023-11-07_at_16 04 25](https://github.com/ledgerwatch/erigon/assets/94537774/cbf90d5d-3749-4bd7-971d-e2124e54267c)
![Screenshot 2023-11-15 at 16 28 36](https://github.com/ledgerwatch/erigon/assets/94537774/5924915e-1977-4b7f-8082-23f73d0957d5)

### Performance

* Check that the performance of the RPC call counters and measurements created by rpc/metrics.go is not impacted by the change.

#### RPC

Performed tests on rpcdaemon & erigon on localhost using `eth_blockNumber`, with 100, 1000 and 10000 requests. Got a steady 15 ms response time.

#### Memory

![Screenshot 2023-11-16 at 09 58 39](https://github.com/ledgerwatch/erigon/assets/94537774/5dd956d7-903f-4bea-a460-d3644da56201)

---
 metrics/register.go | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/metrics/register.go b/metrics/register.go
index 0839816cb29..5344dc57697 100644
--- a/metrics/register.go
+++ b/metrics/register.go
@@ -8,7 +8,7 @@ import (
 	dto "github.com/prometheus/client_model/go"
 )
 
-const UsePrometheusClient = false
+const UsePrometheusClient = true
 
 type Summary interface {
 	UpdateDuration(time.Time)