Store-gateway: high memory allocations caused by per-tenant Prometheus registry #102

grafanabot · 2021-08-10T18:12:01Z

Describe the bug
To use the Thanos BucketStore while supporting Cortex multi-tenancy, we need to create one BucketStore per tenant, pass a dedicated Prometheus registry to each one, and then aggregate metrics from all registries.

As a result, Prometheus metrics collection causes high memory allocations (on the order of 50 MB/s in a store-gateway with 7.5K tenants). The allocated memory is not retained, but it still puts pressure on the GC.

In a cluster with low QPS, 95% of store-gateway memory allocations are caused by metrics collection.
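To illustrate why collection cost grows with the tenant count, here is a minimal, stdlib-only model of the pattern described above. It is not the Cortex or Prometheus client code: `TenantRegistry`, `AggregatingCollector`, and the metric name `blocks_loaded_total` are hypothetical names made up for this sketch. The only thing it tries to show is that each collection pass builds one temporary snapshot per tenant before summing.

```python
from collections import defaultdict


class TenantRegistry:
    """Hypothetical stand-in for one tenant's dedicated metrics registry."""

    def __init__(self):
        self._counters = defaultdict(float)

    def inc(self, name, value=1.0):
        self._counters[name] += value

    def gather(self):
        # Every call builds a fresh snapshot, as a scrape would.
        return dict(self._counters)


class AggregatingCollector:
    """Sums same-named metrics across all per-tenant registries."""

    def __init__(self):
        self._registries = {}

    def registry_for(self, tenant_id):
        # One registry per tenant, created the first time the tenant is seen.
        if tenant_id not in self._registries:
            self._registries[tenant_id] = TenantRegistry()
        return self._registries[tenant_id]

    def collect(self):
        totals = defaultdict(float)
        snapshots = 0
        for registry in self._registries.values():
            # One short-lived snapshot per tenant on every collection pass.
            for name, value in registry.gather().items():
                totals[name] += value
            snapshots += 1
        return dict(totals), snapshots


collector = AggregatingCollector()
for i in range(7500):  # tenant count taken from the report above
    # "blocks_loaded_total" is a made-up metric name for this sketch.
    collector.registry_for(f"tenant-{i}").inc("blocks_loaded_total", 2)

totals, snapshots = collector.collect()
print(totals, snapshots)
```

In this model the aggregated output is a single series, yet producing it touches 7,500 intermediate snapshots that become garbage as soon as the pass ends, which matches the "not retained, but still puts pressure on the GC" observation.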

Submitted by: pracucci
Cortex Issue Number: 3697

grafanabot · 2021-08-10T18:12:03Z

Enabling shuffle-sharding on the store-gateway significantly improves this.
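A rough sketch of why shuffle-sharding helps, assuming each tenant is deterministically mapped to a small subset of store-gateways so that every instance holds registries for fewer tenants. This is a simplified model, not Cortex's ring-based implementation. The instance count (12), the shard size (3), and the helper names are assumptions chosen for illustration, and "no sharding" is modeled as every instance potentially serving every tenant.

```python
import hashlib
import random


def shard_for_tenant(tenant_id, instances, shard_size):
    """Deterministically pick `shard_size` instances for a tenant."""
    # Seed a private RNG from a stable hash so every caller picks the same shard.
    digest = hashlib.sha256(tenant_id.encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).sample(sorted(instances), shard_size)


def tenants_per_instance(tenants, instances, shard_size):
    """Count how many tenants each instance would have to serve."""
    counts = {name: 0 for name in instances}
    for tenant in tenants:
        for name in shard_for_tenant(tenant, instances, shard_size):
            counts[name] += 1
    return counts


# Hypothetical cluster: 12 store-gateways, 7,500 tenants.
instances = [f"store-gateway-{i}" for i in range(12)]
tenants = [f"tenant-{i}" for i in range(7500)]

# Shard size equal to the cluster size models "no shuffle-sharding".
without_sharding = tenants_per_instance(tenants, instances, len(instances))
with_sharding = tenants_per_instance(tenants, instances, 3)

print(max(without_sharding.values()), max(with_sharding.values()))
```

With fewer tenants per instance, each collection pass has fewer per-tenant registries to walk, so the allocation rate on each store-gateway should drop roughly in proportion.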

Submitted by: pracucci

pracucci · 2021-08-11T15:06:26Z

More data points from a store-gateway loading blocks from 13k tenants.

[Figure: CPU]

[Figure: Memory allocations (bytes)]

grafanabot added the storage/blocks, type/performance, and component/store-gateway labels Aug 10, 2021

pracucci mentioned this issue Aug 11, 2021

Simplify store gateway metrics #123

Merged

3 tasks
