Add metrics for max/min/desired shards to queue manager. #5787
Conversation
I don't think this adds much as a Debug log since there is detailed information about the desired shards calculation just a couple lines up. Is it too noisy above the debug level?
@csmarchbanks yeah, I suppose this would make sense as a warning.
I think a warning is fine if this requires the operator's attention. In that case, perhaps it would make sense to have a metric for desired and max shards so that it is easy to create an alert for it?
I like the idea of using metrics rather than a warning log. The issue only needs to be brought to an operator's attention if desired shards is greater than max shards for a significant amount of time, so I think individual warnings could be a bit misleading.
I still think logging is useful to more easily see how many shards the calculation wanted to run. I'll add a metric to count the # of times the calculation has been > maxShards.
Is it important how often the calculation came up with that result? If you had a gauge "desired shards" (updated after each shard calculation) and a fixed-value metric with the configured "max shards", wouldn't that be sufficient? You could still look at it over time. (I'm not a remote-write expert. My suggestion might be completely off.)
If we were to alert on a metric for this it would likely be something like "desired shards is greater than max shards" over some period of time. For example, we only alert if remote write is behind by a minute or two for more than 15 minutes. Sounds like a gauge would be sufficient. I don't have a preference either way.
I would also add a metric for the configured max shards, even if that seems redundant. In some setups, it's hard to get this value from config management, but you want to have it easily available in alerts or even dashboards.
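As a rough sketch of the alert shape being discussed (illustrative only: the alert name, the desired-shards metric name, and the 15m duration are assumptions based on this thread, not the final rule), such a rule could look something like this in the mixin's jsonnet:

{
  alert: 'PrometheusRemoteWriteDesiredShards',
  // Sketch: fire only if the desired-shards gauge has exceeded the configured
  // max-shards gauge for a sustained period, per the discussion above.
  expr: |||
    prometheus_remote_storage_shards_desired{%(prometheusSelector)s}
    >
    prometheus_remote_storage_shards_max{%(prometheusSelector)s}
  ||| % $._config,
  'for': '15m',
},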
Added metrics, removed the logging. |
One small comment that doesn't need to be changed. These metrics look helpful to me!
Signed-off-by: Callum Styan <[email protected]>
> on(job, instance) group_right
prometheus_remote_storage_shards_max{%(prometheusSelector)s})
)
== 1
== 1 requires a bool operator, no? I don't think a simple > would work here.
Only for scalar/scalar comparisons. > should work here. The RHS, however, also needs a max_over_time.
👍 added
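For reference, a sketch of what the comparison could look like with both suggestions applied (the 5m range and the desired-shards metric name are assumptions on my part, not taken from the final diff):

(
  max_over_time(prometheus_remote_storage_shards_desired{%(prometheusSelector)s}[5m])
>
  max_over_time(prometheus_remote_storage_shards_max{%(prometheusSelector)s}[5m])
)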
already be going off, about desired shards being higher than max shards. Signed-off-by: Callum Styan <[email protected]>
I think you only need to update the description (copy-pasta from a different alert).
},
annotations: {
summary: 'Prometheus remote write desired shards calculation wants to run more than configured max shards.',
description: 'Prometheus %(prometheusName)s remote write is {{ printf "%%.1f" $value }}s behind for queue {{$labels.queue}}.' % $._config,
This description is from a different alert.
@beorn7 PTAL at the alert. I tested it locally by hardcoding the return value of the desired shards calculation.
},
annotations: {
summary: 'Prometheus remote write desired shards calculation wants to run more than configured max shards.',
description: 'Prometheus %(prometheusName)s remote write desired shards calculation wants to run {{ printf $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{%(prometheusSelector)s}` | query | first | value }}.' % $._config,
The query will just pick the first result, not necessarily the one for the Prometheus server affected. I believe it has to be written in the following way:
description: 'Prometheus %(prometheusName)s remote write desired shards calculation wants to run {{ printf $value }} shards, which is more than the max of {{ printf `prometheus_remote_storage_shards_max{instance="%%s",%(prometheusSelector)s}` $labels.instance | query | first | value }}.' % $._config,
Hmmm okay, is that not what prometheusSelector is supposed to be doing? I assumed those would contain the label selectors for the instance that triggered the alert.
prometheusSelector selects all Prometheus servers to monitor.
In this case, it would intersect with the alerts firing, i.e. if only one Prometheus server fires this alert, all is good, but as soon as more than one of the monitored Prometheus servers is affected by this alert, you'll get multiple results in the query.
👍 got it, thank you!
shards in description. Signed-off-by: Callum Styan <[email protected]>
Add metrics for max shards, min shards, and desired shards to queue manager. This will help remote write users determine whether they need to raise the max shards in their config.
Signed-off-by: Callum Styan [email protected]
@csmarchbanks
cc @tomwilkie @gouthamve