Enable sharding compact instances by prometheus shard #1245
Conversation
Thanks for this! I love the idea ❤️ but I would propose a different flag type and name. See comment below:
I think something like this is great, but I have 2 suggestions:
- I would not tie it to Prometheus. There can be many sources of blocks, including Thanos receiver, so this should be agnostic to Prometheus. It should be agnostic to "sharding" as well, as that again unnecessarily ties things. I would literally match on the external labels of source blocks, as we do literally that on the compactor; it's tied to TSDB blocks. So e.g. a repeated `block-external-label-matcher` flag would be nice?
- I would not reinvent the wheel in terms of how to specify this "string". In particular, this form does not specify what exactly we include / exclude on. Pod name? Replica name? External labels? UIDs? (: I would stick to matchers, because that is what we do: we query the blocks like metrics (where each block is a single series). We can even reuse internal code for `labels.Matcher` (`github.com/prometheus/[email protected]/labels/selector.go`). So a `matcher` flag could take literally `{cluster=~"monzo-super-cluster-(east|west)", zone="abc"}`.

Let me know what your thoughts are (:
@@ -110,6 +110,10 @@ func registerCompact(m map[string]setupFunc, app *kingpin.Application, name stri
compactionConcurrency := cmd.Flag("compact.concurrency", "Number of goroutines to use when compacting groups.").
	Default("1").Int()

includeShards := cmd.Flag("shard.include", "Prometheus shard to compact. May be specified multiple times. Cannot be used with --shard.exclude.").Strings()
In this manner we don't need include/exclude as well (:
This sounds good - I'm happy not to tie it to Prometheus shards and to reuse the matcher code 👍
What's the use of doing functional sharding here? How about we do consistent hashing over the sets of external labels instead?
Essentially, you have a single bucket for too many sources and the compactor is too slow to cope with the work. (:
@brancz what would be the benefit of consistent sharding vs manual specification? I think manual is also nice if you want control over what to do first, or maybe one compactor being bigger than another for a different number of sources / bigger sources.
Consistent hashing so that the local cache needs minimal change, but generally just use 0, 1, 2, 3, ... out of x shards, so we can build autoscaling mechanisms around this.
Sure, but that will block custom deployments as I mentioned:
Especially since users usually have from 1 to maybe 10 sources (of course it can happen to be 100), and not equal ones, so I don't see consistent hashing being that useful for most users. Plus you can have consistent hashing done by configuration management as well, but not vice versa (custom deployment when we are tied to consistent hashing). Happy to discuss though (: We might want both as well, but again, it's some complexity.
I’d be ok with that. In that case this sounds a lot like we want relabeling...
I like the suggestion of consistent hashing too, although as @bwplotka points out there's a slightly different use case for each kind of sharding. I'm happy to go ahead with this method and we could potentially replace it or add a consistent hashing scheme later on.
@brancz looks like we all agree to have both. In this case your comment:
How does this affect the idea I proposed here (matchers)? I think instead of matchers, we literally do relabelling in the form of:
I think I like this, although it's complex if someone is new to this - I think that's fair for this feature though. Thoughts @mattrco ? cc @brian-brazil would be nice to have your view on this (:
The hashmod action is how this is done in Prometheus, so I'd suggest looking at that with keep/drop rather than coming up with a new way of hashing.
Yeah the consistent hashing mechanism would really just be so the local disk doesn't have to be re-populated entirely when re-sharding. I'd be perfectly happy to start with the hashmod approach that relabeling offers for sharding. We need to think a bit about vertical compaction here though, so I would say the selection should apply to the result block instead of the input blocks. So to me it looks like we should pass the potential result external labels of a block to be compacted into a relabeling config and based on the config decide whether to keep or not.
Relabeling itself doesn't care about any special labels. It's just an input of a set of labels and an output of a set of labels, and if the output set is empty, it's dropped. So I don't think this is needed. The only slightly weird thing is that relabeling would allow re-writing labels, instead of just keep/drop, which may not be a desirable feature.
Only if you use those output labels, rather than merely checking for existence.
Fair. I think relabeling is a good fit when done that way.
Yea, you are right. But indeed we have quite non-obvious cases here, so we need to agree on clear semantics. What happens if:
Correct. Relabeling would just be used for selection. If the set is non empty after relabeling is done, then it’s kept. |
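To make the "relabeling for selection" idea concrete, a hypothetical config using standard Prometheus `relabel_config` syntax might look like this, applied to a block's external labels (shard 0 of 3 in this sketch); if the label set is empty after relabeling, the block is skipped:

```yaml
- source_labels: [cluster, zone]
  action: hashmod
  target_label: shard
  modulus: 3
- source_labels: [shard]
  action: keep
  regex: "0"
```

The `hashmod` step writes a temporary `shard` label, and the `keep` step drops any block whose computed shard is not this instance's; the rewritten labels are only used for the keep/drop decision, as discussed above.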
Force-pushed from 412ca02 to 0aead72.
@jojohappy I think this is preferred for all components, including store: #1059
The only problem I can see (or at least something to agree on) is how to provide such configuration... It has to be some YAML, I guess.
Fair, let's aim for this then. Are you happy to adjust store @jojohappy ?
Yes, I do.
Any news about fixing the conflicts?
It's not only conflicts, but we might also need to design and agree on a consistent approach. We are working on this (: @mattrco are you happy to close this in the meantime to stop confusion? I will start some PR with the design; it would also be awesome if you would join us in making this, if you want (:
We have some very large Prometheus shards which take a long time to compact and downsample (especially if there's catching up to do).
This PR adds two new flags, `shard.include` and `shard.exclude`, which specify Prometheus shards to filter blocks by. Filtering is done when the syncer builds a slice of metadata about blocks in the bucket. Each flag can be specified multiple times (but including both at once is not valid; I thought this would over-complicate things and increase the chance of mistakes in config).
@bwplotka I think we chatted about this briefly last week :)
I'm currently testing this on a production-like workload.