Enable sharding compact instances by prometheus shard #1245
Conversation
Thanks for this! I love the idea ❤️ but I would propose a different flag type and name. See comment below:
I think something like this is great, but I have 2 suggestions:
- I would not tie it to Prometheus. There can be many sources of blocks, including Thanos receiver, so this should be agnostic to Prometheus. It should be agnostic to "sharding" as well, as that again unnecessarily ties things. I would literally match on the external labels of source blocks, as we do literally that on the compactor; it's tied to TSDB blocks. So e.g. a repeated `block-external-label-matcher` flag would be nice?
- I would not reinvent the wheel in terms of how to specify this "string". In particular, this form does not specify what exactly we include / exclude on. Pod name? Replica name? External labels? UIDs? (: I would stick to matchers, because that is what we do: we query the blocks like metrics (where each block is a single series). We can even reuse internal code for `labels.Matcher` (`github.com/prometheus/[email protected]/labels/selector.go`). So a `matcher` flag could take literally `{cluster=~"monzo-super-cluster-(east|west)", zone="abc"}`.

Let me know what your thoughts are (:
@@ -110,6 +110,10 @@ func registerCompact(m map[string]setupFunc, app *kingpin.Application, name stri
compactionConcurrency := cmd.Flag("compact.concurrency", "Number of goroutines to use when compacting groups.").
	Default("1").Int()

includeShards := cmd.Flag("shard.include", "Prometheus shard to compact. May be specified multiple times. Cannot be used with --shard.exclude.").Strings()
In this manner we don't need include/exclude as well (:
This sounds good - I'm happy not to tie it to Prometheus shards and to reuse the matcher code 👍
What's the use of doing functional sharding here? How about we do consistent hashing over the sets of external labels instead?
Essentially, you have a single bucket for too many sources and the compactor is too slow to cope with the work. (:
@brancz what would be the benefit of consistent sharding vs manual specification? I think manual is also nice if you want control over what to do first, or maybe one compactor being bigger than another for a different number of sources / bigger sources.
Consistent hashing so that the local cache needs minimal change, but generally just use 0, 1, 2, 3, ... out of x shards, so we can build autoscaling mechanisms around this.
Sure, but that will block custom deployments as I mentioned:
Especially since users usually have from 1 to maybe 10 sources (of course it can happen to be 100), and not equal ones, so I don't see consistent hashing being that useful for most users. Plus you can have consistent hashing done by configuration management as well, but not vice versa (custom deployment when we are tied to consistent hashing). Happy to discuss though (: We might want both as well, but again, it's some complexity.
I’d be ok with that. In that case this sounds a lot like we want relabeling...
I like the suggestion of consistent hashing too, although as @bwplotka points out there's a slightly different use case for each kind of sharding. I'm happy to go ahead with this method and we could potentially replace it or add a consistent hashing scheme later on.
@brancz looks like we all agree to have both. In this case your comment:
How does this affect the idea I proposed here (matchers)? I think instead of matchers, we literally do relabelling in the form of:
I think I like this, although it's complex if someone is new to this - I think that's fair for this feature though. Thoughts @mattrco ? cc @brian-brazil would be nice to have your view on this (:
The hashmod action is how this is done in Prometheus, so I'd suggest looking at that with keep/drop rather than coming up with a new way of hashing.
Yeah the consistent hashing mechanism would really just be so the local disk doesn't have to be re-populated entirely when re-sharding. I'd be perfectly happy to start with the hashmod approach that relabeling offers for sharding. We need to think a bit about vertical compaction here though, so I would say the selection should apply to the result block instead of the input blocks. So to me it looks like we should pass the potential result external labels of a block to be compacted into a relabeling config and based on the config decide whether to keep or not.
Relabeling itself doesn't care about any special labels. It's just an input of a set of labels and an output of a set of labels, and if the output set is empty, it's dropped. So I don't think this is needed. The only slightly weird thing is that relabeling would allow re-writing labels, instead of just keep/drop, which may not be a desirable feature.
Only if you use those output labels, rather than merely checking for existence.
Fair. I think relabeling is a good fit when done that way.
Yea, you are right. But indeed we have quite non-obvious cases here, so we need to agree on clear semantics. What happens if:
Correct. Relabeling would just be used for selection. If the set is non empty after relabeling is done, then it’s kept. |
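To make the "relabeling for selection" idea concrete, a hypothetical config using standard Prometheus `relabel_config` syntax might look like this, applied to a block's external labels (shard 0 of 3 in this sketch); if the label set is empty after relabeling, the block is skipped:

```yaml
- source_labels: [cluster, zone]
  action: hashmod
  target_label: shard
  modulus: 3
- source_labels: [shard]
  action: keep
  regex: "0"
```

The `hashmod` step writes a temporary `shard` label, and the `keep` step drops any block whose computed shard is not this instance's; the rewritten labels are only used for the keep/drop decision, as discussed above.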
Force-pushed from 412ca02 to 0aead72.
@jojohappy I think this is preferred for all components, including store: #1059
The only problem I can see (or at least something to agree on) is how to provide such configuration... It has to be some YAML, I guess.
Fair, let's aim for this then. Are you happy to adjust store @jojohappy ?
Yes, I do.
Any news about fixing the conflicts?
It's not only conflicts, but we might also need to design and agree on a consistent approach. We are working on this (: @mattrco are you happy to close this in the meantime to stop confusion? I will start some PR with the design; it would also be awesome if you would join us in making this, if you want (:
We have some very large Prometheus shards which take a long time to compact and downsample (especially if there's catching up to do).
This PR adds two new flags, `shard.include` and `shard.exclude`, which specify Prometheus shards to filter blocks by. Filtering is done when the syncer builds a slice of metadata about blocks in the bucket. Each flag can be specified multiple times (but including both at once is not valid; I thought this would over-complicate things and increase the chance of mistakes in config).
@bwplotka I think we chatted about this briefly last week :)
I'm currently testing this on a production-like workload.