Store Gateway: downloaded bytes limit doesn't help for expand postings OOM kill #6470

Open
yeya24 opened this issue Jun 26, 2023 · 1 comment

Comments

@yeya24
Contributor

yeya24 commented Jun 26, 2023

What happened:

A user ran a very high-cardinality query such as count(count_over_time({__name__=~".+"}[30d])), and it OOM killed several store gateways.

I am testing whether the downloaded bytes limit could help in this case. While doing some heap profiling, I found that most of the memory is spent in the index.ExpandPostings function: https://github.com/thanos-io/thanos/blob/main/pkg/store/bucket.go#L2303.

This happens after all required postings have been fetched. If we get OOM killed here, we have not even started fetching series and chunks; only postings have been fetched.

I checked that for this query we fetched only 247MB of postings, which is pretty small compared to the series and chunk size. So for the downloaded bytes limit to protect us, I would have to set it below 247MB, which is unrealistic given the series and chunk size.

What you expected to happen:

I would like a limit that applies to postings size only, so that series and chunk size don't need to be included.

How to reproduce it (as minimally and precisely as possible):

Run the same query in an environment with a relatively large number of time series.

Full logs to relevant components:

Anything else we need to know:

@yeya24
Contributor Author

yeya24 commented Jul 10, 2023

I'd love to gather some feedback here before I start the implementation. What about having a dedicated postings limit in the store gateway? As mentioned in the issue, the downloaded bytes limit won't help much when the store gateway gets OOM killed during postings expansion.
The limit could apply to the total postings to fetch, either in bytes or in number of postings. Bytes is easier to implement.
