I'd love to gather some feedback here first before I start the implementation. What about having a dedicated postings limit in the store gateway? As mentioned in the issue, the downloaded bytes limit won't help much when the store gateway gets OOM-killed during postings expansion.
The limit could cap the total postings fetched, either in bytes or in count (number of postings). Bytes is easier to implement.
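A bytes-based postings limiter could follow the same Reserve-style pattern the store gateway already uses for its chunks and series limiters. A minimal sketch, with hypothetical names (the real Thanos limiter interfaces live in `pkg/store/limiter.go` and differ in detail):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// postingsBytesLimiter is a hypothetical limiter that rejects a request once
// the cumulative bytes of fetched postings exceed the configured limit.
// Modeled loosely on the Reserve-style limiters used for chunks and series.
type postingsBytesLimiter struct {
	limit    uint64 // 0 means disabled
	reserved atomic.Uint64
}

// Reserve accounts for n more bytes of postings and fails the request
// if the running total crosses the limit.
func (l *postingsBytesLimiter) Reserve(n uint64) error {
	if total := l.reserved.Add(n); l.limit != 0 && total > l.limit {
		return fmt.Errorf("exceeded postings limit of %d bytes (would fetch %d)", l.limit, total)
	}
	return nil
}

func main() {
	l := &postingsBytesLimiter{limit: 100}
	fmt.Println(l.Reserve(60) == nil) // first reservation fits
	fmt.Println(l.Reserve(60) == nil) // 120 > 100, rejected
}
```

The counter is atomic because postings for different blocks are fetched concurrently, so one limiter instance per request can be shared across goroutines.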
What happened:
User ran a very high cardinality query like
count(count_over_time({__name__=~".+"}[30d]))
and it OOM-killed several store gateways. I am testing whether the downloaded bytes limit could help in this case. But while doing some heap profiling, I found that most of the memory is spent in the
index.ExpandPostings
function: https://github.com/thanos-io/thanos/blob/main/pkg/store/bucket.go#L2303. This happens after fetching all required postings. If we get OOM-killed here, we haven't even started fetching series and chunks; only postings have been fetched.
I checked that for such a query we fetched only 247MB of postings. This is pretty small compared to the series and chunk sizes, which means that if I want the downloaded bytes limit to protect us, I would have to set the value below 247MB, which is unrealistic considering the series and chunk sizes.
What you expected to happen:
A limit that caps only the postings size; series and chunk sizes don't need to be included.
How to reproduce it (as minimally and precisely as possible):
Run the same query in an environment with a relatively large number of time series.
Full logs to relevant components:
Anything else we need to know: