Make sure all significant memory usage in aggs are tracked in BigArrays #59892

Open
3 of 11 tasks
nik9000 opened this issue Jul 20, 2020 · 3 comments
Labels
:Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt

Comments

@nik9000
Member

nik9000 commented Jul 20, 2020

When we did #56487 we decided that it was important to do an inventory of all of the memory that aggregations allocate that is not part of BigArrays. We'd like to get everything tracked so we're less reliant on the real memory breaker catching stuff.

  • DeferringBucketCollector subclasses aren't tracked.
  • matrix_stats's RunningStats has a bunch of HashMaps that aren't being tracked properly. It looks like they don't grow a lot, but if you put in a high enough cardinality agg it could get messy.
  • string_stats has a Map<Character, LongArray> which could end up taking a fair bit of untracked space if it is under a high cardinality agg and there are a bunch of distinct characters. English text would see about 64 entries in that Map. Japanese and Chinese look like they'd consistently see a couple thousand entries in the Map. And the array won't work at all for things that aren't on the BMP, like Emoji, Egyptian Hieroglyphs, and a few unlucky languages.
  • top_hits will create a bunch of Collectors which aren't tracked by BigArrays. They are all fairly careful with memory, but it could use a bit and we aren't tracking it.
  • TDigestState, HDR histogram and friends look like they can use a fair bit of untracked memory. We could probably track a max for it or something like that (see also Integrate TDigestState with circuit breakers #99815). For HDR, the HDR histogram library needs to be forked first (see Fork HdrHistogram library #95904).
  • HyperLogLogPlusPlus has an OpenBitSet which has the same behavior as our BitArray but it isn't backed by BigArrays (see Use standard bit set impl in cardinality #61816).
  • DoubleHistogram and friends are also untracked.
  • ScriptedMetric is totally untracked and frankly terrifying.
  • filters's "compatible" collector can realize a bunch of bit sets in memory (see Trigger parent circuit breaker when building scorers in filters aggregation #102511).
  • The reduction phase is all Java object based (see Enable Circuit Breaker tracking in more parts of the aggregations framework #89437).
  • While global ordinals memory usage is tracked, the process of building them isn't (see Check the real memory circuit breaker when building global ordinals #102462).
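
As a rough illustration of what "tracked" could mean for the map-backed structures listed above, here is a minimal, self-contained sketch. Nothing in it is Elasticsearch code: `ByteBudget` is a hypothetical toy stand-in for a circuit breaker, `TrackedCharCounts` is a hypothetical simplification of a per-character counter, and `ASSUMED_BYTES_PER_ENTRY` is a made-up estimate rather than a measured figure. It keys on code points instead of `Character`, because a Java `char` is a 16-bit UTF-16 code unit and characters outside the BMP occupy two of them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names are Elasticsearch classes. */
final class TrackedCharCounts {

    /** Toy stand-in for a circuit breaker: a byte budget that throws once it would be exceeded. */
    static final class ByteBudget {
        private final long limitBytes;
        private long usedBytes;

        ByteBudget(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        /** Reserve bytes before allocating; fails without changing the tally if over the limit. */
        void reserve(long bytes) {
            if (usedBytes + bytes > limitBytes) {
                throw new IllegalStateException(
                    "would use " + (usedBytes + bytes) + " bytes, limit is " + limitBytes);
            }
            usedBytes += bytes;
        }

        void release(long bytes) {
            usedBytes -= bytes;
        }

        long usedBytes() {
            return usedBytes;
        }
    }

    /** Assumed (not measured) cost of one map entry, used only for accounting. */
    static final long ASSUMED_BYTES_PER_ENTRY = 64;

    private final ByteBudget budget;
    private final Map<Integer, Long> counts = new HashMap<>();

    TrackedCharCounts(ByteBudget budget) {
        this.budget = budget;
    }

    /** Count every code point in the value, reserving budget before each new map entry. */
    void add(String value) {
        value.codePoints().forEach(cp -> {
            if (!counts.containsKey(cp)) {
                budget.reserve(ASSUMED_BYTES_PER_ENTRY);
            }
            counts.merge(cp, 1L, Long::sum);
        });
    }

    int distinct() {
        return counts.size();
    }

    /** Give back everything this instance reserved. */
    void close() {
        budget.release(counts.size() * ASSUMED_BYTES_PER_ENTRY);
        counts.clear();
    }

    public static void main(String[] args) {
        ByteBudget budget = new ByteBudget(1024);
        TrackedCharCounts counter = new TrackedCharCounts(budget);
        counter.add("hello \uD83D\uDE00"); // the escape is one emoji, i.e. one code point in two chars
        System.out.println(counter.distinct() + " distinct code points, "
            + budget.usedBytes() + " bytes accounted");
        counter.close();
    }
}
```

The point of the sketch is the ordering: the reservation happens before the entry is created, so a bucket that would blow the budget fails early instead of relying on a real-memory check after the fact. A real change would presumably route this through BigArrays or the existing breaker plumbing rather than a hand-rolled budget.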
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)
