Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487
Labels: :Analytics/Aggregations (Aggregations); >enhancement; Meta; release highlight; Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
Comments
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
elasticmachine added the Team:Analytics label on May 9, 2020.
nik9000 changed the title from "Multi-bucket aggregator removal" to "Multi-bucket aggregator wrapper is slow!" on May 9, 2020.
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on May 11, 2020:

This makes a `parentCardinality` available to every `Aggregator`'s ctor so it can make intelligent choices about how it collects bucket values. This replaces `collectsFromSingleBucket` and is similar to it but:

1. It supports `NONE`, `ONE`, and `MANY` values and is generally extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all sub-aggregations live under multi-bucket aggregations. This is normally true but `parentCardinality` is properly carried forward for single bucket aggregations like `filter` and for multi-bucket aggregations configured in single-bucket form, like `range` with a single range.

While I was touching every aggregation I renamed `doCreateInternal` to `createMapped` because that seemed like a much better name and it was right there, next to the change I was already making.

Relates to elastic#56487
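To make the idea in this commit message concrete, here is a minimal sketch of an aggregator picking its collection strategy from a parent cardinality. Only the `NONE`, `ONE`, and `MANY` values come from the message; the class names, the `DocCounter` interface, and the choice to treat `NONE` like `ONE` (on the assumption that a top-level aggregator only ever sees owning bucket ordinal 0) are hypothetical and not taken from Elasticsearch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; apart from NONE/ONE/MANY, no name here comes from Elasticsearch. */
public class ParentCardinalitySketch {
    /** How many parent buckets may feed documents into an aggregator. */
    enum ParentCardinality { NONE, ONE, MANY }

    /** Counts documents per owning (parent) bucket ordinal. */
    interface DocCounter {
        void collect(long owningBucketOrd);
        long count(long owningBucketOrd);
    }

    /** Cheap path: at most one parent bucket, so a single field is enough. */
    static final class SingleBucketCounter implements DocCounter {
        private long count;

        @Override
        public void collect(long owningBucketOrd) {
            if (owningBucketOrd != 0) {
                throw new IllegalArgumentException("expected a single owning bucket");
            }
            count++;
        }

        @Override
        public long count(long owningBucketOrd) {
            return owningBucketOrd == 0 ? count : 0;
        }
    }

    /** General path: one counter per parent bucket ordinal. */
    static final class ManyBucketCounter implements DocCounter {
        private final Map<Long, Long> counts = new HashMap<>();

        @Override
        public void collect(long owningBucketOrd) {
            counts.merge(owningBucketOrd, 1L, Long::sum);
        }

        @Override
        public long count(long owningBucketOrd) {
            return counts.getOrDefault(owningBucketOrd, 0L);
        }
    }

    /** Picks the strategy once, at construction time, from the parent's cardinality. */
    static DocCounter create(ParentCardinality parent) {
        switch (parent) {
            case NONE: // assumption: no parent means only ordinal 0 is ever used
            case ONE:
                return new SingleBucketCounter();
            default:
                return new ManyBucketCounter();
        }
    }

    public static void main(String[] args) {
        DocCounter underFilter = create(ParentCardinality.ONE);
        underFilter.collect(0);
        DocCounter underTerms = create(ParentCardinality.MANY);
        underTerms.collect(0);
        underTerms.collect(7);
        System.out.println(underFilter.count(0) + " " + underTerms.count(7));
    }
}
```

A three-valued enum leaves room for the "more precise counts" the message mentions, since a later value could be added without changing the factory's signature.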
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on May 13, 2020:

This merges the code for the `significant_terms` agg into the package for the code for the `terms` agg. They are *super* entangled already; this mostly just admits that to ourselves. Precondition for the terms work in elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on May 18, 2020:

When `date_histogram` is a sub-aggregator it used to allocate a bunch of objects for every one of its parent's buckets. This uses the data structures that we built in elastic#55873 to rework the `date_histogram` aggregator instead of all of the allocation. Part of elastic#56487
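As a rough illustration of the difference, the sketch below keeps one shared table that maps an (owning bucket ordinal, rounded value) pair to a dense ordinal, so a sub-aggregator does not need a separate set of objects per parent bucket. The two-argument `add` mirrors the `ords.add(0, 0)` call quoted elsewhere in this thread; everything else, including the class name, the `HashMap` backing, and the return convention, is an assumption made for illustration and not the data structure from #55873.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one shared table of bucket ordinals instead of one structure per parent bucket. */
public class SharedBucketOrdsSketch {
    // Maps (owningBucketOrd, roundedValue) to a dense ordinal.
    private final Map<List<Long>, Long> ords = new HashMap<>();
    // Doc counts indexed by dense ordinal; grown on demand.
    private long[] docCounts = new long[8];

    /** Returns the ordinal for the pair, assigning the next free one the first time it is seen. */
    long add(long owningBucketOrd, long roundedValue) {
        List<Long> key = Arrays.asList(owningBucketOrd, roundedValue);
        Long existing = ords.get(key);
        if (existing != null) {
            return existing;
        }
        long ord = ords.size();
        ords.put(key, ord);
        return ord;
    }

    /** Collects one document into the bucket identified by the pair. */
    void collect(long owningBucketOrd, long roundedValue) {
        int ord = (int) add(owningBucketOrd, roundedValue);
        if (ord >= docCounts.length) {
            docCounts = Arrays.copyOf(docCounts, docCounts.length * 2);
        }
        docCounts[ord]++;
    }

    /** Doc count for the pair, or 0 if it was never collected. */
    long docCount(long owningBucketOrd, long roundedValue) {
        Long ord = ords.get(Arrays.asList(owningBucketOrd, roundedValue));
        return ord == null ? 0 : docCounts[ord.intValue()];
    }

    long size() {
        return ords.size();
    }

    public static void main(String[] args) {
        SharedBucketOrdsSketch histo = new SharedBucketOrdsSketch();
        histo.collect(0, 1_589_760_000_000L); // parent bucket 0, some rounded timestamp
        histo.collect(1, 1_589_760_000_000L); // same timestamp under a different parent bucket
        histo.collect(0, 1_589_760_000_000L);
        System.out.println(histo.size() + " buckets");
    }
}
```

Per-bucket state such as doc counts then lives in flat arrays indexed by the dense ordinal, which is what keeps the per-parent-bucket cost small.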
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on May 28, 2020:

This rebuilds `auto_date_histogram`'s aggregator to function without `asMultiBucketAggregator`, which should save a significant amount of memory when it is not the top most aggregator. It isn't possible to "just port the aggregator" without taking a pretty significant performance hit because we used to rewrite all of the buckets every time we switched to a coarser and coarser rounding configuration. Without some major surgery to how we delay sub-aggs we'd end up rewriting the delay list zillions of times if there are many buckets.

This change replaces the constant rewrites with a "budget" of "wasted" buckets and only rewrites all of the buckets when we exceed that budget. Now that we don't rebucket every time we increase the rounding we can no longer get an accurate count of the number of buckets! So instead the aggregator uses an estimate of the number of buckets to trigger switching to a coarser rounding. This estimate is likely to be *terrible* when buckets are far apart compared to the rounding. So it also uses the difference between the first and last bucket to trigger switching to a coarser rounding, which covers for the shortcomings of the bucket estimation technique pretty well. It also causes the aggregator to emit fewer buckets in cases where they'd be reduced together on the coordinating node. This is wonderful! But probably fairly rare.

After all that, it amounts to about the same performance in the benchmarks that I've run. But the memory savings are totally still a thing!

Relates to elastic#56487
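The "budget of wasted buckets" idea can be sketched in isolation. The toy class below merges buckets cheaply when it switches to a coarser rounding, leaves the merged-away slots behind as waste, and only performs the expensive full rewrite (here called `compact`) once the waste exceeds a budget. This is a deliberately simplified assumption-laden sketch: it tracks exact bucket counts where the commit describes an estimate, it ignores sub-aggregations and the first-to-last-bucket check, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of deferring a full bucket rewrite until a "wasted" budget is exceeded. */
public class WastedBudgetSketch {
    private final long[] intervals;   // coarser and coarser; each is a multiple of the previous
    private final int targetBuckets;  // switch to a coarser rounding above this many live buckets
    private final int wastedBudget;   // dead slots tolerated before a full rewrite
    private int roundingIdx = 0;
    private List<Long> keys = new ArrayList<>();   // rounded key per slot; null marks a dead slot
    private List<Long> counts = new ArrayList<>(); // doc count per slot
    private Map<Long, Integer> slotByKey = new HashMap<>();
    private int wasted = 0;
    int compactions = 0;

    WastedBudgetSketch(long[] intervals, int targetBuckets, int wastedBudget) {
        this.intervals = intervals;
        this.targetBuckets = targetBuckets;
        this.wastedBudget = wastedBudget;
    }

    void collect(long value) {
        long interval = intervals[roundingIdx];
        long key = Math.floorDiv(value, interval) * interval;
        Integer slot = slotByKey.get(key);
        if (slot == null) {
            slot = keys.size();
            keys.add(key);
            counts.add(0L);
            slotByKey.put(key, slot);
        }
        counts.set(slot, counts.get(slot) + 1);
        while (slotByKey.size() > targetBuckets && roundingIdx < intervals.length - 1) {
            coarsen();
        }
    }

    /** Cheap step: merge counts into the first slot of each coarser key and mark the rest dead. */
    private void coarsen() {
        roundingIdx++;
        long interval = intervals[roundingIdx];
        Map<Long, Integer> merged = new HashMap<>();
        for (int slot = 0; slot < keys.size(); slot++) {
            Long oldKey = keys.get(slot);
            if (oldKey == null) {
                continue; // already dead
            }
            long newKey = Math.floorDiv(oldKey, interval) * interval;
            Integer dst = merged.putIfAbsent(newKey, slot);
            if (dst == null) {
                keys.set(slot, newKey);
            } else {
                counts.set(dst, counts.get(dst) + counts.get(slot));
                keys.set(slot, null);
                wasted++;
            }
        }
        slotByKey = merged;
        if (wasted > wastedBudget) {
            compact();
        }
    }

    /** Expensive step: rewrite every live bucket into a dense layout. */
    private void compact() {
        List<Long> newKeys = new ArrayList<>();
        List<Long> newCounts = new ArrayList<>();
        Map<Long, Integer> newSlots = new HashMap<>();
        for (int slot = 0; slot < keys.size(); slot++) {
            if (keys.get(slot) != null) {
                newSlots.put(keys.get(slot), newKeys.size());
                newKeys.add(keys.get(slot));
                newCounts.add(counts.get(slot));
            }
        }
        keys = newKeys;
        counts = newCounts;
        slotByKey = newSlots;
        wasted = 0;
        compactions++;
    }

    /** Live buckets as rounded key to doc count, sorted by key. */
    Map<Long, Long> result() {
        Map<Long, Long> out = new TreeMap<>();
        for (int slot = 0; slot < keys.size(); slot++) {
            if (keys.get(slot) != null) {
                out.put(keys.get(slot), counts.get(slot));
            }
        }
        return out;
    }
}
```

With a generous budget the sketch never compacts and simply carries a few dead slots; with a tight budget it produces the same buckets but pays for rewrites, which is the trade-off the commit message describes.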
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on May 31, 2020:

Merges the remaining implementation of `significant_terms` into `terms` so that we can more easily make them work properly without `asMultiBucketAggregator`, which *should* save memory and speed them up. Relates to elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on Jun 25, 2020:

`descendsFromBucketAggregator` was important before we removed `asMultiBucketAggregator` but now that it is gone `collectsFromSingleBucket` is good enough. Relates to elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on Jun 26, 2020:

I've always been confused by the strange behavior that I saw when working on elastic#57304. Specifically, I saw switching from a bimorphic invocation to a monomorphic invocation give us a 7%-15% performance bump. This felt *bonkers* to me. And, it also made me wonder whether it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows that a bimorphic invocation like:

```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); // <------ this line
```

is 19% slower than a monomorphic invocation like:

```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); // <------ this line
```

But *only* when the reference is mutable. In the example above, if `ords` is never changed then both perform the same. But if the `ords` reference is assigned twice then we start to see the difference:

```
immutable bimorphic    avgt  10  6.468 ± 0.045  ns/op
immutable monomorphic  avgt  10  6.756 ± 0.026  ns/op
mutable   bimorphic    avgt  10  9.741 ± 0.073  ns/op
mutable   monomorphic  avgt  10  8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing: `auto_date_histogram` is the only aggregation in which `ords` isn't final and it is the only aggregation that forces monomorphic invocations. All other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to elastic#56487
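For readers who want to see the two "mutable" shapes side by side, here is a small sketch of holders whose non-final field is declared either as an abstract type or as a concrete final class. The `Ords`, `ForSingle`, and `ForMany` classes are hypothetical stand-ins, not the Elasticsearch classes, and the sketch only shows the shape of the call sites. It measures nothing; reproducing the numbers above would need a proper benchmark harness.

```java
/** Hypothetical sketch of the two "mutable" call-site shapes; it shows structure, not timings. */
public class CallSiteShapeSketch {
    /** Stand-in for an abstract ords type with two implementations. */
    abstract static class Ords {
        abstract long add(long owningBucketOrd, long value);
    }

    static final class ForSingle extends Ords {
        private long next;

        @Override
        long add(long owningBucketOrd, long value) {
            return next++;
        }
    }

    static final class ForMany extends Ords {
        private long next;

        @Override
        long add(long owningBucketOrd, long value) {
            return next++;
        }
    }

    /** Non-final field declared as the abstract type: the call may dispatch to either implementation. */
    static final class MutableAbstractHolder {
        private Ords ords = new ForSingle();

        void reset() {
            ords = new ForSingle(); // the reference is reassigned, so it cannot be final
        }

        long add(long value) {
            return ords.add(0, value);
        }
    }

    /** Non-final field declared as the concrete final class: the call target is known statically. */
    static final class MutableConcreteHolder {
        private ForSingle ords = new ForSingle();

        void reset() {
            ords = new ForSingle();
        }

        long add(long value) {
            return ords.add(0, value);
        }
    }

    public static void main(String[] args) {
        MutableAbstractHolder a = new MutableAbstractHolder();
        MutableConcreteHolder c = new MutableConcreteHolder();
        a.reset();
        c.reset();
        // Both holders behave identically; only the declared field type differs.
        System.out.println(a.add(42) == c.add(42));
    }
}
```

The two holders are functionally identical, which is the point: the difference the commit measured comes purely from how the field is declared and whether it is reassigned.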
nik9000 changed the title from "Multi-bucket aggregator wrapper is slow!" to "Multi-bucket aggregator wrapper is slow and uses a ton of memory" on Jun 30, 2020.
nik9000 added a commit to nik9000/elasticsearch that referenced this issue on Jul 6, 2020:

We did it in elastic#56487.
Before 7.9.0 many of our more complex aggregations made a simplifying assumption that required that they duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. So for an aggregation like:
Run over three years, such an aggregation spends a couple of megabytes just on bucket accounting. More deeply nested aggregations spend even more on this overhead. 7.9.0 removes all of it, which should allow us to run better in lower memory environments.
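The "couple of megabytes" figure is easy to sanity-check. The example request is not preserved in this copy of the issue, so the sketch below assumes one parent bucket per day over three years and takes "a couple of kilobytes" to mean 2 KiB per bucket; both numbers are assumptions for illustration only.

```java
/** Back-of-the-envelope check of the per-bucket overhead; all inputs are assumed values. */
public class BucketOverheadSketch {
    /** Total bytes spent on bucket accounting for the given number of parent buckets. */
    static long overheadBytes(long parentBuckets, long bytesPerBucket) {
        return Math.multiplyExact(parentBuckets, bytesPerBucket);
    }

    public static void main(String[] args) {
        long parentBuckets = 3 * 365;   // assumed: daily buckets over three years
        long bytesPerBucket = 2 * 1024; // assumed: "a couple of kilobytes"
        long total = overheadBytes(parentBuckets, bytesPerBucket);
        System.out.println(parentBuckets + " buckets x " + bytesPerBucket + " bytes = " + total + " bytes");
    }
}
```

Under those assumptions the total is 1,095 × 2,048 = 2,242,560 bytes, a little over 2 MiB, which is consistent with the estimate above.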
As a bonus we wrote quite a few Rally benchmarks for aggs to make sure that these changes didn't slow down aggregations, so we can now think much more scientifically about aggregation performance. The benchmarks suggest that these changes don't affect simple aggregation trees and speed up complex aggregation trees of similar or greater depth than the example above. Your actual performance changes will vary but this should help! 🤞
EDIT: Everything above the EDIT mark was added when I tagged this `release highlight` so it could be more easily understood in context.

#55873 removed the "multi-bucket wrapper" from the numeric terms aggregator and showed that we can get a pretty substantial performance improvement in some common aggregation requests. This will track work to remove the wrapper for other aggregations because:
- `terms` (Fix casting of scaled_float in sorts #57207 + Fold some of sig_terms into terms #57361 + Merge remaining sig_terms into terms #57397 + Fix an optimization in terms agg #57438 + Save memory when string terms are not on top #57758)
- `significant_terms` (Save memory on numeric sig terms when not top #56789 + Fix casting of scaled_float in sorts #57207 + Fold some of sig_terms into terms #57361 + Merge remaining sig_terms into terms #57397 + Fix an optimization in terms agg #57438 + Save memory when string terms are not on top #57758)
- `rare_terms` (Save memory when rare_terms is not on top #57948)
- `date_histogram` (Save memory when date_histogram is not on top #56921)
- `auto_date_histogram` (Save memory when auto_date_histogram is not on top #57304)
- `histogram` (Save memory when histogram agg is not on top #57277)
- `parent` (Make parent and child aggregator more obvious #57490 + Save memory when parent and child are not on top #57892)
- `child` (Make parent and child aggregator more obvious #57490 + Save memory when parent and child are not on top #57892)
- `geohash_grid` (Same memory when geo aggregations are not on top #57483)
- `geotile_grid` (Same memory when geo aggregations are not on top #57483)
- `scripted_metric` (Remove deprecated wrapper from scripted_metric #57627)
- `significant_text` (Give significance lookups their own home #57903 + Save memory when significant_text is not on top #58145)

After this is all done we can:

- Remove `significant_terms`'s "funny" reference back to its factory for caching. We won't need it because there'll only ever be one aggregator so it can cache. (Give significance lookups their own home #57903)
- Look into non-`BigArrays` backed memory usage in aggs. This is more important now that we don't get the 5k "artificial" value added to the breaker per bucket. (Moved to Make sure all significant memory usage in aggs are tracked in BigArrays #59892)
- Replace `descendsFromBucketAggregator(parent)` with `collectsFromSingleBucket`. (Remove useless aggregation helper #58571)