Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487

nik9000 · 2020-05-09T12:18:41Z

Before 7.9.0 many of our more complex aggregations made a simplifying assumption that required them to duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. Take an aggregation like:

POST _search
{
  "aggs": {
    "date": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "ips": {
          "terms": { "field": "ip" }
        }
      }
    }
  }
}

When run over three years of data, this aggregation spends a couple of megabytes just on bucket accounting. More deeply nested aggregations spend even more on this overhead. 7.9.0 removes all of it, which should allow us to run better in lower memory environments.
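As a rough back-of-the-envelope check of the "couple of megabytes" figure, the sketch below multiplies the number of daily buckets in three years by an assumed per-bucket overhead. The 2 KiB constant and the 365-day year are illustrative assumptions standing in for "a couple of kilobytes each"; the real overhead varied by aggregation, and the class and method names are made up for this example.

```java
/** Rough estimate of pre-7.9.0 per-bucket overhead. All constants here are assumptions. */
final class BucketOverheadEstimate {
    /** Assumed cost of one duplicated sub-aggregator, standing in for "a couple of kilobytes". */
    static final long ASSUMED_BYTES_PER_BUCKET = 2 * 1024;

    /** Number of daily date_histogram buckets, ignoring leap days. */
    static long dailyBuckets(int years) {
        return 365L * years;
    }

    /** Bytes spent on bucket accounting when every parent bucket duplicates the sub-aggregator. */
    static long overheadBytes(int years) {
        return dailyBuckets(years) * ASSUMED_BYTES_PER_BUCKET;
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(3);
        System.out.println(dailyBuckets(3) + " buckets, " + bytes + " bytes of assumed overhead");
    }
}
```

With these assumptions, three years of daily buckets comes to 1095 buckets and a little over two million bytes for one level of nesting, which is in line with the figure quoted above.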

As a bonus we wrote quite a few Rally benchmarks for aggs to make sure that these changes didn't slow down aggregations. So we can think much more scientifically about aggregation performance. The benchmarks suggest that these changes don't affect simple aggregation trees and speed up complex aggregation trees of similar or higher depth than the example above. Your actual performance changes will vary but this should help! 🤞

EDIT:
Everything above the EDIT mark was added when I tagged this release highlight so it could be more easily understood in context.

#55873 removed the "multi-bucket wrapper" from the numeric terms aggregator and showed that we can get a pretty substantial performance improvement in some common aggregation requests. This will track work to remove the wrapper for other aggregations because:

- I expect we can get a similar or better performance improvement for each one.
- The wrapper makes it very difficult to reason about aggregations.
- This will give us a good excuse to add rally tracks for these aggregations.

After this is all done we can:

- Remove significant_terms's "funny" reference back to its factory for caching. We won't need it because there'll only ever be one aggregator, so it can cache. (Give significance lookups their own home #57903)
- ~~Look into non-BigArrays backed memory usage in aggs. This is more important now that we don't get the 5k "artificial" value added to the breaker per bucket.~~ Moved to "Make sure all significant memory usage in aggs are tracked in BigArrays" (#59892)
- Replace descendsFromBucketAggregator(parent) with collectsFromSingleBucket. (Remove useless aggregation helper #58571)
- Look into replacing "lego-ed" data structures with purpose built ones. (7.10: Allocate slightly less per bucket #59740)

elasticmachine · 2020-05-09T12:18:45Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

This makes a `parentCardinality` available to every `Aggregator`'s constructor so it can make intelligent choices about how it collects bucket values. This replaces `collectsFromSingleBucket` and is similar to it, but:

1. It supports `NONE`, `ONE`, and `MANY` values and is generally extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all sub-aggregations live under multi-bucket aggregations. This is normally true, but `parentCardinality` is properly carried forward for single-bucket aggregations like `filter` and for multi-bucket aggregations configured in single-bucket form, like `range` with a single range.

While I was touching every aggregation I renamed `doCreateInternal` to `createMapped` because that seemed like a much better name and it was right there, next to the change I was already making.

Relates to elastic#56487
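To make the `NONE` / `ONE` / `MANY` idea concrete, here is a minimal sketch of how such a value could be carried down an aggregation tree. The enum name, the `multiply` method, and the `CollectorChoice` helper are hypothetical names invented for this illustration and are not taken from the Elasticsearch source; only the three values come from the description above.

```java
/** Hypothetical three-valued upper bound on how many owning buckets an aggregator may see. */
enum ParentCardinality {
    NONE, ONE, MANY;

    /**
     * Cardinality seen by a child when this aggregator emits at most
     * {@code bucketCount} buckets for each of its own owning buckets.
     */
    ParentCardinality multiply(int bucketCount) {
        if (bucketCount < 0) {
            throw new IllegalArgumentException("bucketCount must not be negative");
        }
        if (this == NONE || bucketCount == 0) {
            return NONE;
        }
        if (this == ONE && bucketCount == 1) {
            return ONE; // e.g. a filter, or a range with a single range, under the top level
        }
        return MANY;
    }
}

/** Hypothetical helper showing how an aggregator might pick a collection strategy. */
final class CollectorChoice {
    static String choose(ParentCardinality cardinality) {
        switch (cardinality) {
            case NONE:
                return "nothing to collect";
            case ONE:
                return "single-bucket fast path";
            default:
                return "keyed by owning bucket";
        }
    }

    public static void main(String[] args) {
        ParentCardinality top = ParentCardinality.ONE;
        System.out.println("under filter: " + choose(top.multiply(1)));
        System.out.println("under date_histogram: " + choose(top.multiply(Integer.MAX_VALUE)));
    }
}
```

In this sketch a sub-aggregation under a single-bucket parent keeps `ONE`, which is the case the older boolean-style check is described as getting wrong.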

This merges the code for the `significant_terms` agg into the package for the code for the `terms` agg. They are *super* entangled already, this mostly just admits that to ourselves. Precondition for the terms work in elastic#56487

When `date_histogram` is a sub-aggregator it used to allocate a bunch of objects for every one of its parent's buckets. This reworks the `date_histogram` aggregator to use the data structures that we built in elastic#55873 instead of doing all of that allocation. Part of elastic#56487
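The general idea of replacing per-parent-bucket objects with one shared structure can be sketched as a single map from an (owning bucket, value) pair to a dense ordinal. This is a simplified stand-in written for illustration, backed by a plain `HashMap`; it is not the Elasticsearch implementation, and the class name and the return convention of `add` are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in: one shared map from (owning bucket ordinal, value) to a dense bucket ordinal. */
final class SharedBucketOrds {
    private final Map<List<Long>, Long> ords = new HashMap<>();

    /**
     * Registers the pair and returns its new ordinal, or {@code -1 - ordinal}
     * if the pair had already been added (a convention chosen for this sketch).
     */
    long add(long owningBucketOrd, long value) {
        List<Long> key = List.of(owningBucketOrd, value);
        Long existing = ords.get(key);
        if (existing != null) {
            return -1 - existing;
        }
        long ord = ords.size(); // ordinals are handed out densely: 0, 1, 2, ...
        ords.put(key, ord);
        return ord;
    }

    /** Total number of distinct (owning bucket, value) pairs seen so far. */
    long size() {
        return ords.size();
    }

    public static void main(String[] args) {
        SharedBucketOrds ords = new SharedBucketOrds();
        System.out.println(ords.add(0, 10)); // first pair
        System.out.println(ords.add(1, 10)); // same value under a different owning bucket
        System.out.println(ords.add(0, 10)); // repeat of the first pair
    }
}
```

Because every parent bucket shares the one map, the cost grows with the number of distinct pairs rather than with a fixed allocation per parent bucket.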

This rebuilds `auto_date_histogram`'s aggregator to function without `asMultiBucketAggregator`, which should save a significant amount of memory when it is not the top-most aggregator.

It isn't possible to "just port the aggregator" without taking a pretty significant performance hit because we used to rewrite all of the buckets every time we switched to a coarser rounding configuration. Without some major surgery to how we delay sub-aggs we'd end up rewriting the delay list zillions of times if there are many buckets.

This change replaces the constant rewrites with a "budget" of "wasted" buckets and only rewrites all of the buckets when we exceed that budget. Now that we don't rebucket every time we increase the rounding we can no longer get an accurate count of the number of buckets! So instead the aggregator uses an estimate of the number of buckets to trigger switching to a coarser rounding. This estimate is likely to be *terrible* when buckets are far apart compared to the rounding. So it also uses the difference between the first and last bucket to trigger switching to a coarser rounding, which covers for the shortcomings of the bucket estimation technique pretty well. It also causes the aggregator to emit fewer buckets in cases where they'd be reduced together on the coordinating node. This is wonderful! But probably fairly rare.

After all that, it amounts to about the same performance in the benchmarks that I've run. But the memory savings are totally still a thing!

Relates to elastic#56487
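The "budget of wasted buckets" idea can be sketched roughly as follows. This is a toy model under stated assumptions, not the aggregator's actual code: bucket widths are plain `long` multiples, the bucket estimate is simply the old count divided by the width ratio, and the first-to-last-bucket trigger described above is left out. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy histogram that coarsens its rounding lazily and only rewrites buckets once a waste budget is exceeded. */
final class BudgetedHistogram {
    private final long[] widths;     // bucket widths, finest first; each is assumed to divide the next
    private final int targetBuckets; // coarsen once the estimated bucket count exceeds this
    private final int wastedBudget;  // stale buckets tolerated before a full rewrite
    private Map<Long, Long> counts = new HashMap<>();
    private int widthIdx = 0;
    private int estimatedBuckets = 0;
    private int wasted = 0;
    int rewrites = 0;                // full rewrites triggered by exceeding the budget

    BudgetedHistogram(long[] widths, int targetBuckets, int wastedBudget) {
        this.widths = widths;
        this.targetBuckets = targetBuckets;
        this.wastedBudget = wastedBudget;
    }

    void collect(long value) {
        long key = Math.floorDiv(value, widths[widthIdx]) * widths[widthIdx];
        boolean isNewBucket = counts.merge(key, 1L, Long::sum) == 1L;
        if (isNewBucket) {
            estimatedBuckets++;
            if (estimatedBuckets > targetBuckets && widthIdx < widths.length - 1) {
                coarsen();
            }
        }
    }

    /** Switch to the next coarser width without merging existing buckets unless the budget is blown. */
    private void coarsen() {
        long ratio = widths[widthIdx + 1] / widths[widthIdx];
        widthIdx++;
        int newEstimate = (int) Math.max(1, estimatedBuckets / ratio);
        wasted += estimatedBuckets - newEstimate; // buckets that are now stale
        estimatedBuckets = newEstimate;
        if (wasted > wastedBudget) {
            rewrites++;
            mergeIntoCurrentWidth();
        }
    }

    /** Re-key every bucket at the current width, merging the stale ones. */
    private void mergeIntoCurrentWidth() {
        long width = widths[widthIdx];
        Map<Long, Long> merged = new HashMap<>();
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            merged.merge(Math.floorDiv(e.getKey(), width) * width, e.getValue(), Long::sum);
        }
        counts = merged;
        estimatedBuckets = counts.size();
        wasted = 0;
    }

    /** Final buckets at the current width; stale buckets are merged here if they have not been already. */
    Map<Long, Long> buckets() {
        mergeIntoCurrentWidth();
        return counts;
    }
}
```

With a generous budget the sketch defers all merging to the final `buckets()` call, and with a tight budget it rewrites earlier; either way the resulting buckets are the same, only the amount of intermediate rewriting differs.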

Merges the remaining implementation of `significant_terms` into `terms` so that we can more easily make them work properly without `asMultiBucketAggregator`, which *should* save memory and speed them up. Relates to elastic#56487

`descendsFromBucketAggregator` was important before we removed `asMultiBucketAggregator` but now that it is gone `collectsFromSingleBucket` is good enough. Relates to elastic#56487

I've always been confused by the strange behavior that I saw when working on elastic#57304. Specifically, I saw switching from a bimorphic invocation to a monomorphic invocation give us a 7%-15% performance bump. This felt *bonkers* to me. And it also made me wonder whether it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows that a bimorphic invocation like:

```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:

```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if `ords` is never changed then both perform the same. But if the `ords` reference is assigned twice then we start to see the difference:

```
immutable bimorphic    avgt  10  6.468 ± 0.045  ns/op
immutable monomorphic  avgt  10  6.756 ± 0.026  ns/op
mutable   bimorphic    avgt  10  9.741 ± 0.073  ns/op
mutable   monomorphic  avgt  10  8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing: `auto_date_histogram` is the only aggregation in which `ords` isn't final and it is the only aggregation that forces monomorphic invocations. All other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to elastic#56487

We did it in elastic#56487.

nik9000 added :Analytics/Aggregations Aggregations Meta labels May 9, 2020

nik9000 self-assigned this May 9, 2020

elasticmachine added the Team:Analytics label May 9, 2020

nik9000 changed the title ~~Multi-bucket aggregator removal~~ Multi-bucket aggregator wrapper is slow! May 9, 2020

nik9000 mentioned this issue May 11, 2020

Improve cardinality measure used to build aggs #56533

Merged

nik9000 mentioned this issue May 13, 2020

Merge significant_terms into the terms package #56699

Merged

nik9000 mentioned this issue May 13, 2020

Merge significant_terms into the terms package (backport of #56699) #56715

Merged

nik9000 mentioned this issue May 18, 2020

Save memory when date_histogram is not on top #56921

Merged

nik9000 mentioned this issue May 19, 2020

Save memory when date_histogram is not on top (#56921) #56960

Merged

nik9000 mentioned this issue May 27, 2020

Make global ords terms simpler to understand #57241

Merged

nik9000 mentioned this issue May 28, 2020

Save memory when auto_date_histogram is not on top #57304

Merged

nik9000 mentioned this issue May 31, 2020

Merge remaining sig_terms into terms #57397

Merged

nik9000 mentioned this issue Jun 25, 2020

Remove useless aggregation helper (backport of #58571) #58578

Merged

nik9000 mentioned this issue Jun 26, 2020

Add microbenchmark for LongKeyedBucketOrds #58608

Merged

nik9000 changed the title ~~Multi-bucket aggregator wrapper is slow!~~ Multi-bucket aggregator wrapper is slow and uses a ton of memory Jun 30, 2020

nik9000 added release highlight v7.9.0 labels Jun 30, 2020

nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jul 6, 2020

Drop a TODO from the terms aggregator

f164ab5

We did it in elastic#56487.

nik9000 mentioned this issue Jul 6, 2020

Drop a TODO from the terms aggregator #59100

Merged

nik9000 added a commit that referenced this issue Jul 6, 2020

Drop a TODO from the terms aggregator (#59100)

3493fba

We did it in #56487.

nik9000 added a commit that referenced this issue Jul 6, 2020

Drop a TODO from the terms aggregator (#59100)

be13dea

We did it in #56487.

This was referenced Jul 6, 2020

Improve cardinality measure used to build aggs (backport of #56533) #59107

Merged

Aggregations can be bottlenecked on ChildMemoryCircuitBreaker.limit() #58647

Closed

nik9000 mentioned this issue Jul 13, 2020

Add microbenchmark for LongKeyedBucketOrds (backport of #58608) #59459

Merged

pugnascotia added v7.10.0 and removed v7.9.0 labels Jul 15, 2020

nik9000 mentioned this issue Jul 20, 2020

Make sure all significant memory usage in aggs are tracked in BigArrays #59892

Open

11 tasks

nik9000 closed this as completed Jul 20, 2020

andreidan added >enhancement v7.10.0 and removed v7.10.0 labels Oct 7, 2020
