Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487

Closed
16 tasks done
nik9000 opened this issue May 9, 2020 · 1 comment
Closed
16 tasks done

Multi-bucket aggregator wrapper is slow and uses a ton of memory #56487

nik9000 opened this issue May 9, 2020 · 1 comment
Assignees
Labels
:Analytics/Aggregations Aggregations >enhancement Meta release highlight Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@nik9000
Copy link
Member

nik9000 commented May 9, 2020

Before 7.9.0 many of our more complex aggregations made a simplifying assumption that required that they duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. So for an aggregation like:

POST _search
{
  "aggs": {
    "date": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "ips": {
          "terms": { "field": "ip" }
        }
      }
    }
  }
}

When run over three years spends a couple of megabytes just on bucket accounting. More deeply nested aggregations spend even more on this overhead. And 7.9.0 removes all of it which should allow us to run better in lower memory environments.

As a bonus we wrote quite a few Rally benchmarks for aggs to make sure that these tests didn't slow down aggregations. So we can think much more scientifically about aggregation performance. The benchmarks suggest that these changes don't affect simple aggregation trees and speed up complex aggregation trees of similar or higher depth than the example above. Your actual performance changes will vary but it this should help! 🤞

EDIT:
Everything above the EDIT mark was added when I tagged this release highlight so it could be more easily understood in context.

#55873 removed the "multi-bucket wrapper" from the numeric terms aggregator and showed that we can get a pretty substantial performance improvement in some common aggregation requests. This will track work to remove the wrapper for other aggregations because:

  1. I expect we can get a similar or better performance improvement for each one.
  2. The wrapper makes it very difficult to reason about aggregations.
  3. This will give us a good excuse to add rally tracks for these aggregations.

After this is all done we can:

@nik9000 nik9000 self-assigned this May 9, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 9, 2020
@nik9000 nik9000 changed the title Multi-bucket aggregator removal Multi-bucket aggregator wrapper is slow! May 9, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 11, 2020
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 13, 2020
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in elastic#56487
nik9000 added a commit that referenced this issue May 13, 2020
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 13, 2020
…6699)

This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in elastic#56487
nik9000 added a commit that referenced this issue May 13, 2020
…56715)

This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 18, 2020
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in elastic#55873 rework the `date_histogram`
aggregator instead of all of the allocation.

Part of elastic#56487
nik9000 added a commit that referenced this issue May 19, 2020
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.

Part of #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 19, 2020
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in elastic#55873 rework the `date_histogram`
aggregator instead of all of the allocation.

Part of elastic#56487
nik9000 added a commit that referenced this issue May 19, 2020
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.

Part of #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 28, 2020
This rebuilds `auto_date_histogram`'s aggregator to function without
`asMultiBucketAggregator` which should save a significant amount of
memory when it is not the top most aggregator.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how to delay sub-aggs
we'd end up rewriting the delay list zillions of time if there are many
buckets.

This change replaces the constant rewrites with a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an esimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.

After all that, it amounts to about the same performance, in the
benchmarks that I've run. But the memory savings is totaly still at
thing!

Relates to elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue May 31, 2020
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.

Relates elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jun 25, 2020
`descendsFromBucketAggregator` was important before we removed
`asMultiBucketAggregator` but now that it is gone
`collectsFromSingleBucket` is good enough.

Relates to elastic#56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jun 26, 2020
I've always been confused by the strange behavior that I saw when
working on elastic#57304. Specifically, I saw switching from a bimorphic
invocation to a monomorphic invocation to give us a 7%-15% performance
bump. This felt *bonkers* to me. And, it also made me wonder whether
it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic    avgt   10   6.468 ± 0.045  ns/op
immutable monomorphic  avgt   10   6.756 ± 0.026  ns/op
mutable   bimorphic    avgt   10   9.741 ± 0.073  ns/op
mutable   monomorphic  avgt   10   8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to elastic#56487
nik9000 added a commit that referenced this issue Jun 26, 2020
`descendsFromBucketAggregator` was important before we removed
`asMultiBucketAggregator` but now that it is gone
`collectsFromSingleBucket` is good enough.

Relates to #56487

Co-authored-by: Elastic Machine <[email protected]>
@nik9000 nik9000 changed the title Multi-bucket aggregator wrapper is slow! Multi-bucket aggregator wrapper is slow and uses a ton of memory Jun 30, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jul 6, 2020
nik9000 added a commit that referenced this issue Jul 6, 2020
nik9000 added a commit that referenced this issue Jul 6, 2020
nik9000 added a commit that referenced this issue Jul 6, 2020
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jul 6, 2020
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to elastic#56487
nik9000 added a commit that referenced this issue Jul 8, 2020
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487

Co-authored-by: Elastic Machine <[email protected]>
nik9000 added a commit that referenced this issue Jul 13, 2020
I've always been confused by the strange behavior that I saw when
working on #57304. Specifically, I saw switching from a bimorphic
invocation to a monomorphic invocation to give us a 7%-15% performance
bump. This felt *bonkers* to me. And, it also made me wonder whether
it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic    avgt   10   6.468 ± 0.045  ns/op
immutable monomorphic  avgt   10   6.756 ± 0.026  ns/op
mutable   bimorphic    avgt   10   9.741 ± 0.073  ns/op
mutable   monomorphic  avgt   10   8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to #56487
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Jul 13, 2020
I've always been confused by the strange behavior that I saw when
working on elastic#57304. Specifically, I saw switching from a bimorphic
invocation to a monomorphic invocation to give us a 7%-15% performance
bump. This felt *bonkers* to me. And, it also made me wonder whether
it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic    avgt   10   6.468 ± 0.045  ns/op
immutable monomorphic  avgt   10   6.756 ± 0.026  ns/op
mutable   bimorphic    avgt   10   9.741 ± 0.073  ns/op
mutable   monomorphic  avgt   10   8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to elastic#56487
nik9000 added a commit that referenced this issue Jul 13, 2020
I've always been confused by the strange behavior that I saw when
working on #57304. Specifically, I saw switching from a bimorphic
invocation to a monomorphic invocation to give us a 7%-15% performance
bump. This felt *bonkers* to me. And, it also made me wonder whether
it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic    avgt   10   6.468 ± 0.045  ns/op
immutable monomorphic  avgt   10   6.756 ± 0.026  ns/op
mutable   bimorphic    avgt   10   9.741 ± 0.073  ns/op
mutable   monomorphic  avgt   10   8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to #56487
@nik9000 nik9000 closed this as completed Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Meta release highlight Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

4 participants