OOM on date_histogram with small interval #72619

Open
nik9000 opened this issue May 3, 2021 · 12 comments
Labels
:Analytics/Aggregations, >bug, Team:Analytics

Comments

@nik9000
Member

nik9000 commented May 3, 2021

I recently merged #72081, which protects against OOM in the reduce phase for date_histograms. A Discuss user reported a similar issue, but they seem to hit it on the data nodes while building results.

Elasticsearch version (bin/elasticsearch --version): Reported on 7.9 - @nik9000 thinks it should be possible to reproduce against master

Steps to reproduce:

We just got the stack trace in the linked Discuss issue. Looks like they have a wide range and a tight interval. They have min_doc_count set to 0, but I don't think that matters too much here. I'd try one-second bins with a hundred thousand docs, each in a different second. The trick, I think, is not to run out of memory when collecting the agg - we have protections there - but to run out of memory when building the bloaty result objects we send back to the coordinating node.
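A minimal, untested sketch of that repro (the index name, doc count, and interval are my guesses; on recent versions the search.max_buckets limit may trip before the OOM does):

curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/oom_test' -d '{
 "mappings": { "properties": { "date": { "type": "date" } } }
}'

# one doc per second, so each lands in its own one-second bucket
# (slow; _bulk would be faster, but this keeps the sketch simple)
for i in {0..100000}
do
  curl -s -o /dev/null -H 'Content-type: application/json' \
    -XPOST 'http://localhost:9200/oom_test/_doc' -d "{ \"date\": $(($i * 1000)) }"
done

curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/oom_test/_search' -d '{
 "size": 0,
 "aggregations": {
  "dateHistogram": {
   "date_histogram": { "field": "date", "fixed_interval": "1s", "min_doc_count": 0 }
  }
 }
}'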

@elasticmachine added the Team:Analytics label May 3, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@sag-tobias-frey
Contributor

Bash script to reproduce the issue:

# create the index and add a nested mapping
curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/test_index2' -d '{}'
curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/test_index2/_mapping' -d '{
 "properties": {
  "nested": {
   "type": "nested"
  }
 }
}'

# index 1001 docs, each with two nested date values
for i in {0..1000}
do
  dateValue=$(($i * 100))
  date_end=$(($i * 100000))
  curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/test_index2/_doc' -d "{
	\"value\": \"A\",
	\"textValue\": $dateValue,
	\"nested\": [{
		\"date\": 0
	}, {
		\"date\": $date_end
	}]
  }"
done

sleep 5 # give the new docs time to become searchable

curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/test_index2/_search' -d '{
 "size": 0,
 "query": {
  "bool": {
   "adjust_pure_negative": true,
   "boost": 1
  }
 },
 "aggregations": {
  "agg1": {
   "terms": {
    "field": "textValue",
    "size": 2147483647,
    "min_doc_count": 0
   },
   "aggregations": {
    "agg2": {
     "terms": {
      "field": "textValue",
      "size": 2147483647,
      "min_doc_count": 0
     },
     "aggregations": {
      "activities": {
       "nested": {
        "path": "nested"
       },
       "aggregations": {
        "dateHistogram": {
         "date_histogram": {
          "field": "nested.date",
          "calendar_interval": "1M",
          "offset": 0,
          "order": {
           "_key": "asc"
          },
          "keyed": false,
          "min_doc_count": 0
         }
        }
       }
      }
     }
    }
   }
  }
 }
}'

@nik9000
Member Author

nik9000 commented May 4, 2021

Thanks @sag-tobias-frey. It's interesting that you got there with nested! I hadn't realized that might be in the mix. Fun times.

@sag-tobias-frey
Contributor

I have managed to get there even without nested:

# create the index; the mapping still defines the nested field, but the query below does not use it
curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/test_index_3' -d '{}'
curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/test_index_3/_mapping' -d '{
 "properties": {
  "nested": {
   "type": "nested"
  }
 }
}'

# index 1001 docs with a top-level date field
for i in {0..1000}
do
  dateValue=$(($i * 100))
  date_end=$(($i * 100000))
  curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/test_index_3/_doc' -d "{
	\"value\": \"A\",
	\"textValue\": $dateValue,
	\"date\": $date_end,
	\"nested\": [{
		\"date\": 0
	}, {
		\"date\": $date_end
	}]
  }"
done

sleep 5 # give the new docs time to become searchable

curl -H 'Content-type: application/json' -XPOST 'http://localhost:9200/test_index_3/_search' -d '{
  "size": 0,
  "query": {
    "bool": {
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "aggregations": {
    "agg1": {
      "terms": {
        "field": "textValue",
        "size": 1001,
        "min_doc_count": 0
      },
      "aggregations": {
        "agg2": {
          "terms": {
            "field": "textValue",
            "size": 1001,
            "min_doc_count": 0
          },
          "aggregations": {
            "dateHistogram": {
              "date_histogram": {
                "field": "date",
                "calendar_interval": "1M",
                "offset": 0,
                "order": {
                  "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0
              }
            }
          }
        }
      }
    }
  }
}'

@salvatore-campagna
Contributor

salvatore-campagna commented May 23, 2022

I executed some tests (with a 0.5GB heap) running the query against the current master branch, and I see that we still have an OOM, but not the one described originally. If my understanding is correct, the original OOM happened on the coordinator and was fixed by #72081. That patch makes sure we hit the circuit breaker on the coordinator before the OOM takes place.

Right now, though, I see something different: my understanding is that the OOM is happening on the data node. To be more precise, the counter introduced by #72018 is never hit, because the issue happens before the reduce operation runs.

What I see is that the method BucketsAggregator#buildAggregationsForVariableBuckets is called with an array long[] owningBucketOrds whose size is 1_002_001 (so slightly more than 1M entries), which results in creating many objects of type InternalDateHistogram (the result of running a date histogram aggregation).

According to the heap dump, these objects (InternalDateHistogram, LongTerms.Bucket, InternalNested, InternalDateHistogram.EmptyBucketInfo) account for more than 40% of the heap.
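For anyone reproducing this, a comparable heap dump can be captured with standard JDK tooling; the pgrep pattern here assumes the default Elasticsearch main class appears on the node's command line:

# dump live objects from the (single, local) Elasticsearch process
jmap -dump:live,format=b,file=es-heap.hprof $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)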

Increasing the heap to 2GB, I get the following response (no OOM):

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "too_many_buckets_exception",
      "reason": "Trying to create too many buckets. Must be less than or equal to: [65536] but this number of buckets was exceeded. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets": 65536
    }
  },
  "status": 503
}
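As an aside, the limit in that response is the search.max_buckets cluster setting; it can be raised (the value here is just an example), though doing so only reintroduces the memory pressure this issue is about:

curl -H 'Content-type: application/json' -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "search.max_buckets": 1100000 }
}'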

NOTE: the original query is actually calculating a cross product on the textValue field, as a result of having two nested (numeric) terms aggregations and a nested date histogram. There are 1000 distinct numeric terms, which, considering the cross product, results in more than 1M buckets. Each of these buckets holds the result of a date histogram. So, in the end, the query returns over 1M (empty) date histograms.

@sag-tobias-frey
Contributor

> NOTE: the original query is actually calculating a cross product on the textValue field. There are 1000 distinct numeric terms, which, considering the cross product, results in more than 1M buckets. Each of these buckets holds the result of a date histogram. So, in the end, the query returns over 1M (empty) date histograms.

However, shouldn't the number of resulting buckets from the cross product of textValue be 1000, because it is the same field, which always has the same value? In the end the date histogram pushes it over the max buckets anyway, but the first aggregations should be fine.

Have you tried sending multiple of these requests in parallel against the 2GB cluster? We noticed that we have to be careful with these kinds of aggregations/distributions when we have parallel requests, because then the OOM error might still occur even with more heap, since the circuit breaker does not detect it early enough.
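For example, something like this (query.json standing in for the search body from the script above):

# fire 8 copies of the search concurrently
for i in {1..8}; do
  curl -s -o /dev/null -H 'Content-type: application/json' \
    -XPOST 'http://localhost:9200/test_index_3/_search' -d @query.json &
done
wait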

@salvatore-campagna
Contributor

salvatore-campagna commented May 23, 2022

> NOTE: the original query is actually calculating a cross product on the textValue field. There are 1000 distinct numeric terms, which, considering the cross product, results in more than 1M buckets. Each of these buckets holds the result of a date histogram. So, in the end, the query returns over 1M (empty) date histograms.

> However, shouldn't the number of resulting buckets from the cross product of textValue be 1000, because it is the same field, which always has the same value? In the end the date histogram pushes it over the max buckets anyway, but the first aggregations should be fine.
>
> Have you tried sending multiple of these requests in parallel against the 2GB cluster? We noticed that we have to be careful with these kinds of aggregations/distributions when we have parallel requests, because then the OOM error might still occur even with more heap, since the circuit breaker does not detect it early enough.

Regarding the cross product, my understanding is different. If we have three documents with textValue equal to 1, 2, 3, the result will be something like:

(1, 1, date_histo)
(1, 2, date_histo)
(1, 3, date_histo)
...
(3, 2, date_histo)
(3, 3, date_histo)

Extending it to 1000 distinct values results in 1M buckets.
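For what it's worth, the numbers line up exactly: the indexing loop above produces 1001 distinct values (0 through 1000), and 1001 × 1001 = 1_002_001, which matches the owningBucketOrds size reported earlier.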

Anyway, yes, the problem is that the circuit breaker is not firing. I think that is because the creation of objects like InternalDateHistogram is not accounted for... or is not checked early enough.

@sag-tobias-frey
Contributor

sag-tobias-frey commented May 24, 2022

Removed

@salvatore-campagna
Contributor

I had a discussion with the team about this issue and the agreement is that it needs to be addressed by the following two issues:

  • Make sure all significant memory usage in aggs are tracked in BigArrays #59892: most of our aggregations use data structures like lists and arrays which are not tracked through the BigArrays abstraction. As a result, we use memory without accounting for it, which causes OOMs before a circuit breaker fires.
  • Dense representation for aggs #77449: this is more a consequence, since the data structures that are not accounted for in memory consumption are also serialised to the wire format. As a result, we need to come up with a compact representation to avoid large network traffic.

@salvatore-campagna
Contributor

salvatore-campagna commented Jun 6, 2022

The objects taking the most space are the following:

  • byte[]
  • InternalDateHistogram
  • BucketsAggregator$1
  • LongTerms$Bucket
  • InternalAggregations
  • InternalNested
  • InternalDateHistogram$EmptyBucketInfo

Attaching a script which triggers the issue (test.txt).
NOTE: you need to run Elasticsearch with a small heap to see the OOM. I used 512M.

(Attachments: heap-dump screenshot "Screenshot 2022-06-06 at 11 01 34"; repro script test.txt.)
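For reference, one way to start a node with that heap size (assuming a tarball install; package installs set this in jvm.options instead):

ES_JAVA_OPTS="-Xms512m -Xmx512m" ./bin/elasticsearch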

@nik9000
Member Author

nik9000 commented Jun 6, 2022

Just an update for posterity/those following along at home - this is mostly #77449: our response objects are super wasteful and sometimes allocate so quickly that the real-memory breaker doesn't catch them. A dense representation would save us here, and help lots of other things.

In the short run, I expect we could save some heap by reworking how EmptyBucketInfo works. But I think cutting aggs over to a dense representation is probably a good thing anyway.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)
