Inconsistent return value across metric aggregations between sum, avg, max, min with min_doc_count = 0 #31887

Closed
guanghaofan opened this issue Jul 9, 2018 · 5 comments

@guanghaofan

guanghaofan commented Jul 9, 2018

es.version: 6.3.0
description:
With min_doc_count = 0, empty buckets behave differently depending on the metric type:
The sum metric returns the value '0'.
The others [min/max/avg/mean] return the value 'null'.
Because sum returns '0', empty-bucket data gets mixed in with valid '0' values in Kibana visualizations. What is the purpose of this design?
I know that sum is a double and that its initial value is '0'. However, it would be best if the return value were null for an empty bucket, as is already the case for the min/max/avg aggregation types, since '0' is a real, meaningful value once negative numbers are involved!

thanks!

@danielmitterdorfer
Member

Here is a minimal reproduction for the scenario described above:

curl -X PUT "localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d'
{
    "product" : "Book",
    "price" : 1.0
}
'

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "price": {
        "gt": 2
      }
    }
  },
  "size": 0,
  "aggs": {
    "prices": {
      "terms": {
        "field": "product.keyword",
        "min_doc_count": 0
      },
      "aggs": {
        "min_price": {
          "min": {
            "field": "price"
          }
        },
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
'

Elasticsearch 6.3.0 produces (output shortened):

{
  "aggregations" : {
    "prices" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Book",
          "doc_count" : 0,
          "min_price" : {
            "value" : null
          },
          "sum_price" : {
            "value" : 0.0
          }
        }
      ]
    }
  }
}

As I am not sure whether this is intentional, can somebody in @elastic/es-search-aggs have a look and label it accordingly please?

@jpountz
Contributor

jpountz commented Jul 9, 2018

This is intentional: the sum of an empty set is well defined, while the average is not (it is 0/0); likewise, min and max are undefined. I'd be curious to better understand what the implications are on the Kibana side.
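
The distinction can be illustrated with a small Python sketch. This is only an analogy for the empty-set argument, not Elasticsearch's implementation: the helper names (`agg_sum`, `agg_avg`, `agg_min`, `agg_max`) are hypothetical, and `None` stands in for JSON `null`.

```python
from typing import Optional, Sequence


def agg_sum(values: Sequence[float]) -> float:
    # The empty sum is the additive identity, so it is well defined as 0.
    return float(sum(values))


def agg_avg(values: Sequence[float]) -> Optional[float]:
    # An average over no values would be 0 / 0, which is undefined.
    if not values:
        return None
    return sum(values) / len(values)


def agg_min(values: Sequence[float]) -> Optional[float]:
    # An empty set has no smallest element.
    return min(values) if values else None


def agg_max(values: Sequence[float]) -> Optional[float]:
    # An empty set has no largest element.
    return max(values) if values else None
```

Note that `agg_sum([])` and `agg_sum([-1.0, 1.0])` both yield `0.0`, which is exactly the ambiguity raised in the original report.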

@polyfractal
Contributor

Just a note for posterity: value_count and cardinality also return 0 for empty sets (zero values / zero distinct values in the set).
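
As a rough Python analogy (the function names are hypothetical, and the distinct count here is exact, whereas Elasticsearch's cardinality aggregation is approximate), both counts fall out naturally as zero when nothing was collected:

```python
from typing import Hashable, Sequence


def value_count(values: Sequence[Hashable]) -> int:
    # Number of values collected; zero when nothing was collected.
    return len(values)


def cardinality(values: Sequence[Hashable]) -> int:
    # Number of distinct values, computed exactly in this sketch.
    return len(set(values))
```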

@guanghaofan
Author

@polyfractal
value_count and doc_count are really different from the sum aggregation: a sum can be 0, positive, or even negative, whereas a count is always at least 0 and can never be null.
In my experience with Kibana, the visualizations make no distinction between the two kinds of '0': empty-bucket data is mixed in with valid '0' values. Of course, the Kibana developers could differentiate them, since the 'doc_count' field is always 0 in an empty bucket's response, but they have not done so yet. This is why I think you may need to discuss it with the Kibana developers!
@jpountz
The sum agg is really easy to handle, but I think an empty bucket is a special case.

@polyfractal
Contributor

> value_count and doc_count are really different from the sum aggregation: a sum can be 0, positive, or even negative, whereas a count is always at least 0 and can never be null.

The main thing here is not the possible values that an agg can return, but rather how the agg handles the empty set / no document scenario. value_count, cardinality, doc_count and sum are all similar in this regard: if the set is empty (no documents), their values are zero.

avg is different because the denominator is the number of values collected, so an empty set is 0 / 0 which is undefined. As @jpountz mentioned, min and max are similar since you can't take the min/max of an empty set, so they are also undefined with no documents.

That's why Elasticsearch provides the values it does when the bucket is empty (doc_count == 0). There's not much else we can do... we have to rely on the consumer (Kibana, user applications, etc.) to interpret the data correctly.
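
One way a consumer might do that is sketched below in Python. The helper names (`sum_or_none`, `normalize_buckets`) are hypothetical, and the response dictionary simply mirrors the shortened output from the reproduction earlier in this thread. The sketch treats `doc_count == 0` as "no data"; it does not attempt to handle buckets whose documents exist but lack the field.

```python
from typing import Any, Dict, List, Optional


def sum_or_none(bucket: Dict[str, Any], metric: str) -> Optional[float]:
    """Return the metric value, or None when the bucket matched no documents."""
    if bucket.get("doc_count", 0) == 0:
        return None
    return bucket[metric]["value"]


def normalize_buckets(
    response: Dict[str, Any], agg: str, metric: str
) -> List[Dict[str, Any]]:
    """Flatten buckets into rows, reporting empty-bucket sums as None."""
    rows = []
    for bucket in response["aggregations"][agg]["buckets"]:
        rows.append(
            {
                "key": bucket["key"],
                "doc_count": bucket["doc_count"],
                metric: sum_or_none(bucket, metric),
            }
        )
    return rows


# Response shape copied from the shortened 6.3.0 output in the reproduction.
response = {
    "aggregations": {
        "prices": {
            "buckets": [
                {
                    "key": "Book",
                    "doc_count": 0,
                    "min_price": {"value": None},
                    "sum_price": {"value": 0.0},
                }
            ]
        }
    }
}

rows = normalize_buckets(response, "prices", "sum_price")
print(rows)
```

A bucket with `doc_count > 0` whose sum happens to be `0.0` passes through unchanged, so a genuine zero stays distinguishable from an empty bucket.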

> In my experience with Kibana, the visualizations make no distinction between the two kinds of '0': empty-bucket data is mixed in with valid '0' values. Of course, the Kibana developers could differentiate them, since the 'doc_count' field is always 0 in an empty bucket's response, but they have not done so yet.

Yep, this is a known issue in Kibana (elastic/kibana#13356). It looks like the Kibana team just opened an issue independently to address this yesterday (elastic/kibana#17717), so it appears there's a plan in motion.
