Fix xcontent rendering of ip terms aggs. #18003

jpountz · 2016-04-27T08:28:45Z

Currently terms on an ip address try to put their binary representation in the
json response. With this commit, they would return a formatted ip address:

      "buckets": [
        {
          "key": "192.168.1.7",
          "doc_count": 1
        }
      ]

Relates to #17971

markharwood · 2016-04-27T09:51:42Z

I see significant terms on IPs now format strings correctly with this change but this may be an irrelevant improvement - I expected an error (IPs are numerics, numerics don't have doc frequency any more). What I see in results are IPs selected on what I assume is a false significance - bg_count is reported as zero in the JSON response.

Until we adopt a solid strategy for computing background frequencies for types that don't have frequencies directly held by Lucene I thought our policy was to throw a parse error and suggest users index-as-string?

jpountz · 2016-04-27T10:04:46Z

Oh you are right, I was focused on the json rendering issue and completely missed that. I agree the significant terms aggregation should raise an exception if it cannot get the backgrond frequency rather than assuming 0. For the record, I also tested on an unindexed keyword field and it does not fail either while it should:

PUT t 
{
  "mappings": {
    "t": {
      "properties": {
        "a": {
          "type": "keyword",
          "index": false
        }
      }
    }
  }
}

PUT t/t/1
{
  "a": "abc"
}

PUT t/t/2
{
}

GET t/_search
{
  "query": {
    "exists" : {
      "field": "a"
    }
  }, 
  "aggs": {
    "a_terms": {
      "significant_terms": {
        "field": "a",
        "min_doc_count": 1
      }
    }
  }
}

jpountz · 2016-04-28T08:32:21Z

@markharwood I opened #18031 to address this issue.

jpountz · 2016-05-11T15:38:39Z

@markharwood May I merge this one now that points work with significant terms?

markharwood · 2016-05-12T11:10:55Z

Looks great but I wonder if the Kibana folks will be upset by the removal of "key_as_string" in the json?

jpountz · 2016-05-12T12:47:48Z

@epixa Do you know if not having a key_as_string in the response of the terms agg on an ip field would be an issue for Kibana? (the string representation of the ip address is directly under key)

epixa · 2016-05-12T16:06:22Z

Thanks for the heads up, I'm pretty sure it won't cause any problems with Kibana, but let me poke around a bit to make sure. At the moment, the only reference to key_as_string in Kibana is in test fixtures, which may only exist because it was copied from actual ES response data.

I'll update this PR shortly.

epixa · 2016-05-12T16:13:58Z

It looks like removing key_as_string will break monitoring in x-pack. It's also explicitly referenced from the watcher docs (slack action), so removing it will likely break ~~any~~ at least some existing watcher setups with slack.

How urgent is this? We at least have complete control over monitoring which means we could theoretically get a change introduced there as well, but breaking existing watcher setups might be a different animal entirely.

jpountz · 2016-05-12T16:28:40Z

It is not urgent at all, we are just trying to make terms aggregations work again on ip fields (which was not the case anymore since we added ipv6 support).

So I guess we can either add a key_as_string key in the response all the time, but this feels a bit weird since it will always be the same as the key (this is why a terms agg on a string field does not have a key_as_string) or we can only return a key like the PR currently does.

In that particular case, I think the latter option is fine since we need to break the output of terms aggregations on ip fields anyway (since we cannot return a number that identifies ip addresses anymore)? @clintongormley any opinions?

clintongormley · 2016-05-12T16:37:38Z

@jpountz i'm +1 on just returning key.

epixa · 2016-05-12T16:43:02Z

It doesn't seem weird to me at all that we'd return key_as_string even in situations when it is the same as key. If anything, the fact that there are some situations when it differs and some when it does not only underscores the importance of doing that to me. IMO, the burden shouldn't be on the consumer to make choices about which property to use based on the data type. I mean, it's great that we know that those values are identical in some situations, but why should a consumer have to codify that detail as well?

jpountz · 2016-05-12T17:59:31Z

@epixa I would be fine to do it all the time (but in another issue since this is a broader problem as it also affects eg. terms aggs on string fields). However I suspect there will be push back since making aggregation responses more verbose will put more load on the network (although compression would probably work very well in that case) and make parsing slower since there are more bytes to process. I am not sure how much of an issue this is in practice, but this is a recurring concern.

Currently terms on an ip address try to put their binary representation in the json response. With this commit, they would return a formatted ip address: ``` "buckets": [ { "key": "192.168.1.7", "doc_count": 1 } ] ```

epixa · 2016-05-13T14:44:27Z

This really shouldn't be merged yet... if we want to proceed with the change, fine, but I'm relatively certain monitoring in xpack is no longer going to work now. Ideally we'd at least get our own products patched up to support a change like this before we merged it.

jpountz · 2016-05-13T15:06:11Z

@epixa Terms aggregations on ip fields were broken since we added support for ipv6, so this change is making things better, not worse. Unfortunately we have to break backward compatibility of the responses anyway since we cannot return a numeric representation for ip terms anymore, and I made the response consistent with terms aggregations on a string field. I proceeded based on @clintongormley's comment and will now check how this can be addressed in monitoring.

epixa · 2016-05-13T15:52:17Z

Looks like we're OK after all, carry on!

jpountz added >bug :Analytics/Aggregations Aggregations v5.0.0-alpha3 labels Apr 27, 2016

This was referenced Apr 27, 2016

Not all features work on ip fields #17971

Closed

Terms aggregations parse failure for date and ip fields #17705

Closed

jpountz mentioned this pull request May 4, 2016

Make significant terms work on fields that are indexed with points. #18031

Merged

Fix xcontent rendering of ip terms aggs. elastic#18003

61b1f4a

Currently terms on an ip address try to put their binary representation in the json response. With this commit, they would return a formatted ip address: ``` "buckets": [ { "key": "192.168.1.7", "doc_count": 1 } ] ```

jpountz force-pushed the fix/ip_terms_xcontent branch from fe57bcb to 61b1f4a Compare May 13, 2016 12:59

jpountz merged commit 61b1f4a into elastic:master May 13, 2016

jpountz deleted the fix/ip_terms_xcontent branch May 13, 2016 14:46

benwtrent mentioned this pull request Jan 3, 2020

composite and terms aggregations behave differently on ip fields #50600

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix xcontent rendering of ip terms aggs. #18003

Fix xcontent rendering of ip terms aggs. #18003

jpountz commented Apr 27, 2016 •

edited by clintongormley

Loading

markharwood commented Apr 27, 2016

jpountz commented Apr 27, 2016

jpountz commented Apr 28, 2016

jpountz commented May 11, 2016

markharwood commented May 12, 2016

jpountz commented May 12, 2016

epixa commented May 12, 2016

epixa commented May 12, 2016 •

edited

Loading

jpountz commented May 12, 2016

clintongormley commented May 12, 2016

epixa commented May 12, 2016

jpountz commented May 12, 2016

epixa commented May 13, 2016

jpountz commented May 13, 2016

epixa commented May 13, 2016

Fix xcontent rendering of ip terms aggs. #18003

Fix xcontent rendering of ip terms aggs. #18003

Conversation

jpountz commented Apr 27, 2016 • edited by clintongormley Loading

markharwood commented Apr 27, 2016

jpountz commented Apr 27, 2016

jpountz commented Apr 28, 2016

jpountz commented May 11, 2016

markharwood commented May 12, 2016

jpountz commented May 12, 2016

epixa commented May 12, 2016

epixa commented May 12, 2016 • edited Loading

jpountz commented May 12, 2016

clintongormley commented May 12, 2016

epixa commented May 12, 2016

jpountz commented May 12, 2016

epixa commented May 13, 2016

jpountz commented May 13, 2016

epixa commented May 13, 2016

jpountz commented Apr 27, 2016 •

edited by clintongormley

Loading

epixa commented May 12, 2016 •

edited

Loading