Fix xcontent rendering of ip terms aggs. #18003
I see significant terms on IPs now format strings correctly with this change but this may be an irrelevant improvement - I expected an error (IPs are numerics, numerics don't have doc frequency any more). What I see in results are IPs selected on what I assume is a false significance - bg_count is reported as zero in the JSON response. Until we adopt a solid strategy for computing background frequencies for types that don't have frequencies directly held by Lucene I thought our policy was to throw a parse error and suggest users index-as-string?
Oh you are right, I was focused on the json rendering issue and completely missed that. I agree the significant terms aggregation should raise an exception if it cannot get the background frequency rather than assuming 0. For the record, I also tested on an unindexed keyword field and it does not fail either while it should:
@markharwood I opened #18031 to address this issue.
@markharwood May I merge this one now that points work with significant terms?
Looks great but I wonder if the Kibana folks will be upset by the removal of "key_as_string" in the json?
@epixa Do you know if not having a
Thanks for the heads up, I'm pretty sure it won't cause any problems with Kibana, but let me poke around a bit to make sure. At the moment, the only reference to I'll update this PR shortly.
It looks like removing How urgent is this? We at least have complete control over monitoring which means we could theoretically get a change introduced there as well, but breaking existing watcher setups might be a different animal entirely.
It is not urgent at all, we are just trying to make terms aggregations work again on ip fields (which has not been the case since we added ipv6 support). So I guess we can either add a In that particular case, I think the latter option is fine since we need to break the output of terms aggregations on ip fields anyway (since we cannot return a number that identifies ip addresses anymore)? @clintongormley any opinions?
@jpountz i'm +1 on just returning
It doesn't seem weird to me at all that we'd return
@epixa I would be fine to do it all the time (but in another issue since this is a broader problem as it also affects e.g. terms aggs on string fields). However I suspect there will be push back since making aggregation responses more verbose will put more load on the network (although compression would probably work very well in that case) and make parsing slower since there are more bytes to process. I am not sure how much of an issue this is in practice, but this is a recurring concern.
Currently terms on an ip address try to put their binary representation in the json response. With this commit, they would return a formatted ip address:

```
"buckets": [
  {
    "key": "192.168.1.7",
    "doc_count": 1
  }
]
```
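To illustrate what "a formatted ip address" means for a bucket key, here is a minimal Python sketch. It is not the code in this PR: it assumes each term is held as a 16-byte IPv6-style value with IPv4 addresses in IPv4-mapped form, and the helper names `format_ip_key` and `render_buckets` are hypothetical.

```python
import ipaddress


def format_ip_key(raw: bytes) -> str:
    """Render a 16-byte address as a human-readable bucket key.

    Assumption: terms are 16-byte values and IPv4 addresses use the
    IPv4-mapped form (::ffff:a.b.c.d). This is an illustration only.
    """
    if len(raw) != 16:
        raise ValueError("expected a 16-byte address")
    addr = ipaddress.IPv6Address(raw)
    mapped = addr.ipv4_mapped  # None unless the value is IPv4-mapped
    return str(mapped if mapped is not None else addr)


def render_buckets(counts):
    """Build a terms-aggregation-like structure from (raw_bytes, doc_count) pairs."""
    return {
        "buckets": [
            {"key": format_ip_key(raw), "doc_count": n} for raw, n in counts
        ]
    }


# Hypothetical sample terms: one IPv4-mapped value and one native IPv6 value.
ipv4_term = bytes(10) + b"\xff\xff" + bytes([192, 168, 1, 7])
ipv6_term = ipaddress.IPv6Address("2001:db8::1").packed
response = render_buckets([(ipv4_term, 1), (ipv6_term, 3)])
```

With string keys like these, the bucket has the same shape as a terms aggregation on a string field, which matches the consistency argument made later in the thread.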
This really shouldn't be merged yet... if we want to proceed with the change, fine, but I'm relatively certain monitoring in xpack is no longer going to work now. Ideally we'd at least get our own products patched up to support a change like this before we merged it.
@epixa Terms aggregations on ip fields were broken since we added support for ipv6, so this change is making things better, not worse. Unfortunately we have to break backward compatibility of the responses anyway since we cannot return a numeric representation for ip terms anymore, and I made the response consistent with terms aggregations on a string field. I proceeded based on @clintongormley's comment and will now check how this can be addressed in monitoring.
Looks like we're OK after all, carry on!
Relates to #17971