Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade t-digest and HdrHistogram library versions #92767

Conversation

salvatore-campagna
Copy link
Contributor

Upgrade both t-digest and HdrHistogram libraries to the latest version.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2023

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 9, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@not-napoleon
Copy link
Member

We haven't upgraded to t-digest 3.3 because it changed the way it calculates the second quartile. For an even number of data points, it returns the (N/2)th element instead of an average between the (N/2-1)th and (N/2)th elements. This is different from what t-digest 3.2 did, as you can see from the changes you've had to make to the boxplot tests. We've considered this breaking, as it will cause aggregations to return different results than before the change.

@not-napoleon
Copy link
Member

I left a note to that effect (although apparently lacking in detail) on the upgrade ticket: #73928

@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Jan 10, 2023

@not-napoleon yes I saw that but I am not convinced by the fact that this change is a breaking change. I mean, if the quantile (or percentile) is an estimation of the actual value....why should we consider that a breaking change?

Moreover, why V7 compatibility test check the actual values returned? Should't compatibility be something about endpoints, parameters, formats...instead of the actual values returned? Especially considering that percentiles and quantiles are estimated?

Also, most tests do calculations using a very low number of samples that probably makes results not meaningful. Am I wrong?

@rjernst rjernst removed the v8.7.0 label Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >upgrade v8.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants