Failing ES Forward Compatibility: Default CI Group #1 / APM API tests basic apm_8.0.0 Transaction groups main statistics when data is loaded returns the correct data for latency aggregation 99th percentile #161042
Comments
Pinging @elastic/apm-ui (Team:APM)
Skipped top level suite. 7.17: 881bb7c
@mistic It's interesting that this occurs only shortly after the fork of tdigest. One of the known impacts of that fork is the new percentile algorithm. Could this be related?
chore(NA): upgrade babel to 7.21 on branch 7.17
@sqren Interesting finding. I think it may very well be related.
It's a pretty big change: 66,836,803 vs 36,904,905. Should we flag it as a possible bug with the ES team or just let it slide and update the snapshots?
@mistic
I see the test is skipped on the 7.17 branch. [UPDATE]
I believe @sqren's suspicion is correct here. I ran the tests locally with Kibana at 7.17 and ES set once at 8.10 and once at 8.9, and both times it consistently gave the same error. Steps performed:
The test fails. In current main, some time back I removed the archives and added synthtrace, which means it works correctly there. I'm not sure how to proceed here. The only option I see is to leave this test skipped forever on 7.17, as updating the archives is not an easy task.
So APM Server uses 2 possible theories -
After discussion with @dgieselaar it seems that theory 2 is valid here. Query to retrieve percentile
Now if we add
There is still a delta in numbers now -
@achyutjhunjhunwala Thanks for digging into this. Let's talk about this during today's UI sync.
This is likely due to #95903, indeed. Note that TDigest was already the default for percentiles; what changed was the internal implementation, with the new default being 2x-10x faster but less accurate. We included some details about how to fall back to the old behavior in the breaking changes documentation. Still, do we have a good idea of how much of an issue this is, in practice? How many values are processed? What are the other percentiles (50%, 75%, 90%, 99%, 100%)?
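To make the fallback mentioned above concrete, here is a minimal sketch of a percentiles aggregation body that asks for the more accurate behavior. It assumes the `tdigest.execution_hint: "high_accuracy"` option as described in the Elasticsearch percentiles aggregation documentation for 8.9+; the helper name `buildLatencyPercentiles` and the field `transaction.duration.us` are illustrative and not taken from the test in question.

```typescript
// Hypothetical helper: builds a percentiles aggregation body.
// ASSUMPTION: `tdigest.execution_hint` with the value "high_accuracy" is
// the per-request fallback documented for Elasticsearch 8.9+.
interface PercentilesAgg {
  percentiles: {
    field: string;
    percents: number[];
    tdigest?: { execution_hint: 'high_accuracy' };
  };
}

function buildLatencyPercentiles(
  field: string,
  percents: number[],
  highAccuracy: boolean
): PercentilesAgg {
  return {
    percentiles: {
      field,
      percents,
      // Only add the tdigest block when the accurate variant is requested,
      // so the request stays valid for older Elasticsearch versions.
      ...(highAccuracy ? { tdigest: { execution_hint: 'high_accuracy' as const } } : {}),
    },
  };
}

// Usage: request P95 and P99 with the high-accuracy hint.
const latencyAgg = buildLatencyPercentiles('transaction.duration.us', [95, 99], true);
console.log(JSON.stringify(latencyAgg));
```

Leaving the hint out by default keeps the same request body usable against a 7.17 cluster, which may not recognize the option.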
@kkrik-es I have prepared an 8.10 stack which has this archive imported. https://p.elstc.co/paste/EjSk4xkG#ZCJ+r2lrKmM19Vwk01yvvUGVbeRdhsw5WlmuNSzwF1Q Here you can play with the data as well. Regarding the numbers, on the APM side we were only calculating P99 and P95:
Looking at the above data, P95 looks quite close, but P99 is almost 2x.
This looks like a fairly skewed distribution:
It seems like the newer implementation has fewer data points (expected) so the error is due to interpolating between the last and second-to-last centroid values. Nothing too extreme.
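The interpolation effect described above can be illustrated with a deliberately simplified sketch. This is not the Elasticsearch or t-digest library implementation: it assumes each centroid sits at the midpoint of the ranks it covers and interpolates linearly between neighbouring centroid means. The `Centroid` type and `estimateQuantile` function are hypothetical names.

```typescript
// Simplified illustration only; real t-digest implementations differ.
interface Centroid {
  mean: number;  // average of the values merged into this centroid
  count: number; // number of values merged into this centroid
}

// Estimates quantile q (0..1) from centroids sorted by mean.
// Each centroid is treated as located at the midpoint of its rank range;
// the estimate interpolates linearly between the two centroids whose
// midpoints bracket the target rank.
function estimateQuantile(centroids: Centroid[], q: number): number {
  if (centroids.length === 0) throw new Error('no centroids');
  const total = centroids.reduce((sum, c) => sum + c.count, 0);
  const target = q * total;
  let seen = 0;
  let prevMid = 0;
  let prevMean = centroids[0].mean;
  for (const c of centroids) {
    const mid = seen + c.count / 2;
    if (target <= mid) {
      if (mid === prevMid) return c.mean;
      const t = (target - prevMid) / (mid - prevMid);
      return prevMean + t * (c.mean - prevMean);
    }
    seen += c.count;
    prevMid = mid;
    prevMean = c.mean;
  }
  // Target rank lies beyond the last midpoint: clamp to the last mean.
  return centroids[centroids.length - 1].mean;
}

// A skewed toy digest: most values are small, a few are very large.
const coarse: Centroid[] = [
  { mean: 10, count: 98 },
  { mean: 100, count: 2 },
];
console.log(estimateQuantile(coarse, 0.9));
```

With only two centroids in the tail, any high percentile lands somewhere on the straight line between their means, so a skewed distribution combined with fewer centroids can shift the estimate considerably.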
@gbamparop @sqren @dgieselaar I had an offline chat with @kkrik-es and this seems to be expected behaviour with the TDigest change. We need to decide what to do with this test now, as it needs to run on 7.17 as well as in forward compatibility runs against 8.9+. One suggestion from @kkrik-es was to check for ranges instead of exact values. In our case the delta is 2x. Alternatively, we can leave it skipped on 7.17 alone, as this test has been rewritten using synthtrace for 8.x.
As discussed with the team during the weekly call, we will keep these tests skipped on the 7.17 branch due to the complexity involved. Also, this failure is not caused by any bug related to forward compatibility.
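The range-based check suggested above could look roughly like the following sketch. The helper name `isWithinFactor` and the choice of a multiplicative tolerance are assumptions for illustration; the thread does not specify how the range would be defined.

```typescript
// Hypothetical helper: accepts `actual` if it is within a multiplicative
// factor of `expected`, i.e. expected / factor <= actual <= expected * factor.
function isWithinFactor(actual: number, expected: number, factor: number): boolean {
  if (factor < 1) throw new Error('factor must be >= 1');
  if (expected <= 0) throw new Error('expected must be positive');
  return actual >= expected / factor && actual <= expected * factor;
}

// The two latency values quoted earlier in the thread.
const valueA = 66_836_803;
const valueB = 36_904_905;

console.log(isWithinFactor(valueA, valueB, 2));
console.log(isWithinFactor(valueA, valueB, 1.5));
```

For the two values quoted in the thread, a factor of 2 accepts the pair while a factor of 1.5 rejects it, which shows how loose the assertion would have to be to pass on both Elasticsearch versions.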
APM API Integration tests (basic)
x-pack/test/apm_api_integration/tests/transactions/transactions_groups_main_statistics.ts
APM API tests basic apm_8.0.0 Transaction groups main statistics when data is loaded returns the correct data for latency aggregation 99th percentile
This failure is preventing the ES 8.10 forward compatibility pipeline from proceeding.