Test flakes due to reusuable metric GC requirement #6211

jack-berg · 2024-02-08T16:00:38Z

@asafm we've had a couple of test flakes around the the reusuable metric GC test. WDYT think about relaxing the GC reduction requirement?

InstrumentGarbageCollectionBenchmarkTest > normalizedAllocationRateTest() FAILED
java.lang.AssertionError: [Aggregation temporality = DELTA, testInstrumentType = DOUBLE_LAST_VALUE]
Expecting actual:
94.63418122969631
to be close to:
97.30000305175781
by less than 2.0 but difference was 2.6658218220615.
(a difference of exactly 2.0 being considered valid)
at io.opentelemetry.sdk.metrics.internal.state.InstrumentGarbageCollectionBenchmarkTest.lambda$normalizedAllocationRateTest$2(InstrumentGarbageCollectionBenchmarkTest.java:123)
at java.base/java.util.HashMap.forEach(HashMap.java:1337)
at io.opentelemetry.sdk.metrics.internal.state.InstrumentGarbageCollectionBenchmarkTest.lambda$normalizedAllocationRateTest$3(InstrumentGarbageCollectionBenchmarkTest.java:101)
at java.base/java.util.HashMap.forEach(HashMap.java:1337)
at io.opentelemetry.sdk.metrics.internal.state.InstrumentGarbageCollectionBenchmarkTest.normalizedAllocationRateTest(InstrumentGarbageCollectionBenchmarkTest.java:93)

jack-berg · 2024-02-09T15:44:50Z

I've had more trouble with these test flakes today. The last value aggregation seems to be particularly prone to non-deterministic behavior. Will need to look into the failures closer to determine if there's anything on the metrics SDK side we can to do to make the GC allocation more reliable, or whether we should relax the test constraints.

I also think we don't need to run these tests on every entry of the test matrix. Similar to how we only run code coverage on the ubuntu / java LTS entry, we should be able to only run these expensive perf tests on a single entry as well.

asafm · 2024-02-11T11:16:54Z

I'm starting to work on this now

- Adds more leniency towards tests has less options to "mess" them and lower the allocation rate and are more prone to have variable allocation rate. - Constraints the InstrumentGarbageCollectionBenchmarkTest to run only in one specific environment out of the entire matrix

asafm · 2024-02-11T14:41:36Z

I couldn't recreate the issue locally after inspecting it and trying to run it with RepeatedTest.
I'm okay with increasing the offset for those 4 types of tests since the ability to refactor them by mistake is very constrained; hence, any change there will likely cause the allocation rate to go even lower than 93.7%.

I created #6221 to fix both issues mentioned: lower rate and make the test run only in one matrix option.

jack-berg added the Bug Something isn't working label Feb 8, 2024

asafm mentioned this issue Feb 11, 2024

Fix flaky InstrumentGarbageCollectionBenchmarkTest #6221

Merged

jack-berg closed this as completed in #6221 Feb 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test flakes due to reusuable metric GC requirement #6211

Test flakes due to reusuable metric GC requirement #6211

jack-berg commented Feb 8, 2024

jack-berg commented Feb 9, 2024

asafm commented Feb 11, 2024

asafm commented Feb 11, 2024

Test flakes due to reusuable metric GC requirement #6211

Test flakes due to reusuable metric GC requirement #6211

Comments

jack-berg commented Feb 8, 2024

jack-berg commented Feb 9, 2024

asafm commented Feb 11, 2024

asafm commented Feb 11, 2024