You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug #3750 fixed the issue for distributed publishing of metrics. For StepRegistries, this implementation took an approach of scheduling a thread to poll the meters at the start of the minute and exporting at a different point in time. An unexpected outcome of this fix is that DeltaHistogram implementation of SignalFx registry is broken. Since the polling thread will call the count() on Timer at the start of the minute, the count and total are rolled over for the StepWindow. But the actual reading (takeSnapShot()) of the timer can happen at any point in time between the start of the step and 80% of the step (maybe even later if batching is involved). This breaks the Delta Histogram Implementation of the SignalFx registry.
Environment
All
Micrometer version 1.11.0
Micrometer registry - SignalFx
To Reproduce
How to reproduce the bug:
Set-Up:
Create a SIgnalFx Meter Registry with a Step of 60 seconds (any step would do for that matter. The 60s is chosen for explaining the behavior).
The poll service rollover count at the 0th second.
The scheduled publish happens on the 20th second when the metric is published the count will be a snapshot at the 0th second whereas the Histogram count will be a snapshot at the 20th second. So, for the first publication, the histogram count will be (count+20).
In real-case scenarios, the data recorded will not be uniform, so this Histogram will report a different snapshot that is misaligned with the count and total
Expected behavior
Irrespective of the publishing time, the count metric and the count of histogram buckets should match.
The text was updated successfully, but these errors were encountered:
Though I agree with the good intentions of #3750, to just call count() on Timers/Summaries, probably the pollMetersToRollover() can be protected so that registries can define it. (Similar to publish, where to publish is scheduled by the core but implemented by registries. Maybe in this case, pollMetersToRollover() can have a default implementation but with an option for registries to override them).
But that also won't solve the problem that the SignalFx registry has because of its implementation. #3749 has a completely StepAligned Histogram but it is still in its early phase. But what is needed for this might be a StepAligned Histogram Implementation.
lenin-jaganathan
changed the title
DeltaHistogram in SignalFx registry reports incorrect information
DeltaHistogram in SignalFx registry doesn't align with count and total
Apr 17, 2023
Describe the bug
#3750 fixed the issue for distributed publishing of metrics. For StepRegistries, this implementation took an approach of scheduling a thread to poll the meters at the start of the minute and exporting at a different point in time. An unexpected outcome of this fix is that DeltaHistogram implementation of SignalFx registry is broken. Since the polling thread will call the count() on Timer at the start of the minute, the count and total are rolled over for the StepWindow. But the actual reading (
takeSnapShot()
) of the timer can happen at any point in time between the start of the step and 80% of the step (maybe even later if batching is involved). This breaks the Delta Histogram Implementation of the SignalFx registry.Environment
AllTo Reproduce
How to reproduce the bug:
Set-Up:
Scenario:
Assume that the https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/push/PushMeterRegistry.java#L129-L138 returns 20th second. So, publishing will happen on the 20th second.
In real-case scenarios, the data recorded will not be uniform, so this Histogram will report a different snapshot that is misaligned with the count and total
Expected behavior
Irrespective of the publishing time, the count metric and the count of histogram buckets should match.
The text was updated successfully, but these errors were encountered: