telegraf-1.9.3/1.9.4 triggering counter resets #5431
Labels: area/prometheus, bug, regression
Relevant telegraf.conf:
/etc/telegraf.conf
/etc/telegraf/telegraf.d/default_inputs.conf
/etc/telegraf/telegraf.d/default_outputs.conf
System info:
CentOS7, telegraf 1.9.4.
Steps to reproduce:
Unclear
Expected behavior:
There are no counter reset events in the datastream
Actual behavior:
There are unexpected counter reset events in the datastream
Additional info:
I have been tracking an issue where our monitoring graphs suddenly show large spikes. It started after upgrading from telegraf-1.9.1 to telegraf-1.9.4.
I have been running tcpdump to capture the traffic between Prometheus and telegraf on one specific server that exhibits these symptoms, and I am seeing counter reset events over the wire when there has been no reset. For example, for the system_uptime value, I saw the following values extracted from a tcpdump of the HTTP requests:
As all of these values were for the same host, Prometheus receiving an uptime of 211245 after an uptime of 211255 has triggered a counter reset. An analysis of scrape durations in Prometheus found no instances where they exceeded our 10s scrape time.
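To illustrate why a single out-of-order sample shows up as a large spike, here is a minimal sketch of reset-aware counter arithmetic. It is a simplification of how Prometheus treats counters (a drop is taken as a restart from zero) and ignores the extrapolation that `rate()` and `increase()` apply. The function names `find_resets` and `naive_increase` are made up for this example; only the two uptime values come from the capture described above.

```python
from typing import List, Tuple


def find_resets(samples: List[float]) -> List[Tuple[int, float, float]]:
    """Return (index, previous, current) for every sample lower than its predecessor."""
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1)
        if cur < prev
    ]


def naive_increase(samples: List[float]) -> float:
    """Counter increase with reset correction, without Prometheus' extrapolation."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop is treated as a restart from zero, so the whole new value is added.
        total += cur if cur < prev else cur - prev
    return total


# The two system_uptime values mentioned in this report, in arrival order.
observed = [211255, 211245]
print(find_resets(observed))
print(naive_increase(observed))
```

Under these assumptions, a 10 second step backwards is counted as an increase of roughly the full uptime value, which would match the spikes on the graphs.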
I have been trying multiple versions to bisect where this issue was introduced. This server had been running telegraf-1.9.1 for several weeks and was stable; the issue has only occurred since upgrading to telegraf-1.9.4, and downgrading to telegraf-1.9.2 also seemed to resolve it. telegraf-1.9.3 definitely exhibits the same issue as telegraf-1.9.4, so I believe it was introduced in telegraf-1.9.3 and is still present in telegraf-1.9.4.
I'm seeing these counter resets across a wide variety of metrics, but not in any consistent manner, so unfortunately it's proving difficult to reproduce. Any help would be appreciated.
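Since the resets are intermittent, one way to catch them outside of Prometheus might be to compare consecutive scrape bodies and log any series whose value went down. The sketch below assumes the bodies are already available as text (for example, extracted from the tcpdump captures). The parser is deliberately simplified: it assumes no timestamps and no spaces inside label values, and it does not read `# TYPE` lines, so legitimate gauge decreases would be flagged too. The function names and the `host="example"` label are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_samples(body: str) -> Dict[str, float]:
    """Parse a Prometheus text-format body into {series: value} (simplified)."""
    samples: Dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.rsplit(None, 1)  # series name with labels, then the value
        if len(parts) != 2:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


def find_decreases(
    prev: Dict[str, float], curr: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Return (series, previous, current) for every series whose value dropped."""
    return [
        (series, prev[series], value)
        for series, value in sorted(curr.items())
        if series in prev and value < prev[series]
    ]


# Two hypothetical scrape bodies; only the uptime values come from this report.
first = '# TYPE system_uptime counter\nsystem_uptime{host="example"} 211255\n'
second = '# TYPE system_uptime counter\nsystem_uptime{host="example"} 211245\n'

for series, before, after in find_decreases(parse_samples(first), parse_samples(second)):
    print(f"{series}: {before} -> {after}")
```

Running something like this against every pair of consecutive scrapes from the affected server could show whether the stale values come from telegraf itself or appear somewhere between telegraf and Prometheus.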