telegraf calling lsb_release resulting in indefinite hang #1215

Closed
djahandarie opened this issue May 17, 2016 · 1 comment · Fixed by #1241
Labels
bug unexpected problem or unintended behavior

Comments

@djahandarie

I noticed that one of my telegraf nodes (0.13.0 on Debian Wheezy amd64) stopped reporting for almost 20 minutes.

$ ps aux | grep teleg
telegraf 14537  4.2  0.9 221204 16528 ?        Sl   May15 167:29 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf 19039 97.8  0.2  22032  3832 ?        R    19:53  46:02 /usr/bin/python /usr/bin/lsb_release

As the output above shows, lsb_release had been running for quite some time, which seemed suspicious. I killed it, and the box began reporting again. However, while telegraf was stuck on lsb_release, none of the other measurements were being collected, so I actually lost 20 minutes' worth of data (as opposed to it getting buffered somewhere).

I'm not entirely sure why telegraf was calling lsb_release (I certainly don't call it explicitly anywhere), and I'm curious whether there is some way to put a timeout on it.

Here's my config:

[agent]
  interval = "2s"
  round_interval = true
  metric_buffer_limit = 10000
  flush_buffer_when_full = true
  collection_jitter = "0s"
  flush_interval = "2s"
  flush_jitter = "0s"
  debug = false
  quiet = true
  hostname = "[omitted]"

[[outputs.influxdb]]
  urls = ["http://[omitted]"]
  database = "telegraf"
  retention_policy = "default"
  precision = "s"
  timeout = "5s"

[[inputs.cpu]]
  percpu = false
  totalcpu = true
  fielddrop = ["time_*"]

[[inputs.mem]]
  fieldpass = ["used", "available", "free"]

[[inputs.system]]
  fieldpass = ["load1"]

[[inputs.disk]]
  fieldpass = ["used_percent"]

[[inputs.procstat]]
  exe = "[omitted]"
  fielddrop = ["cpu_time_*"]

[[inputs.procstat]]
  pattern = "[omitted]"
  fielddrop = ["cpu_time_*"]

[[inputs.exec]]
  commands = ["[omitted]"]
  data_format = "influx"

[[inputs.exec]]
  commands = ["[omitted]"]
  data_format = "influx"

[[inputs.exec]]
  commands = ["[omitted]"]
  data_format = "influx"

[[inputs.exec]]
  commands = ["[omitted]"]
  data_format = "influx"
@sparrc
Contributor

sparrc commented May 18, 2016

The system plugin gathers some host information via gopsutil, which calls lsb_release if it's available.

I opened an issue on the gopsutil repo for this: shirou/gopsutil#201.

I will leave this issue open until that is resolved and the telegraf dependency is updated.

@sparrc sparrc closed this as completed May 18, 2016
@sparrc sparrc reopened this May 18, 2016
@sparrc sparrc added the bug unexpected problem or unintended behavior label May 18, 2016
This was referenced May 19, 2016
sparrc added a commit that referenced this issue May 21, 2016
sparrc added a commit that referenced this issue May 21, 2016

2 participants