[1.2.1] slice bounds out of range #2488

Closed
rossmcdonald opened this issue Mar 3, 2017 · 7 comments
Labels
bug unexpected problem or unintended behavior

@rossmcdonald
Contributor

rossmcdonald commented Mar 3, 2017

Bug report

Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff) 

Relevant telegraf.conf:

[agent]
collection_jitter = "0s"
debug = true
flush_buffer_when_full = true
flush_interval = "30s"
flush_jitter = "30s"
hostname = "hostname"
interval = "10s"
metric_buffer_limit = 10000
round_interval = true
quiet = false

[inputs]

[inputs.netstat]

[inputs.processes]

[inputs.tcp_listener]
allowed_pending_messages = 10000
max_tcp_connections = 250
data_format = "influx"
service_address = ":8090"

[outputs]

[outputs.influxdb]
database = "telegraf"
precision = "s"
urls = ["https://mydbhost:8086"]

[cpu]
drop = ["cpu_time"]
percpu = true
totalcpu = true

[disk]

[io]

[mem]

[swap]

[system]

Steps to reproduce:

Seeing this panic fairly regularly:

panic: runtime error: slice bounds out of range

goroutine 254475 [running]:
panic(0xf2b720, 0xc4200100b0)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/telegraf/metric.(*metric).Fields(0xc4205b2480, 0xc42034a3c0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/metric/metric.go:271 +0x422
github.com/influxdata/telegraf/metric.(*metric).Point(0xc4205b2480, 0xc42092e9b0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/metric/metric.go:141 +0x80
github.com/influxdata/telegraf/plugins/outputs/influxdb.(*InfluxDB).Write(0xc4201b8100, 0xc420257b00, 0x42, 0x42, 0xc4205bb658, 0x6b3f41)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/outputs/influxdb/influxdb.go:194 +0x154
github.com/influxdata/telegraf/internal/models.(*RunningOutput).write(0xc42010a000, 0xc420257b00, 0x42, 0x42, 0x42, 0x60f612)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/models/running_output.go:173 +0xa1
github.com/influxdata/telegraf/internal/models.(*RunningOutput).Write(0xc42010a000, 0x1157720, 0xc4204649f0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/internal/models/running_output.go:157 +0x49c
github.com/influxdata/telegraf/agent.(*Agent).flush.func1(0xc4204649f0, 0xc42010a000)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:232 +0x68
created by github.com/influxdata/telegraf/agent.(*Agent).flush
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:237 +0xb4

Let me know what other information is needed!

@sparrc
Contributor

sparrc commented Mar 6, 2017

It seems that this must be coming from the tcp_listener input plugin. For some reason, some metrics parse successfully via tcp_listener but are actually invalid and can't be translated to an InfluxDB point.

I think it's likely that this is fixed in 1.3, because the function that is panicking doesn't exist anymore, so it shouldn't panic but should instead raise an error when the metric gets written to InfluxDB (and will subsequently just be dropped).
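
To illustrate the behaviour described above (an error instead of a panic, with the bad metric simply dropped), here is a minimal Go sketch. It is not Telegraf's code: `parseField`, `writeAll`, `errMalformed` and the `key=value` input format are hypothetical names and assumptions made up for this example.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errMalformed is a hypothetical sentinel error for this sketch.
var errMalformed = errors.New("malformed field: expected key=value")

// parseField splits "key=value". It checks the separator position
// before slicing, so bad input yields an error rather than a panic.
func parseField(b []byte) (key, value string, err error) {
	i := bytes.IndexByte(b, '=')
	if i <= 0 || i == len(b)-1 {
		return "", "", errMalformed
	}
	return string(b[:i]), string(b[i+1:]), nil
}

// writeAll keeps the fields that parse and counts the ones it drops.
func writeAll(lines []string) (written map[string]string, dropped int) {
	written = make(map[string]string)
	for _, l := range lines {
		k, v, err := parseField([]byte(l))
		if err != nil {
			dropped++ // drop the bad entry and carry on with the rest
			continue
		}
		written[k] = v
	}
	return written, dropped
}

func main() {
	w, d := writeAll([]string{"usage=42", "novalue", "=1", "idle=7"})
	fmt.Println(len(w), d) // prints "2 2"
}
```

The point of the sketch is only the shape of the handling: one invalid entry costs one dropped value, not the whole process.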

Would be good to figure out what the problem metrics are, so we can then write a unit-test that will catch it.
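
One way to narrow down the problem metrics might be to replay candidate input lines through the parser under `recover` and record which ones panic. The sketch below assumes a stand-in parser, `fragileFields`, with an unchecked slice expression; it and the helpers `panics` and `findPanics` are hypothetical and not part of Telegraf.

```go
package main

import "fmt"

// fragileFields stands in for a parser with an unchecked slice
// expression. Hypothetical example, not Telegraf code.
func fragileFields(b []byte) string {
	i := 0
	for i < len(b) && b[i] != '=' {
		i++
	}
	return string(b[i+1:]) // panics when b contains no '='
}

// panics reports whether parse panics on the given input.
func panics(parse func([]byte) string, in string) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	parse([]byte(in))
	return false
}

// findPanics returns the inputs that make parse panic.
func findPanics(parse func([]byte) string, inputs []string) []string {
	var bad []string
	for _, in := range inputs {
		if panics(parse, in) {
			bad = append(bad, in)
		}
	}
	return bad
}

func main() {
	fmt.Println(findPanics(fragileFields, []string{"a=1", "broken", "b="}))
	// prints "[broken]"
}
```

Any input this flags could then be turned into a fixed test case so that the panic cannot come back unnoticed.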

danielnelson added the bug label Mar 15, 2017
danielnelson added this to the 1.3.0 milestone Mar 15, 2017
@miniskipper

My telegraf also crashes repeatedly using the following config:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://HOST:8086"] # required
  database = "DB" # required
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
  username = "USER"
  password = "PASS"
 [[inputs.logparser]]
   files = ["/var/log/httpd/*access_log"]
   from_beginning = false
   [inputs.logparser.grok]
     patterns = ["%{CUSTOM_LOG_FORMAT}"]
     measurement = "apache_access_log"
     custom_patterns = '''
     CUSTOM_LOG_FORMAT %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes:int}|-) %{NUMBER:responsetime:int} us %{QS:Referrer} %{QS:Agent}
     '''

log output:

panic: runtime error: slice bounds out of range

goroutine 32 [running]:
panic(0xf2b720, 0xc4200100d0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/telegraf/metric.(*metric).Fields(0xc420094800, 0xc4206279c0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/metric/metric.go:279 +0x5dd
github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).parser(0xc42014a090)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:205 +0x250
created by github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).Start
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:131 +0x62d

@danielnelson
Contributor

@miniskipper I think your issue is unrelated. Can you open it as a new issue and include some sample logs, ideally one that reproduces the crash?

@gaving

gaving commented Apr 3, 2017

Similar issue here when loading an Apache access_log with logparser:

Telegraf v1.2.1 (git: release-1.2 3b6ffb344e5c03c1595d862282a6823ecb438cff)

[[inputs.logparser]]
  ## file(s) to tail:
  files = ["/tmp/input.log"]
  from_beginning = true
  name_override = "test_metric"

  ## For parsing logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{COMMON_LOG_FORMAT}"]

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout", "/tmp/output.log"]
  data_format = "influx"


[[outputs.influxdb]]

  ## The full HTTP or UDP endpoint URL for your InfluxDB instance.
  urls = ["http://influxdb:8086"] # required
  ## The target database for metrics (telegraf will create it if not exists).
  database = "telegraf" # required
  ## Write timeout (for the InfluxDB client), formatted as a string.
  timeout = "5s"

panic: runtime error: slice bounds out of range

goroutine 14 [running]:
panic(0xf2b720, 0xc42000c0b0)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/telegraf/metric.(*metric).Fields(0xc4207e8680, 0xc42091e4e0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/metric/metric.go:279 +0x5dd
github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).parser(0xc42007a120)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:205 +0x250
created by github.com/influxdata/telegraf/plugins/inputs/logparser.(*LogParserPlugin).Start
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/logparser/logparser.go:131 +0x62d

@danielnelson
Contributor

Closing, fixed in 1.3

@itdevon

itdevon commented Jun 27, 2017

Is there a workaround until 1.3 comes out? The influx service keeps crashing with the same runtime error.

@danielnelson
Contributor

1.3 is out :)
