JTI Plugin - TLS Connectivity issue #4899

Closed
mohsin106 opened this issue Oct 22, 2018 · 16 comments

@mohsin106

mohsin106 commented Oct 22, 2018

Hi, I'm seeing the following error when trying to establish a TLS session with my Juniper router:

Could not initiate login check for 10.10.10.10:32767

Telegraf.conf:

[[inputs.jti_openconfig_telemetry]]
      servers = ["10.10.10.10:32767"]
      sample_frequency = "10000ms"
      username = "admin"
      password = "secure"
      client_id = "test"

      sensors = [
       "interfaces /junos/system/linecard/interface/",
      ]
      str_as_tags = false

      tls_cert = "/etc/telegraf/test-cert.pem"

If I comment out the username/password, I see the normal startup logs, but then it just sits there:

2018/10/22 15:19:28 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-22T15:19:28Z I! Starting Telegraf 1.8.1
2018-10-22T15:19:28Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-22T15:19:28Z I! Loaded aggregators: 
2018-10-22T15:19:28Z I! Loaded processors: 
2018-10-22T15:19:28Z I! Loaded outputs: influxdb
2018-10-22T15:19:28Z I! Tags enabled: host=44ff22066c46
2018-10-22T15:19:28Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"44ff22066c46", Flush Interval:5s 

I also tried putting "https://10.10.10.10:32767" in the servers field of the input section, but that did not help either.

Thank you.

@glinton
Contributor

glinton commented Oct 22, 2018

Can you try again with the --debug flag set, and paste any new output?

@mohsin106
Author

mohsin106 commented Oct 22, 2018

For troubleshooting purposes, we disabled TLS and confirmed we can connect to the router with Telegraf. We then put the port back to 50051 and re-enabled TLS; below is the debug output from Telegraf. (We commented out the username and password in the telegraf.conf file.)

bash-4.3# telegraf --debug set
2018/10/22 19:26:24 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-22T19:26:24Z D! Attempting connection to output: influxdb
2018-10-22T19:26:24Z D! Successfully connected to output: influxdb
2018-10-22T19:26:24Z I! Starting Telegraf 1.8.1
2018-10-22T19:26:24Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-22T19:26:24Z I! Loaded aggregators: 
2018-10-22T19:26:24Z I! Loaded processors: 
2018-10-22T19:26:24Z I! Loaded outputs: influxdb
2018-10-22T19:26:24Z I! Tags enabled: host=c3cb65066c46
2018-10-22T19:26:24Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"c3cb65066c46", Flush Interval:5s 
2018-10-22T19:26:30Z D! Opened a new gRPC session to 10.10.10.10 on port 50051
2018-10-22T19:26:30Z E! Error in plugin [inputs.jti_openconfig_telemetry]: E! Failed to read from rpc error: code = Unavailable desc = transport is closing: 10.10.10.10
2018-10-22T19:26:30Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:31Z E! Error in plugin [inputs.jti_openconfig_telemetry]: E! Failed to read from rpc error: code = Unavailable desc = transport is closing: 10.10.10.10
2018-10-22T19:26:31Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:32Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:33Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:34Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:35Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:26:35Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:36Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:37Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:38Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:39Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:40Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:26:40Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:41Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:42Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:43Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:44Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:45Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:26:45Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:46Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:47Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:48Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:49Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-22T19:26:50Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:26:50Z D! Retrying 10.10.10.10 with timeout 1s
^C2018-10-22T19:26:50Z I! Hang on, flushing any cached metrics before shutdown
2018-10-22T19:26:50Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:26:51Z E! Error in plugin [inputs.jti_openconfig_telemetry]: E! Could not subscribe to 10.10.10.10: rpc error: code = Canceled desc = grpc: the client connection is closing

@mohsin106
Author

mohsin106 commented Oct 22, 2018

When we uncomment the username and password in the config file, this is the output we get:

bash-4.3# telegraf --debug set
2018/10/22 19:32:53 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-22T19:32:53Z D! Attempting connection to output: influxdb
2018-10-22T19:32:53Z D! Successfully connected to output: influxdb
2018-10-22T19:32:53Z I! Starting Telegraf 1.8.1
2018-10-22T19:32:53Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-22T19:32:53Z I! Loaded aggregators: 
2018-10-22T19:32:53Z I! Loaded processors: 
2018-10-22T19:32:53Z I! Loaded outputs: influxdb
2018-10-22T19:32:53Z I! Tags enabled: host=c3cb65066c46
2018-10-22T19:32:53Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"c3cb65066c46", Flush Interval:5s 
2018-10-22T19:33:00Z D! Opened a new gRPC session to 10.10.10.10 on port 50051
2018-10-22T19:33:00Z E! Could not initiate login check for 10.10.10.10:50051: <nil>
2018-10-22T19:33:05Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:33:10Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:33:15Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:33:20Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:33:25Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-22T19:33:30Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 

@glinton
Contributor

glinton commented Oct 22, 2018

@mohsin106 can you try that last test with this build?

If you need something other than a Linux amd64 build, let me know what OS/architecture you're using.

@mohsin106
Author

mohsin106 commented Oct 23, 2018

I'm using CentOS 7. I tried v1.9 and I'm getting the same results:

With username/password commented out in the telegraf.conf file:

2018/10/23 09:14:33 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-23T13:14:33Z D! Attempting connection to output: influxdb
2018-10-23T13:14:33Z D! Successfully connected to output: influxdb
2018-10-23T13:14:33Z I! Starting Telegraf 
2018-10-23T13:14:33Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-23T13:14:33Z I! Loaded aggregators: 
2018-10-23T13:14:33Z I! Loaded processors: 
2018-10-23T13:14:33Z I! Loaded outputs: influxdb
2018-10-23T13:14:33Z I! Tags enabled: host=hostname
2018-10-23T13:14:33Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"hostname", Flush Interval:5s 
2018-10-23T13:14:40Z D! Opened a new gRPC session to 10.10.10.10 on port 50051
2018-10-23T13:14:40Z E! Error in plugin [inputs.jti_openconfig_telemetry]: E! Failed to read from rpc error: code = Unavailable desc = transport is closing: 10.10.10.10
2018-10-23T13:14:40Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:41Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:42Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:43Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:44Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:45Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-23T13:14:45Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:46Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:47Z D! Retrying 10.10.10.10 with timeout 1s
2018-10-23T13:14:48Z D! Retrying 10.10.10.10 with timeout 1s
^C2018-10-23T13:14:48Z I! Hang on, flushing any cached metrics before shutdown
2018-10-23T13:14:48Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-23T13:14:49Z E! Error in plugin [inputs.jti_openconfig_telemetry]: E! Could not subscribe to 10.10.10.10: rpc error: code = Canceled desc = grpc: the client connection is closing

With username and password uncommented in the telegraf.conf file:

2018/10/23 09:12:51 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-23T13:12:51Z D! Attempting connection to output: influxdb
2018-10-23T13:12:51Z D! Successfully connected to output: influxdb
2018-10-23T13:12:51Z I! Starting Telegraf 
2018-10-23T13:12:51Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-23T13:12:51Z I! Loaded aggregators: 
2018-10-23T13:12:51Z I! Loaded processors: 
2018-10-23T13:12:51Z I! Loaded outputs: influxdb
2018-10-23T13:12:51Z I! Tags enabled: host=hostname
2018-10-23T13:12:51Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"hostname", Flush Interval:5s 
2018-10-23T13:13:00Z D! Opened a new gRPC session to 10.10.10.10 on port 50051
2018-10-23T13:13:00Z E! Could not initiate login check for 10.10.10.10:50051: <nil>
2018-10-23T13:13:05Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-23T13:13:10Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-23T13:13:15Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
2018-10-23T13:13:20Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 
^C2018-10-23T13:13:22Z I! Hang on, flushing any cached metrics before shutdown
2018-10-23T13:13:22Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics. 

@glinton
Contributor

glinton commented Oct 23, 2018

The line "E! Could not initiate login check for 10.10.10.10:50051:" is the one that would have changed with the new version, but it looks like the error doesn't contain anything. We'll have to continue debugging.

It actually appears the new binary wasn't used for that test. Can you try installing the new RPM, @mohsin106?
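
For readers wondering how a log line can end in `<nil>`: in Go, formatting a nil error with `%v` prints `<nil>`. The sketch below shows one way that can happen, assuming (this thread does not confirm it) that the log statement formatted a different error variable than the one that was checked. `loginCheck`, `describeFailure`, and the error text are hypothetical and are not taken from the plugin source.

```go
package main

import (
	"errors"
	"fmt"
)

// loginCheck is a hypothetical stand-in for an RPC call that fails.
func loginCheck() error {
	return errors.New("hypothetical login failure")
}

// describeFailure builds a log line in the same shape as the one in this thread.
func describeFailure(server string, err error) string {
	return fmt.Sprintf("Could not initiate login check for %s: %v", server, err)
}

func main() {
	var connErr error // nil: assume an earlier, unrelated step succeeded

	loginErr := loginCheck()
	if loginErr != nil {
		// Formatting the wrong (nil) variable hides the cause.
		fmt.Println(describeFailure("10.10.10.10:50051", connErr))
		// Formatting the error that was actually checked exposes it.
		fmt.Println(describeFailure("10.10.10.10:50051", loginErr))
	}
}
```

The first line printed ends in `<nil>`, which matches the shape of the log above; the second carries the underlying reason.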

@mohsin106
Author

I believe I'm still getting the same error:

/etc/telegraf # telegraf --version
Telegraf unknown (git: master 8d0ec99)

/etc/telegraf # yum info telegraf
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: reflector.westga.edu
 * extras: reflector.westga.edu
 * updates: reflector.westga.edu
Installed Packages
Name        : telegraf
Arch        : x86_64
Version     : 1.9.0~8d0ec993
Release     : 0
Size        : 53 M
Repo        : installed
Summary     : Plugin-driven server agent for reporting metrics into InfluxDB.
URL         : https://github.com/influxdata/telegraf
License     : MIT
Description : Plugin-driven server agent for reporting metrics into InfluxDB.

/etc/telegraf # telegraf --debug set
2018/10/23 15:45:11 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-23T19:45:11Z D! Attempting connection to output: influxdb
2018-10-23T19:45:11Z D! Successfully connected to output: influxdb
2018-10-23T19:45:11Z I! Starting Telegraf
2018-10-23T19:45:11Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-23T19:45:11Z I! Loaded aggregators:
2018-10-23T19:45:11Z I! Loaded processors:
2018-10-23T19:45:11Z I! Loaded outputs: influxdb
2018-10-23T19:45:11Z I! Tags enabled: host=hostname
2018-10-23T19:45:11Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"hostname", Flush Interval:10s
2018-10-23T19:45:20Z D! Opened a new gRPC session to 10.10.10.10 on port 50051
2018-10-23T19:45:20Z E! Could not initiate login check for 10.10.10.10:50051:
2018-10-23T19:45:30Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
2018-10-23T19:45:40Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
^C2018-10-23T19:45:41Z I! Hang on, flushing any cached metrics before shutdown
2018-10-23T19:45:41Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.

@glinton
Contributor

glinton commented Oct 24, 2018

The hash in the version (8d0ec993) shows that it is not the correct binary. Sorry about that. I've confirmed this is the correct one; please install it and post the results.

@mohsin106
Author

Thank you! I believe you fixed it. I am now seeing a session established and data flowing from the router into my InfluxDB.
The only change I needed to make in my telegraf.conf file was to replace the IP address in the "servers" setting with FQDN:portNumber.
Would you be able to elaborate on what the fix was?
Thanks again!

@mohsin106
Author

Sorry, I forgot to post the output of the plugin before it starts streaming the data. Here it is:

/etc/telegraf # telegraf --debug set
2018/10/24 14:04:11 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-24T18:04:11Z D! Attempting connection to output: influxdb
2018-10-24T18:04:11Z D! Successfully connected to output: influxdb
2018-10-24T18:04:11Z I! Starting Telegraf
2018-10-24T18:04:11Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-24T18:04:11Z I! Loaded aggregators:
2018-10-24T18:04:11Z I! Loaded processors:
2018-10-24T18:04:11Z I! Loaded outputs: influxdb
2018-10-24T18:04:11Z I! Tags enabled: host=hostname
2018-10-24T18:04:11Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"hostname", Flush Interval:10s
2018-10-24T18:04:20Z D! Opened a new gRPC session to hostname.blah.com on port 50051

@glinton
Contributor

glinton commented Oct 24, 2018

That binary didn't contain a fix, just proper logging so we could see why the login failed. Glad you were able to resolve this.
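
The change that resolved the issue was switching the "servers" entry from an IP address to an FQDN. One plausible explanation, which this thread does not confirm, is that the router's certificate identifies the host by DNS name only, so TLS hostname verification fails when the client dials the IP address. The sketch below illustrates that behavior with Go's standard library; `selfSignedCert` and `router.example.net` are hypothetical names used only for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert creates a throwaway certificate whose only
// subject alternative name is the given DNS name (no IP SANs).
func selfSignedCert(dnsName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{dnsName},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSignedCert("router.example.net")
	if err != nil {
		panic(err)
	}
	// The DNS name matches the certificate; the bare IP does not.
	for _, name := range []string{"router.example.net", "10.10.10.10"} {
		if err := cert.VerifyHostname(name); err != nil {
			fmt.Printf("%s: rejected (%v)\n", name, err)
		} else {
			fmt.Printf("%s: accepted\n", name)
		}
	}
}
```

If the certificate also listed the router's address as an IP subject alternative name, the second check would pass as well.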

@mohsin106
Author

Hi Glinton,
I noticed that I am not able to reach the router by its FQDN when I prefix the address with https. Is it possible for Telegraf v1.9 to initiate an HTTPS request without passing a TLS certificate file? I'd like to be able to use https://fqdn:portnumber in my telegraf.conf file and then let the router continue setting up the TLS connection. This is what I'm seeing when I try to use https in my telegraf.conf file:

/etc/telegraf # telegraf --debug set
2018/10/25 09:37:28 I! Using config file: /etc/telegraf/telegraf.conf
2018-10-25T13:37:28Z D! Attempting connection to output: influxdb
2018-10-25T13:37:28Z D! Successfully connected to output: influxdb
2018-10-25T13:37:28Z I! Starting Telegraf
2018-10-25T13:37:28Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2018-10-25T13:37:28Z I! Loaded aggregators:
2018-10-25T13:37:28Z I! Loaded processors:
2018-10-25T13:37:28Z I! Loaded outputs: influxdb
2018-10-25T13:37:28Z I! Tags enabled: host=hostname
2018-10-25T13:37:28Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"hostname", Flush Interval:10s
2018-10-25T13:37:30Z E! Invalid server address: address https://fqdn:50051: too many colons in address
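
The wording of this error matches what Go's `net.SplitHostPort` returns when given a URL instead of a bare `host:port` pair: the `https:` prefix adds a second colon. A minimal sketch, assuming the server entry is parsed this way (the thread does not show the plugin code; `splitServer` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"net"
)

// splitServer separates a "host:port" entry into its parts.
// It is a thin wrapper around net.SplitHostPort, which does
// not understand URL schemes such as "https://".
func splitServer(server string) (host, port string, err error) {
	return net.SplitHostPort(server)
}

func main() {
	for _, s := range []string{"fqdn:50051", "https://fqdn:50051"} {
		host, port, err := splitServer(s)
		if err != nil {
			fmt.Println("invalid server address:", err)
			continue
		}
		fmt.Printf("host=%s port=%s\n", host, port)
	}
}
```

Under that assumption, the "servers" setting expects `fqdn:50051` with no scheme, and TLS is selected by the plugin's TLS options rather than by the address.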

@mohsin106
Author

mohsin106 commented Oct 25, 2018

Also, is there an official Docker image of Telegraf v1.9 available, or a PR?

@glinton
Contributor

glinton commented Oct 25, 2018

Telegraf 1.9 has yet to be released, but you can use a nightly build. As for not using a TLS certificate file, the client library we are using appears to require one. Using a tls.Config instead might be possible, but I'm not sure at this time.
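
As a rough illustration of the tls.Config idea, here is a sketch using only Go's standard library. It assumes the router's certificate chains to a CA already trusted by the host, so no certificate file is passed; `clientTLSConfig` and `router.example.net` are hypothetical. In grpc-go, a config like this can be turned into transport credentials with `credentials.NewTLS`, but whether the plugin can be changed to do so is not settled in this thread.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// clientTLSConfig returns a client-side TLS configuration that
// relies on the host's trusted root CAs instead of a certificate file.
func clientTLSConfig(serverName string) *tls.Config {
	return &tls.Config{
		// ServerName is the name the server certificate must match.
		ServerName: serverName,
		// RootCAs is left nil, so the host's root CA set is used.
		RootCAs:    nil,
		MinVersion: tls.VersionTLS12,
	}
}

func main() {
	cfg := clientTLSConfig("router.example.net")
	fmt.Println("verifying server as:", cfg.ServerName)
	fmt.Println("custom root pool set:", cfg.RootCAs != nil)
}
```

This would not help with a self-signed router certificate, which still has to be supplied to the client in some form.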

@smalenfant

@glinton I work with mohsin. Do you have a private branch with those fixes in? Would it be possible to get either a Docker image or an OSX build? Thanks. We're trying to test a few different things.

@danielnelson
Contributor

We included the logging fix in the 1.8.3 release earlier this week. We don't currently do builds for OSX (#4801), but you can get the Docker image from Docker Hub.
