Docker image cannot connect to kafka(both in/out) using SASL/SCRAM #6072

Closed
phillui-37 opened this issue Jul 5, 2019 · 20 comments
Labels
area/kafka; feature request (Requests for new plugin and for new features to existing plugins); upstream (bug or issues that rely on dependency fixes)

Comments

@phillui-37

phillui-37 commented Jul 5, 2019

Relevant telegraf.conf:

# Two configs follow, one for each of two Telegraf instances that show the same issue.
## kafka -> influx ##
[[outputs.influxdb]]
url = "http://influx_dest:8086"
database = "telegraf"

[[inputs.kafka_consumer]]
brokers = ["a1:9094","a2:9094","a3:9094"]
topics = ["test-topic"]
version = "0.11.0.0"
sasl_username = "test"
sasl_password = "wtever"
consumer_group = "test"
offset = "oldest"
max_message_len = 1000000
data_format = "influx"
insecure_skip_verify = true

## influx -> kafka ##
[[inputs.influxdb]]
urls = ["http://influx_src:8086/debug/vars"]
timeout = "5s"

[[outputs.kafka]]
brokers = ["a1:9094","a2:9094","a3:9094"]
version = "0.11.0.0"
topic = "test-topic"
sasl_username = "test"
sasl_password = "wtever"
data_format = "influx"
compression_codec = 3
insecure_skip_verify = true
required_acks = 1

System info:

System: OSX 10.14.5/Arch Linux 5.1.15
Docker: (OSX) 18.09.2, build 6247962/ (Arch Linux) 18.09.7-ce, build 2d0083d657
Telegraf: 1.11.1 (edge docker image)
librdkafka: 1.1.0

Steps to reproduce:

Start the two telegraf docker instances with the two configs.

Expected behavior:

A connection is established between InfluxDB and Kafka, and data synchronization starts automatically.

Actual behavior:

An error occurs and the instance crashes. The instance cannot connect to Kafka, although a direct telnet connection test succeeds.

Additional info:

influx -> kafka

2019-07-05T07:38:39Z I! Starting Telegraf 1.11.1
2019-07-05T07:38:39Z I! Using config file: /etc/telegraf/telegraf.conf
2019-07-05T07:38:39Z I! Loaded inputs: influxdb
2019-07-05T07:38:39Z I! Loaded aggregators: 
2019-07-05T07:38:39Z I! Loaded processors: 
2019-07-05T07:38:39Z I! Loaded outputs: kafka
2019-07-05T07:38:39Z I! Tags enabled: host=612ddcc85b3e
2019-07-05T07:38:39Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"612ddcc85b3e", Flush Interval:10s
2019-07-05T07:38:53Z E! [agent] Failed to connect to output kafka, retrying in 15s, error was 'kafka: client has run out of available brokers to talk to (Is your cluster reachable?)' 
2019-07-05T07:39:24Z E! [telegraf] Error running agent: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

kafka -> influx

2019-07-05T07:49:34Z I! Starting Telegraf 1.11.1
2019-07-05T07:49:34Z I! Using config file: /etc/telegraf/telegraf.conf
2019-07-05T07:49:34Z I! Loaded inputs: kafka_consumer
2019-07-05T07:49:34Z I! Loaded aggregators: 
2019-07-05T07:49:34Z I! Loaded processors: 
2019-07-05T07:49:34Z I! Loaded outputs: influxdb
2019-07-05T07:49:34Z I! Tags enabled: host=436c2585696d
2019-07-05T07:49:34Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"436c2585696d", Flush Interval:10s
2019-07-05T07:49:49Z E! Error when creating Kafka Consumer, brokers: [a1:9094 a2:9094 a3:9094], topics: [test-topic]
2019-07-05T07:49:49Z E! [agent] Service for input inputs.kafka_consumer failed to start: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
2019-07-05T07:49:49Z E! [telegraf] Error running agent: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

telnet test

$ docker exec -it test-fluent-kafka-influx-io_telegraf_k2i_1 bash -c "apt update; apt install -y telnet;telnet a1 9094"

Ign:1 http://deb.debian.org/debian stretch InRelease
Hit:2 http://security.debian.org/debian-security stretch/updates InRelease
Hit:3 http://deb.debian.org/debian stretch-updates InRelease
Hit:4 http://deb.debian.org/debian stretch Release 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
3 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  telnet
0 upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
Need to get 72.0 kB of archives.
After this operation, 161 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian stretch/main amd64 telnet amd64 0.17-41 [72.0 kB]
Fetched 72.0 kB in 0s (82.0 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package telnet.
(Reading database ... 9452 files and directories currently installed.)
Preparing to unpack .../telnet_0.17-41_amd64.deb ...
Unpacking telnet (0.17-41) ...
Setting up telnet (0.17-41) ...
update-alternatives: using /usr/bin/telnet.netkit to provide /usr/bin/telnet (telnet) in auto mode
Trying xxx.xxx.xxx.xxx...
Connected to a1.
Escape character is '^]'.
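The telnet check above only shows that a TCP connection to the broker port can be opened; it does not exercise the SASL exchange that follows. A minimal Python sketch of the same reachability test, for containers where installing telnet is inconvenient (the helper name `can_connect` is hypothetical and not part of Telegraf or Kafka):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("a1", 9094)` from inside the container would mirror the telnet test; a `True` result says nothing about whether authentication will succeed.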

@phillui-37
Author

I don't know what the root cause is, because I can connect to the same Kafka service from Python with confluent-kafka and send and receive messages normally.
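For comparison, a librdkafka-based client such as confluent-kafka names the SASL mechanism explicitly in its configuration. The sketch below is not the reporter's actual test code; the helper names and the choice of `SASL_SSL` are assumptions, while the property keys are librdkafka configuration properties.

```python
def build_scram_config(brokers, username, password, mechanism="SCRAM-SHA-256"):
    """Build a librdkafka-style client configuration for SASL/SCRAM.

    Assumption: the brokers expect SASL over TLS (SASL_SSL); a plaintext
    listener would use SASL_PLAINTEXT instead.
    """
    return {
        "bootstrap.servers": ",".join(brokers),
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": mechanism,
        "sasl.username": username,
        "sasl.password": password,
    }


def send_test_message(config, topic, payload):
    """Produce one message and wait for delivery (needs confluent-kafka)."""
    # Third-party dependency, imported lazily so the config helper works without it.
    from confluent_kafka import Producer

    producer = Producer(config)
    producer.produce(topic, payload)
    # flush() blocks until queued messages are delivered or the timeout passes,
    # and returns the number of messages still waiting in the queue.
    return producer.flush(10)
```

The Telegraf configs in the report set `sasl_username` and `sasl_password` but name no mechanism, which is the main visible difference from a client configured this way.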

@glinton
Contributor

glinton commented Jul 5, 2019

Are you able to do a packet trace to determine where the packets are being dropped?

@phillui-37
Author

I used tcpdump to trace the packets. The results are listed here

@phillui-37
Author

I used CloudKarafka to demonstrate the problem here

@phillui-37
Author

I used the image you provided and still get the same result.

tcpdump

@danielnelson
Contributor

Can you run it with the --debug CLI option? We expect to see a fair number of new log messages.

@phillui-37
Author

phillui-37 commented Jul 11, 2019

log

@danielnelson
Contributor

danielnelson commented Jul 11, 2019

Thanks @phillui-37. The connection is fine, but it looks like the server closes the connection once the SASL handshake is sent.

Would you be able to check the logs from one of the Kafka servers for any clues? Also, it might be helpful if you could put together a small sample program with your Python code that can connect and send a message.

Finally, there is one connection option that we don't expose from Telegraf that perhaps could be of interest. Here is the library documentation:

// Whether or not to send the Kafka SASL handshake first if enabled
// (defaults to true). You should only set this to false if you're using
// a non-Kafka SASL proxy.
Handshake bool

@phillui-37
Author

The CloudKarafka free account does not provide any logs, sorry.
The Python test code is located here

@danielnelson
Contributor

Looks like support for SASL/SCRAM requires a newer version of the sarama library than we have in Telegraf right now. I'll try to put together a new build tomorrow with the updated library for testing.

IBM/sarama#1295
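As background on why the client library needs explicit SCRAM support: unlike PLAIN, SCRAM never sends the password itself, so the client has to compute a salted challenge-response proof. A rough sketch of the SCRAM-SHA-256 client proof as described in RFC 5802 and RFC 7677 follows. This is not Sarama's or Telegraf's code; the function names are hypothetical, and SASLprep password normalization and message parsing are left out.

```python
import base64
import hashlib
import hmac


def scram_client_proof(password: str, salt_b64: str, iterations: int, auth_message: str):
    """Return (client_proof, stored_key, client_signature) for SCRAM-SHA-256."""
    salt = base64.b64decode(salt_b64)
    # SaltedPassword := Hi(password, salt, i), which is PBKDF2 with HMAC-SHA-256.
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    client_signature = hmac.new(
        stored_key, auth_message.encode("utf-8"), hashlib.sha256
    ).digest()
    # ClientProof := ClientKey XOR ClientSignature
    proof = bytes(a ^ b for a, b in zip(client_key, client_signature))
    return proof, stored_key, client_signature


def server_accepts(proof: bytes, stored_key: bytes, client_signature: bytes) -> bool:
    """Server-side check: recover ClientKey from the proof and compare its hash."""
    recovered_key = bytes(a ^ b for a, b in zip(proof, client_signature))
    return hmac.compare_digest(hashlib.sha256(recovered_key).digest(), stored_key)
```

A client that only implements the PLAIN exchange cannot produce this proof, which is consistent with the broker dropping the connection after the handshake.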

@danielnelson
Contributor

Give one of these builds a shot. You will need to add sasl_mechanism to the configuration of both the input and the output:

[[outputs.kafka]]
  sasl_mechanism = "SCRAM-SHA-256"
  sasl_username = "foo"
  sasl_password = "foo"

@phillui-37
Author

It still does not work.

log and conf

@danielnelson
Contributor

I opened an issue upstream with the Sarama project: IBM/sarama#1427

danielnelson added the area/kafka, feature request, and upstream labels on Jul 16, 2019
@marceloalmeida
Contributor

Any updates? I'm still facing the same issue on version 1.14.1

@vpedosyuk

It'd be great to try it with Sarama v1.26.4 as it seems to have no issues with scram-based auth

@hershdhillon

Facing this today. Any updates on this particular issue?

@tosheer

tosheer commented Oct 21, 2020

@danielnelson Can you share the repo of the Telegraf fork that worked for you when connecting to Kafka with SASL/SCRAM-based authentication?

@tosheer

tosheer commented Oct 21, 2020

@phillui-37 Can you share the Telegraf fork that worked for you when connecting to Kafka with SASL/SCRAM-based authentication? I just need to see the implementation details.

@sjwang90
Contributor

Closed in #8318
