agent crashes if kafka_exporter fails to connect to kafka #403

Open
7840vz opened this issue Jul 21, 2023 · 7 comments

@7840vz

7840vz commented Jul 21, 2023

What's wrong?

The agent exits immediately if kafka_exporter fails to connect to Kafka on startup.

I don't think this is the right behavior.
The config is valid and this is just a networking issue, so other metrics, integrations, and log collection should keep working while the kafka_exporter integration tries to reconnect instead of killing the agent.

Steps to reproduce

Add config like this:

integrations:
  kafka_exporter:
    enabled: true
    kafka_uris:
      - localhost:9092
    scrape_integration: true
    scrape_interval: 15s

System information

Linux

Software version

v0.34.3

Configuration

No response

Logs

Jul 21 13:35:52 mon-1 systemd[1]: grafana-agent.service: Main process exited, code=exited, status=1/FAILURE
Jul 21 13:35:52 mon-1 systemd[1]: grafana-agent.service: Failed with result 'exit-code'.
Jul 21 13:35:52 mon-1 systemd[1]: grafana-agent.service: Scheduled restart job, restart counter is at 5.
Jul 21 13:35:52 mon-1 systemd[1]: Stopped Grafana Agent.
Jul 21 13:35:52 mon-1 systemd[1]: grafana-agent.service: Start request repeated too quickly.
Jul 21 13:35:52 mon-1 systemd[1]: grafana-agent.service: Failed with result 'exit-code'.
Jul 21 13:35:52 mon-1 systemd[1]: Failed to start Grafana Agent.
Jul 21 13:36:20 mon-1 systemd[1]: Started Grafana Agent.
Jul 21 13:36:21 mon-1 grafana-agent[34170]: ts=2023-07-21T13:36:21.717477616Z caller=exporter.go:214 level=error integration=kafka_exporter msg="Error initiating kafka client: %s" err="kafka: client has run out of available brokers to talk to: dial tcp 127.0.0.1:9092: connect: connection refused"
Jul 21 13:36:21 mon-1 grafana-agent[34170]: ts=2023-07-21T13:36:21.719833625Z caller=manager.go:261 level=error msg="failed to initialize integration. it will not run or be scraped" integration=kafka_exporter err="could not instantiate kafka lag exporter: kafka: client has run out of available brokers to talk to: dial tcp 127.0.0.1:9092: connect: connection refused"
Jul 21 13:36:21 mon-1 grafana-agent[34170]: ts=2023-07-21T13:36:21.723370642Z caller=main.go:72 level=error msg="error creating the agent server entrypoint" err="failed applying config: not all integrations were correctly updated"
Jul 21 13:36:21 mon-1 systemd[1]: grafana-agent.service: Main process exited, code=exited, status=1/FAILURE
Jul 21 13:36:21 mon-1 systemd[1]: grafana-agent.service: Failed with result 'exit-code'.
@7840vz 7840vz added the bug Something isn't working label Jul 21, 2023
@marctc
Contributor

marctc commented Jul 21, 2023

Thanks for reporting, @7840vz. That's indeed undesired behavior, and I suspect it might happen in other integrations. I was able to reproduce it in this case: either it's an issue that has to be fixed upstream, or it's already fixed in a version that we are not using. Do you see the same behavior if you run the exporter manually?

@7840vz
Author

7840vz commented Jul 21, 2023

Sorry, I haven't tried running the exporter separately.

If I run the exporter separately and it behaves the same way, I can at least set up the exporter's container or systemd service to always restart after a crash, and that's OK, since the exporter doesn't do anything else. Here that wouldn't help, as I need the agent's other integrations/scrapes to keep working while Kafka is unavailable.

@marctc marctc self-assigned this Jul 25, 2023
@marctc
Contributor

marctc commented Aug 28, 2023

I have manually checked whether other integrations behave similarly, and that doesn't seem to be the case (which is good). I'm pretty sure this would get fixed by including the latest kafka changes. The work to do that is ongoing and can be tracked in #464.

@marctc marctc removed their assignment Aug 28, 2023
@github-actions
Contributor

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@marctc
Contributor

marctc commented Nov 6, 2023

I double-checked and the problem happens upstream: the exporter basically crashes if it can't connect to a Kafka broker. @7840vz, I'd suggest raising the issue with the original exporter to see if they are up for a fix.

@marctc marctc removed their assignment Nov 6, 2023
@BurningDog

I had the same issue when attempting to connect to caddy via scrape_configs. I'd done some reconfiguring with networks in my docker-compose file and forgot to restart the caddy service, so couldn't reach it any more.

The error message - level=error msg="error creating the agent server entrypoint" err="failed applying config: not all integrations were correctly updated" is unhelpful. The only way to debug a network issue inside the container is to remove each integration one at a time and restart the container.

@ptodev ptodev self-assigned this Feb 15, 2024
@rfratto
Member

rfratto commented Apr 11, 2024

Hi there 👋

On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025.

To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository to have a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address the issue in Grafana Agent :)

@rfratto rfratto transferred this issue from grafana/agent Apr 11, 2024