Fluentbit randomly connects to ipv4 in an ipv6 only network #8214

Closed
nvima opened this issue Nov 25, 2023 · 1 comment · Fixed by #8216

Comments

@nvima
Contributor

nvima commented Nov 25, 2023

Bug Report

Describe the bug

Hi, we are currently trying to remove the NAT gateways from our AWS Fargate cluster and run IPv6-only.
So far we have always used the aws-for-fluentbit image.
After removing the NAT gateway, many log messages appeared saying that chunks could not be sent or that no connection could be established.
Some logs do still arrive in Loki, but chunks intermittently fail to send.
After I saw that "aws-for-fluentbit" still uses an older Fluentbit version, I created a dummy config for fluentbit and tried the latest version, "cr.fluentbit.io/fluent/fluent-bit:2.1.10-debug".
We have the same problems there, but the logs are slightly different.
With "aws-for-fluentbit" the Loki host's DNS name appears in the logs, whereas with fluentbit the logs strangely contain IPv4 addresses. In both cases, logs arrive occasionally and some chunks cannot be sent.

The DNS name of our Loki host has two alias records, one A and one AAAA.
When I log into the fluentbit container in the AWS VPC network, I can connect to the Loki host with curl without any problems.

I'm not sure where the problem lies: is it DNS, or is it fluentbit? What speaks against a DNS problem is that some logs do arrive at Loki, in a network that only has IPv6.
I suspect that fluentbit sometimes forces IPv4 and then fails to connect, and sporadically uses IPv6, in which case chunks can still be sent.
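One way to check what the resolver hands back for a dual-stack name is to group the results by address family. The following is a minimal Python sketch for that check only, not part of Fluentbit; the helper names `group_by_family` and `resolve` are hypothetical, and the addresses used for illustration would come from whatever your DNS returns.

```python
import socket


def group_by_family(addrinfos):
    """Group socket.getaddrinfo() results into {'IPv4': [...], 'IPv6': [...]}."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    grouped = {}
    for family, _type, _proto, _canonname, sockaddr in addrinfos:
        label = labels.get(family)
        if label is None:
            continue  # ignore families other than IPv4/IPv6
        address = sockaddr[0]  # first element is the IP for both families
        bucket = grouped.setdefault(label, [])
        if address not in bucket:
            bucket.append(address)
    return grouped


def resolve(host, port=443):
    """Resolve a host over TCP and report its addresses per family."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return group_by_family(infos)
```

Calling `resolve("loki-test.example.com")` from inside the container would show whether both an IPv4 and an IPv6 address are returned. If both are present, a client that picks one without regard to which family is routable would fail part of the time, which matches the behaviour described above.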

Is anyone here running fluentbit in an IPv6-only network?

To Reproduce

  • Use Fluentbit with the Loki plugin in an IPv6-only network
    Example config:

[SERVICE]
    Flush 1
    Log_Level info

[INPUT]
    Name dummy
    Dummy {"test":"foobar"}
    Rate 1

[OUTPUT]
    Name loki
    Match *
    Host loki-test.example.com
    Port 443
    http_user loki-dev-user
    http_passwd supersecretpw
    Labels job=debug,group=debug
    Line_Format json
    Tls On

[2023/11/25 11:09:33] [error] [engine] chunk '6-1700910545.321356301.flb' cannot be retried: task_id=5, input=dummy.0 > output=loki.0
[2023/11/25 11:09:33] [error] [output:loki:loki.0] no upstream connections available
[2023/11/25 11:09:33] [error] [upstream] connection #51 to tcp://[ipv4-adress]:443 timed out after 10 seconds (connection timeout)

[2023/11/25 11:10:01] [ info] [engine] flush chunk '6-1700910579.321866379.flb' succeeded at retry 1: task_id=7, input=dummy.0 > output=loki.0 (out_id=0)
[2023/11/25 11:10:03] [ warn] [engine] failed to flush chunk '6-1700910591.321695942.flb', retry in 10 seconds: task_id=18, input=dummy.0 > output=loki.0 (out_id=0)

Your Environment

  • Version used: fluent-bit:2.1.10-debug
  • Configuration: see above
  • Environment name and version (e.g. Kubernetes? What version?): AWS Fargate ECS
  • Server type and version:
  • Operating System and version:
  • Filters and plugins: Loki

Additional context

Running Fluentbit in an IPv6-only network

@nvima
Contributor Author

nvima commented Nov 26, 2023

I found out that fluentbit randomly connects to either the IPv4 or the IPv6 address returned by the DNS request.
This causes problems in networks where IPv4 connectivity to the internet is not available.
I opened a pull request that adds an option to prefer IPv6, similar to what has already been implemented for IPv4.

#4500

#8216
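The idea behind such a preference option can be sketched as follows. This is an illustration in Python under my own assumptions, not Fluentbit's actual C implementation; the function name `order_addresses` and its `prefer` parameter are hypothetical.

```python
import ipaddress


def order_addresses(addresses, prefer="ipv6"):
    """Return addresses with the preferred family first.

    The sort is stable, so the resolver's order is kept within each family,
    and the other family remains available as a fallback.
    """
    wanted_version = 6 if prefer == "ipv6" else 4
    return sorted(
        addresses,
        key=lambda addr: ipaddress.ip_address(addr).version != wanted_version,
    )
```

With a dual-stack result such as `["192.0.2.10", "2001:db8::10"]`, preferring IPv6 moves the IPv6 address to the front, so a client that tries addresses in order no longer starts with a family that cannot be routed.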
