Fluentbit randomly connects to ipv4 in an ipv6 only network #8214

nvima · 2023-11-25T11:40:31Z

Bug Report

Describe the bug

Hi, we are currently trying to remove the NAT gateways from our AWS Fargate cluster and run IPv6-only.
So far we have always used the aws-for-fluentbit image.
After removing the NAT gateway, many log messages appeared saying that chunks could not be sent or that no connection could be established, although some logs still arrive in Loki.
After I saw that "aws-for-fluentbit" still uses an older Fluent Bit version, I created a dummy config and tried the latest version, "cr.fluentbit.io/fluent/fluent-bit:2.1.10-debug".
We have the same problems there, but the logs are slightly different.
With "aws-for-fluentbit" the error logs contain the DNS name of the Loki host, whereas with upstream Fluent Bit they strangely contain IPv4 addresses. In both cases logs arrive occasionally and some chunks cannot be sent.

The DNS name of our Loki host has two alias records, one A and one AAAA.
When I log into the Fluent Bit container inside the AWS VPC network, I can connect to the Loki host with curl without any problems.

I'm not sure whether the problem lies with DNS or with Fluent Bit. What speaks against a DNS problem is that some logs do arrive at Loki, in a network that only has IPv6.
I suspect that Fluent Bit sometimes picks the IPv4 address, where the connection fails, and sometimes picks the IPv6 address, where chunks can be sent.

Are there people here who use fluentbit in an ipv6 only network?

To Reproduce

Use Fluent Bit with the Loki output plugin in an IPv6-only network.
Example config:

[SERVICE]
    Flush        1
    Log_Level    info

[INPUT]
    Name     dummy
    Dummy    {"test":"foobar"}
    Rate     1

[OUTPUT]
    Name           loki
    Match          *
    Host           loki-test.example.com
    Port           443
    http_user      loki-dev-user
    http_passwd    supersecretpw
    Labels         job=debug,group=debug
    Line_Format    json
    Tls            On

[2023/11/25 11:09:33] [error] [engine] chunk '6-1700910545.321356301.flb' cannot be retried: task_id=5, input=dummy.0 > output=loki.0
[2023/11/25 11:09:33] [error] [output:loki:loki.0] no upstream connections available
[2023/11/25 11:09:33] [error] [upstream] connection #51 to tcp://[ipv4-address]:443 timed out after 10 seconds (connection timeout)

[2023/11/25 11:10:01] [ info] [engine] flush chunk '6-1700910579.321866379.flb' succeeded at retry 1: task_id=7, input=dummy.0 > output=loki.0 (out_id=0)
[2023/11/25 11:10:03] [ warn] [engine] failed to flush chunk '6-1700910591.321695942.flb', retry in 10 seconds: task_id=18, input=dummy.0 > output=loki.0 (out_id=0)

Your Environment

Version used: fluent-bit:2.1.10-debug
Configuration: see above
Environment name and version (e.g. Kubernetes? What version?): AWS Fargate ECS
Server type and version:
Operating System and version:
Filters and plugins: Loki

Additional context

Running Fluent Bit in an IPv6-only network

nvima · 2023-11-26T19:26:31Z

I found out that Fluent Bit randomly connects to either the IPv4 or the IPv6 address returned by the DNS lookup.
This causes problems in networks where IPv4 has no route to the internet.
I opened a pull request that adds an option to prefer IPv6 DNS results, similar to what has already been implemented for IPv4.

#4500

#8216
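
For readers with the same symptoms, here is a minimal sketch of how the IPv6 preference from #8216 might be applied to the Loki output from the example config. It assumes the option is exposed as `net.dns.prefer_ipv6`, mirroring the existing `net.dns.prefer_ipv4`, and that the Fluent Bit build in use already contains #8216 (the 2.1.10 image tested above predates it). Please verify the option name against the networking documentation for your version.

```text
[OUTPUT]
    Name                   loki
    Match                  *
    Host                   loki-test.example.com
    Port                   443
    Tls                    On
    # Assumption: option name as added by #8216.
    # Intended effect: try AAAA (IPv6) results before A (IPv4) results.
    net.dns.prefer_ipv6    true
```

Credentials, labels, and line format are omitted here for brevity and would stay as in the original config.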

nvima added the status: waiting-for-triage label Nov 25, 2023

nvima mentioned this issue Nov 26, 2023 in: network: prefer ipv6 DNS results option #8216 (merged)

edsiper closed this as completed in #8216 Jan 14, 2024
