the overlay network between nodes seems to be broken in Debian 11 (bullseye) #3863
Comments
I've seen other issues with iptables on Debian, for example #3117 (comment). I wonder if this is related?
Upgraded to v1.21.4+k3s1 and it's working fine now.
I'm still seeing this problem on
Now when I query the pod on the master node from the dnsutils container on the master node, everything works fine:
Now let's repeat the same thing, querying the pod on the agent node from the dnstest container on the master node:
It does seem to be an improvement over
Local request within the master node:
Request from the master node to the agent node:
Indeed, it seemed to work. But I've recreated the environment, and now it does not work. Nor does it work with v1.22.1-rc1+k3s1.
When launching the server with
There have been a bunch of issues with UDP checksum offload in recent kernels; I suspect that's the root cause here.
@brandond do you have any links about that? Any idea if there's a workaround?
Not sure why people keep tagging @bradtopol in stuff...
@brandond, sorry! I wasn't paying attention; I meant to tag you. What happened in my case is that, for some odd reason, your name is not the first suggestion in GitHub's autocompletion. It doesn't seem to rank people who are already in an issue first.
Disabling offloading via it also doesn't seem to be fully consistent; for this testing I mostly kept rebuilding VMs at Hetzner Cloud, with varying results. However, as @rgl already mentioned, it does seem to be an issue that only affects VXLAN.
You need to disable checksum offload on the VXLAN interface, not the physical interface. See:
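A minimal sketch of that advice, assuming the VXLAN interface has flannel's default name flannel.1: the function turns off only the generic TX checksum offload feature on that interface and leaves the physical NIC alone. The function name and the DRY_RUN switch are illustrative, not part of k3s or flannel. Actually changing the setting requires root, and ethtool settings are not persistent, so this would need to be reapplied after a reboot or whenever the interface is recreated.

```shell
#!/bin/sh
# Sketch: disable TX checksum offload on the VXLAN interface only.
# "flannel.1" is assumed to be the flannel VXLAN interface name;
# pass a different name as $1 if yours differs.
disable_vxlan_tx_csum() {
  iface="${1:-flannel.1}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: only print the command that would be executed.
    echo "ethtool -K $iface tx-checksum-ip-generic off"
  else
    # Requires root; the setting is lost on reboot or if the interface is recreated.
    ethtool -K "$iface" tx-checksum-ip-generic off
  fi
}
```

On each node, as root, run disable_vxlan_tx_csum, or DRY_RUN=1 disable_vxlan_tx_csum to preview the command first.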
According to projectcalico/felix#2811, the offload bug only affects Linux < 5.7, and Debian 11 uses Linux 5.10. Even so, I gave it a try, but it did not work here. Something else seems to be breaking VXLAN :-( Here's how I disabled it on all machines, and even on all interfaces:
for i in eth0 eth1 flannel.1 cni0; do
  # narrower alternative: only turn off the generic TX checksum feature
  #ethtool -K "$i" tx-checksum-ip-generic off
  # turn off both TX and RX checksum offload on the interface
  ethtool -K "$i" tx off rx off
done
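Since results varied between rebuilt VMs, it may be worth confirming what the kernel actually reports after running a loop like the one above. ethtool -k (lowercase k) prints the current offload features; the hypothetical filter below extracts the tx-checksum-ip-generic state from that output so it can be compared across nodes.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the state ("on" or "off") of the tx-checksum-ip-generic feature.
# awk's default field splitting ignores the leading tab that ethtool uses
# for sub-features, so $1 is the feature name with its trailing colon and
# $2 is the state (a trailing "[fixed]" marker, if any, lands in $3).
tx_csum_state() {
  awk '$1 == "tx-checksum-ip-generic:" { print $2 }'
}
```

For example: ethtool -k flannel.1 | tx_csum_state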
I forgot to make this more explicit before, but I'm running this in KVM VMs (using a virtio NIC) on an Ubuntu 20.04 host. The code is at https://github.com/rgl/k3s-vagrant/tree/wip-vxlan.
I've encountered the same issue in my recently migrated Debian 11 environment (kernel 5.10.46); details are..
removing
Edit: I grabbed a pair of packet captures on the Debian 11 hosts. They confirm what others have mentioned: only traffic from the master over the overlay network fails, while the nodes/agents can communicate over the flannel network without issue.
On k3s-master01 and k3s-node02, I generated traffic to the certmanager pod, which resides on k3s-node01 (10.42.2.0); the packet captures are zipped up below.
I wonder if this is an iptables issue with the source address range used by the master (10.42.0.1/24)? However, on the destinations (k3s-node[12]) I see the following:
ACCEPT all -- 10.42.0.0/16 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 10.42.0.0/16
@brandond: Is it possible for me to manually assign the flannel network range on my master to something other than 10.42.0.1/24?
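One way to sanity-check the iptables theory: the two ACCEPT rules above match any source or destination in 10.42.0.0/16, and the master's 10.42.0.0/24 pod subnet lies inside that range, so these two rules already accept traffic from the master's pods (a DROP or REJECT rule evaluated earlier in the chain could still matter). A small POSIX sh sketch of the containment check, IPv4 only; the function names are hypothetical:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address ("a.b.c.d") to a 32-bit integer.
ipv4_to_int() {
  old_ifs=$IFS
  IFS=.
  # Split the address into this function's positional parameters.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if IPv4 address $1 lies inside CIDR block $2.
ipv4_in_cidr() {
  cidr_net=${2%/*}
  cidr_bits=${2#*/}
  # Mask with the top $cidr_bits bits set, truncated to 32 bits
  # (4294967295 = 0xFFFFFFFF; relies on 64-bit shell arithmetic).
  cidr_mask=$(( (4294967295 << (32 - $cidr_bits)) & 4294967295 ))
  ip_addr_int=$(ipv4_to_int "$1")
  ip_net_int=$(ipv4_to_int "$cidr_net")
  [ $(( $ip_addr_int & $cidr_mask )) -eq $(( $ip_net_int & $cidr_mask )) ]
}
```

For example, ipv4_in_cidr 10.42.0.1 10.42.0.0/16 succeeds, while ipv4_in_cidr 10.43.0.1 10.42.0.0/16 fails. If a different range is still wanted, the k3s server's --cluster-cidr option (default 10.42.0.0/16) controls it.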
Sure sounds like something's up with the Debian 11 vxlan kernel module...
@jesseward, as per https://rancher.com/docs/k3s/latest/en/installation/install-options/server-config/#networking, that should be easy to do.
+1, I've encountered the same issue on a vanilla installation of Bullseye on a NUC7PJYH.
Same here. I've got a cluster made of Raspberry Pis and one ASUS NP-51 running Debian 11. The overlay network seems to fail only for the node running Debian; connectivity between the Raspberry Pis is just fine.
Duplicate of #4188 (comment); we'll track the fix over there.
@rgl, I know why VXLAN cannot connect on Debian 11: every VXLAN interface created on Debian 11 gets the same MAC address.
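That claim is straightforward to check across nodes. Each node exposes the MAC address of its VXLAN interface in /sys/class/net/flannel.1/address (assuming the default flannel.1 name). Because the VXLAN backend relies on each node's tunnel endpoint MAC to forward overlay traffic to the right peer, identical MACs on different nodes would plausibly break that forwarding. The hypothetical helper below takes one "<node> <mac>" line per node and prints every MAC that appears more than once:

```shell
#!/bin/sh
# Hypothetical helper: read "<node> <mac>" lines on stdin (one per node) and
# print each MAC address that appears on more than one node, one per line.
# MACs are lower-cased first so "AA:BB:..." and "aa:bb:..." compare equal.
find_duplicate_macs() {
  awk 'NF >= 2 { count[tolower($2)]++ }
       END { for (mac in count) if (count[mac] > 1) print mac }' | sort
}
```

The input lines could be gathered by running cat /sys/class/net/flannel.1/address on each node (for example over ssh) and prefixing each result with the node name; empty output means every node has a distinct VXLAN MAC.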
@homerzhou, in the meantime this issue has been fixed, and VXLAN is working on Debian 11.
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
1 server (without worker role) and 1 agent, as configured by my playground at https://github.com/rgl/k3s-vagrant/tree/debian-11-wip.
Describe the bug:
The Traefik dashboard is installed by https://github.com/rgl/k3s-vagrant/blob/debian-11-wip/provision-k3s-server.sh#L62-L167 and is available on all the cluster nodes, i.e. at the s1 node address and at the a1 node address.
When running on Debian 10, accessing both addresses works fine (they return the expected web page). But when running on Debian 11, the a1 node address does not work (the request times out).
It does not even work when running wget http://10.11.0.201:9000/dashboard/ from the s1 node (but it works inside the a1 node).
While running tcpdump on the s1 and a1 nodes, I can see the SYN packets leave s1 for a1, but a1 never replies.
Do you have any idea why this is happening or what might be blocking this? Or any clue how to make it work?
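To narrow down where packets are lost with the vxlan backend, one approach is to capture on both layers on each node: the encapsulated packets on the underlay NIC (flannel's VXLAN backend uses UDP port 8472 by default on Linux) and the decapsulated overlay packets on flannel.1. If UDP packets arrive on one node's NIC but nothing shows up on its flannel.1, that would point at the receive or decapsulation path. The hypothetical helper below only prints the two tcpdump commands to run on s1 and a1; the underlay interface name eth1 is a guess about this Vagrant setup.

```shell
#!/bin/sh
# Hypothetical helper: print the tcpdump commands for a two-layer capture.
#   $1 = underlay (physical) interface carrying node-to-node traffic;
#        "eth1" is only an assumption for this Vagrant setup.
#   $2 = VXLAN UDP port; flannel's default on Linux is 8472.
vxlan_capture_cmds() {
  underlay_if="${1:-eth1}"
  vxlan_port="${2:-8472}"
  # Outer layer: encapsulated VXLAN packets between the nodes.
  echo "tcpdump -ni $underlay_if udp port $vxlan_port"
  # Inner layer: decapsulated overlay TCP traffic on the flannel VXLAN device.
  echo "tcpdump -ni flannel.1 tcp"
}
```

Run each printed command as root in its own terminal on both nodes while repeating the wget.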
PS: when launching the server with --flannel-backend 'host-gw', things seem to work, so it seems there's something going on with the vxlan backend.
Steps To Reproduce:
PS: This is running in KVM VMs (using a virtio NIC) on an Ubuntu 20.04 host. The code is at https://github.com/rgl/k3s-vagrant/tree/wip-vxlan.
Expected behavior:
I expected it to work on Debian 11, as it does on Debian 10.
Actual behavior:
Traffic between nodes in the overlay network does not seem to be working correctly.
Additional context / logs: