"link has incompatible addresses" after restarting Flannel k8s pod #1060
Comments
Flannel version: v0.10.0

Same setup as @b3nw. The only difference is I'm using HypriotOS 1.9.0 on a Raspberry Pi 3 B+.

Got this same thing on HypriotOS 1.10.0-rc2 on a Raspberry Pi 3 B+.

@mr-sour My config is the same as yours, with the same results. I'm glad at least that

Seeing the same issue with flannel.

Same issue after applying https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml on a small bramble (two RPi 4s). I noticed that there are two pods; after deleting the flannel.1 network, the first one starts without problems and the second one enters a crash loop (with the same error). Does someone else have two "kube-flannel-ds-arm-xxxx" pods? Maybe that's the problem?

Still an issue with Hypriot v1.11.1 + K8s 1.16.1 + Flannel 0.11.0.

@markus-seidl I can confirm, I have two.

@markus-seidl @mkuchenbecker It is normal to have an instance on each node. Manually deleting the link on the node and then deleting the pod, as others have suggested, seems to be the resolution.

Same issue on k8s v1.16.11, flannel v0.11.0: I0802 12:34:12.140814 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false

I got the same issue; any progress on resolving this problem?

I had the same issue when power went down and I then tried to get the nodes back. The trick with removing the link actually helps.
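The manual recovery described in the comments above can be sketched as the following script. It defaults to a dry run that only prints the commands; set `APPLY=1` on the affected node to actually execute them. The pod name `kube-flannel-ds-arm-xxxx` is a placeholder from this thread; substitute your crashing pod's name.

```shell
#!/bin/sh
# Dry-run by default: set APPLY=1 to really execute the commands.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

# 1. On the worker node: remove the stale VXLAN link left behind.
run sudo ip link delete flannel.1

# 2. Delete the crashing pod; its DaemonSet recreates it, and with the
#    link gone the new pod starts cleanly.
run kubectl -n kube-system delete pod kube-flannel-ds-arm-xxxx
```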
Currently, flannel interface ip addresses are checked on startup when using the vxlan and ipip backends. If multiple addresses are found, startup fails fatally. If only one address is found and it is not the currently leased one, it is assumed to come from a previous lease and is removed.

This criterion seems arbitrary both in how it is done and in its timing. It may cause failures in situations where it is not strictly necessary, for example if the node is running a dhcp client that assigns link-local addresses to all interfaces. It may also fail on unexpected flannel restarts that are completely unrelated to the external event that caused the unexpected modification of the flannel interface.

This patch proposes to check only ip addresses within the flannel network, and takes the simple approach of ignoring any other ip addresses, assuming these pose no problem for flannel's operation. A discarded but more aggressive alternative would be to remove all addresses that are not the currently leased one.

Fixes flannel-io#1060

Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
* vxlan: Generate MAC address before creating a link. systemd 242+ assigns MAC addresses to all virtual devices that don't already have one assigned, which resulted in systemd overriding the MAC addresses of flannel.* interfaces. The fix that prevents systemd from setting the address is to define a concrete MAC address when creating the link. Fixes: flannel-io#1155. Ref: k3s-io/k3s#4188. Signed-off-by: Michal Rostecki <[email protected]> (cherry picked from commit 0198d5d)
* Concern only about flannel ip addresses. (Same description as the commit message above.) Fixes flannel-io#1060. Signed-off-by: Jaime Caamaño Ruiz <[email protected]> (cherry picked from commit 33a2fac)
* Fix flannel hang if lease expired (cherry picked from commit 78035d0)
* subnets: move forward the cursor to skip illegal subnet. This PR fixes an issue where flannel, on receiving an illegal subnet event while watching leases, doesn't move the etcd cursor forward and gets stuck on the same invalid event forever. (cherry picked from commit 1a1b6f1)
* fix cherry-pick glitches and test failures
* disable udp backend tests since we don't actually have the udp backend in our fork

Co-authored-by: Michal Rostecki <[email protected]>
Co-authored-by: Jaime Caamaño Ruiz <[email protected]>
Co-authored-by: Chun Chen <[email protected]>
Co-authored-by: huangxuesen <[email protected]>
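The difference between the old and patched startup checks can be illustrated with a small simulation. This is a hypothetical shell sketch, not flannel's actual Go code: it counts addresses in `ip -o -4 addr show dev flannel.1`-style output for a link that picked up an extra link-local address, and assumes a flannel network of 10.244.0.0/16.

```shell
#!/bin/sh
# Simulated `ip -o -4 addr show dev flannel.1` output: the leased flannel
# address plus a stray link-local one (e.g. added by a dhcp client).
sample='2: flannel.1    inet 10.244.1.0/32 scope global flannel.1
2: flannel.1    inet 169.254.17.5/16 scope link flannel.1'

# Pre-fix check: every IPv4 address counts, so two addresses is fatal.
total=$(printf '%s\n' "$sample" | grep -c 'inet ')
echo "all v4 addresses: $total"

# Patched check: only addresses inside the flannel network (assumed
# 10.244.0.0/16 here) are considered; the 169.254/16 one is ignored.
flannel=$(printf '%s\n' "$sample" | grep -c 'inet 10\.244\.')
echo "flannel-network addresses: $flannel"

if [ "$total" -gt 1 ]; then
  echo "pre-fix: fatal 'link has incompatible addresses'"
fi
if [ "$flannel" -eq 1 ]; then
  echo "post-fix: startup proceeds"
fi
```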
The flannel pod starts successfully if the `flannel.1` link doesn't exist. But running `kubectl delete pod kube-flannel-...` leaves the `flannel.1` link behind, and the subsequently created pod will fail to start with the "link has incompatible addresses" error. This is on a Raspberry Pi 3 B+, so the arm architecture may be a factor.
Expected Behavior
Recreating the flannel pod should succeed even if the `flannel.1` link exists.

Current Behavior
The flannel pod goes into CrashLoopBackOff after it is recreated. To allow the pod to start successfully, SSH onto the worker and run `sudo ip link delete flannel.1`. Recreating the pod will then succeed.

Possible Solution
?
Steps to Reproduce (for bugs)
1. `kubectl delete pod kube-flannel-...`
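A reproduction sketch of the step above (the pod name and the `app=flannel` label are assumptions; a live cluster is required, so each command is skipped when `kubectl` is unavailable or not connected):

```shell
#!/bin/sh
# Run kubectl when it is available and working; otherwise print what was
# attempted so the sketch degrades gracefully without a cluster.
kc() {
  command -v kubectl >/dev/null && kubectl "$@" 2>/dev/null \
    || echo "skipped: kubectl $*"
}

kc -n kube-system delete pod kube-flannel-ds-arm-xxxx  # DaemonSet recreates it
kc -n kube-system get pods -l app=flannel              # replacement: CrashLoopBackOff
```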
Context
About once a week my flannel pods enter this state, possibly due to a crash or restart of the pod, and I have to manually SSH in to delete the flannel link on each affected node.
Related issue: #883
Your Environment