Kubespan and Cilium compatibility: etcd is failing #4836

Closed · ghost opened this issue Jan 21, 2022 · 13 comments · Fixed by #5945
@ghost commented Jan 21, 2022

Bug Report

Control plane machine config - https://paste.opendev.org/show/812287/
Worker node machine config - https://paste.opendev.org/show/812288/

Step-by-step installation without installing a CNI (name set to none); at this point etcd is not failing - https://paste.opendev.org/show/812289/
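
For reference, a machine config with the built-in CNI disabled and KubeSpan enabled can be generated roughly like this (a sketch only; the cluster name and node IP are placeholders, and the endpoint reuses the API server address from the Cilium step below):

talosctl gen config my-cluster https://172.16.1.80:6443 \
    --config-patch '[{"op": "replace", "path": "/cluster/network/cni", "value": {"name": "none"}},
                     {"op": "add", "path": "/machine/network/kubespan", "value": {"enabled": true}}]'
talosctl apply-config --insecure --nodes <node-ip> --file controlplane.yaml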

Then we proceed with Cilium installation -

export KUBERNETES_API_SERVER_ADDRESS=172.16.1.80
export KUBERNETES_API_SERVER_PORT=6443

helm install cilium cilium/cilium \
    --version 1.11.0 \
    --namespace kube-system \
    --set kubeProxyReplacement=strict \
    --set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
    --set k8sServicePort="${KUBERNETES_API_SERVER_PORT}"

After a couple of minutes, etcd starts failing:

k get po -A
NAMESPACE     NAME                                                             READY   STATUS     RESTARTS        AGE
kube-system   cilium-69vnw                                                     0/1     Init:0/2   0               79s
kube-system   cilium-9v6k7                                                     0/1     Init:0/2   0               79s
kube-system   cilium-hh952                                                     0/1     Pending    0               79s
kube-system   cilium-ll6qv                                                     0/1     Init:0/2   0               79s
kube-system   cilium-operator-746ffcc976-hgwqv                                 1/1     Running    0               77s
kube-system   cilium-operator-746ffcc976-m84c6                                 1/1     Running    0               77s
kube-system   cilium-stdj6                                                     0/1     Init:0/2   0               79s
kube-system   cilium-t2v6m                                                     0/1     Init:0/2   0               79s
kube-system   coredns-576cdb9d86-7xkmj                                         0/1     Pending    0               16m
kube-system   coredns-576cdb9d86-lzps6                                         0/1     Pending    0               16m
kube-system   kube-apiserver-talos-172-16-1-198                                1/1     Running    0               16m
kube-system   kube-apiserver-talos-fd804d6c22904e02244ac0fffe3ce9e8            1/1     Running    1 (3m25s ago)   14m
kube-system   kube-apiserver-talos-fd804d6c22904e027824d3fffe268e1c            1/1     Running    1 (3m54s ago)   13m
kube-system   kube-controller-manager-talos-172-16-1-198                       1/1     Running    2 (17m ago)     15m
kube-system   kube-controller-manager-talos-fd804d6c22904e02244ac0fffe3ce9e8   1/1     Running    4 (2m52s ago)   14m
kube-system   kube-controller-manager-talos-fd804d6c22904e027824d3fffe268e1c   1/1     Running    3 (2m25s ago)   13m
kube-system   kube-scheduler-talos-172-16-1-198                                1/1     Running    3 (14m ago)     16m
kube-system   kube-scheduler-talos-fd804d6c22904e02244ac0fffe3ce9e8            1/1     Running    3 (3m4s ago)    14m
kube-system   kube-scheduler-talos-fd804d6c22904e027824d3fffe268e1c            1/1     Running    4 (2m25s ago)   13m

k get po -A
Error from server: etcdserver: request timed out

k get po -A
Error from server: etcdserver: request timed out

k get po -A
Unable to connect to the server: dial tcp 172.16.1.80:6443: connect: no route to host

Logs

Other logs - https://paste.opendev.org/show/812290/

Master logs are attached to this Issue.
master-1.zip
master-2.zip
master-3.zip

Description

Cilium works fine as the CNI on a Talos cluster without KubeSpan enabled. Once KubeSpan is enabled and you try to install Cilium, etcd on the masters starts failing, and the Cilium agents do not start either. It is confirmed that without KubeSpan, Cilium works fine.

Environment

Bare metal / VMs

  • Talos version: v0.14
  • Kubernetes version: v1.22.3
  • Platform: Proxmox
@sauterp (Contributor) commented Jun 27, 2022

Hi @Born2Bake, thanks for the detailed report!

We tried to replicate this issue in a QEMU cluster with the latest version of Talos (1.1.0) and didn't have the problem you describe. Can we ask you to try this again with the latest version of Talos? If the problem persists, we'd be happy to have a live debugging session with you; we're available in our community Slack.
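
For reference, upgrading an existing node to the suggested release is roughly the following (the node IP is a placeholder):

talosctl upgrade --nodes <node-ip> --image ghcr.io/siderolabs/installer:v1.1.0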

@ghost (Author) commented Jun 27, 2022

> Hi @Born2Bake, thanks for the detailed report!
>
> We tried to replicate this issue in a QEMU cluster with the latest version of Talos (1.1.0) and didn't have the problem you describe. Can we ask you to try this again with the latest version of Talos? If the problem persists, we'd be happy to have a live debugging session with you; we're available in our community Slack.

Could you try to create a cluster with VIP enabled? I did not have a lot of time for testing, but I just spun up a new cluster with KubeSpan enabled and Talos updated to 1.1.0, and cluster creation is still failing in a similar way:

bn@DESKTOP-IQ0K33P:~/test$ talosctl get links -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE         ID             VERSION   TYPE       KIND        HW ADDR                                           OPER STATE   LINK STATE
192.168.1.130   network     LinkStatus   bond0          1         ether      bond        aa:b7:3b:58:9d:74                                 down         false
192.168.1.130   network     LinkStatus   cilium_host    2         ether      veth        06:59:38:90:06:59                                 up           true
192.168.1.130   network     LinkStatus   cilium_net     2         ether      veth        8e:dd:d6:45:18:79                                 up           true
192.168.1.130   network     LinkStatus   cilium_vxlan   2         ether      vxlan       f2:7f:43:ed:33:66                                 unknown      true
192.168.1.130   network     LinkStatus   dummy0         1         ether      dummy       06:75:94:03:a8:cd                                 down         false
192.168.1.130   network     LinkStatus   eth0           2         ether                  9e:d0:00:82:29:e4                                 up           true
192.168.1.130   network     LinkStatus   ip6tnl0        1         tunnel6    ip6tnl      00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
192.168.1.130   network     LinkStatus   kubespan       7         nohdr      wireguard                                                     unknown      false
192.168.1.130   network     LinkStatus   lo             2         loopback               00:00:00:00:00:00                                 unknown      true
192.168.1.130   network     LinkStatus   lxc_health     2         ether      veth        b2:23:0a:15:54:18                                 up           true
192.168.1.130   network     LinkStatus   sit0           1         sit        sit         00:00:00:00                                       down         false
192.168.1.130   network     LinkStatus   teql0          1         void                                                                     down         false
192.168.1.130   network     LinkStatus   tunl0          1         ipip       ipip        00:00:00:00                                       down         false
bn@DESKTOP-IQ0K33P:~/test$ talosctl get addresses -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE            ID                                                  VERSION   ADDRESS                                    LINK
192.168.1.130   network     AddressStatus   cilium_host/10.21.0.60/32                           1         10.21.0.60/32                              cilium_host
192.168.1.130   network     AddressStatus   cilium_host/fe80::459:38ff:fe90:659/64              1         fe80::459:38ff:fe90:659/64                 cilium_host
192.168.1.130   network     AddressStatus   cilium_net/fe80::8cdd:d6ff:fe45:1879/64             1         fe80::8cdd:d6ff:fe45:1879/64               cilium_net
192.168.1.130   network     AddressStatus   cilium_vxlan/fe80::f07f:43ff:feed:3366/64           2         fe80::f07f:43ff:feed:3366/64               cilium_vxlan
192.168.1.130   network     AddressStatus   eth0/192.168.1.130/24                               1         192.168.1.130/24                           eth0
192.168.1.130   network     AddressStatus   eth0/fe80::9cd0:ff:fe82:29e4/64                     2         fe80::9cd0:ff:fe82:29e4/64                 eth0
192.168.1.130   network     AddressStatus   kubespan/fd5c:2e5e:f0d6:3102:9cd0:ff:fe82:29e4/64   1         fd5c:2e5e:f0d6:3102:9cd0:ff:fe82:29e4/64   kubespan
192.168.1.130   network     AddressStatus   lo/127.0.0.1/8                                      1         127.0.0.1/8                                lo
192.168.1.130   network     AddressStatus   lo/::1/128                                          1         ::1/128                                    lo
192.168.1.130   network     AddressStatus   lxc_health/fe80::b023:aff:fe15:5418/64              2         fe80::b023:aff:fe15:5418/64                lxc_health
bn@DESKTOP-IQ0K33P:~/test$ talosctl service -n 192.168.1.130 -e 192.168.1.130
NODE            SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
192.168.1.130   apid         Running   OK       6m58s ago     Health check successful
192.168.1.130   containerd   Running   OK       7m4s ago      Health check successful
192.168.1.130   cri          Running   OK       6m52s ago     Health check successful
192.168.1.130   etcd         Running   Fail     2m56s ago     Health check failed: context deadline exceeded
192.168.1.130   kubelet      Running   OK       6m25s ago     Health check successful
192.168.1.130   machined     Running   ?        7m22s ago     Service started as goroutine
192.168.1.130   trustd       Running   OK       6m52s ago     Health check successful
192.168.1.130   udevd        Running   OK       6m53s ago     Health check successful
bn@DESKTOP-IQ0K33P:~/test$ talosctl get kubespanpeerstatuses -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE                 ID                                             VERSION   LABEL                 ENDPOINT              STATE   RX        TX
192.168.1.130   kubespan    KubeSpanPeerStatus   0G5mQxFPKAYUO94R4MUOncqjEQNC2gIyDtwykbVYkQc=   3         talos-192-168-1-137   192.168.1.137:51820   down    888       996
192.168.1.130   kubespan    KubeSpanPeerStatus   D478jsbhHTghgZw1Iuwava6VS8W/tpRXI1qWbJXzgy4=   10        talos-192-168-1-136   192.168.1.136:51820   down    2960      3320
192.168.1.130   kubespan    KubeSpanPeerStatus   JvL8L8SJkuZJDzHfZBRXQuzs0dUKsfHlQ4oiiobrzB4=   17        master-3              192.168.1.132:51820   up      1376328   1479768500
192.168.1.130   kubespan    KubeSpanPeerStatus   kQl1L8NuA5ONhnDQYmw57BFlGX2oLB+aRTCjJxsq2wI=   14        talos-192-168-1-135   192.168.1.135:51820   up      12188     1283224
192.168.1.130   kubespan    KubeSpanPeerStatus   qryUznf+zNiu4xZwyttOgt2i/CBAwWf5I/8xkUjSnxs=   7         talos-192-168-1-134   192.168.1.134:51820   down    1924      2528
bn@DESKTOP-IQ0K33P:~/test$ talosctl get kubespanpeerstatuses -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE                 ID                                             VERSION   LABEL                 ENDPOINT              STATE   RX        TX
192.168.1.130   kubespan    KubeSpanPeerStatus   0G5mQxFPKAYUO94R4MUOncqjEQNC2gIyDtwykbVYkQc=   3         talos-192-168-1-137   192.168.1.137:51820   down    888       996
192.168.1.130   kubespan    KubeSpanPeerStatus   D478jsbhHTghgZw1Iuwava6VS8W/tpRXI1qWbJXzgy4=   10        talos-192-168-1-136   192.168.1.136:51820   down    2960      3320
192.168.1.130   kubespan    KubeSpanPeerStatus   JvL8L8SJkuZJDzHfZBRXQuzs0dUKsfHlQ4oiiobrzB4=   17        master-3              192.168.1.132:51820   up      1376328   1479768500
192.168.1.130   kubespan    KubeSpanPeerStatus   kQl1L8NuA5ONhnDQYmw57BFlGX2oLB+aRTCjJxsq2wI=   14        talos-192-168-1-135   192.168.1.135:51820   up      12188     1283224
192.168.1.130   kubespan    KubeSpanPeerStatus   qryUznf+zNiu4xZwyttOgt2i/CBAwWf5I/8xkUjSnxs=   7         talos-192-168-1-134   192.168.1.134:51820   down    1924      2528

DMESG of master-1 - https://paste.opendev.org/show/bpUdwOTrDOZ7RrG84POi/
Control plane talos config and cilium config - https://paste.opendev.org/show/bKPfcIOCp7Uk79JwcQlF/

@ghost (Author) commented Jun 28, 2022

I have tested it without VIP enabled and got the same problem with etcd.

bn@DESKTOP-IQ0K33P:~$ talosctl service -n 192.168.1.130 -e 192.168.1.130
NODE            SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
192.168.1.130   apid         Running   OK       3m16s ago     Health check successful
192.168.1.130   containerd   Running   OK       3m22s ago     Health check successful
192.168.1.130   cri          Running   OK       3m11s ago     Health check successful
192.168.1.130   etcd         Running   OK       2m24s ago     Health check successful
192.168.1.130   kubelet      Running   OK       2m45s ago     Health check successful
192.168.1.130   machined     Running   ?        3m35s ago     Service started as goroutine
192.168.1.130   trustd       Running   OK       3m11s ago     Health check successful
192.168.1.130   udevd        Running   OK       3m12s ago     Health check successful
bn@DESKTOP-IQ0K33P:~$ talosctl get links -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE         ID                VERSION   TYPE       KIND        HW ADDR                                           OPER STATE   LINK STATE
192.168.1.130   network     LinkStatus   bond0             1         ether      bond        da:8f:13:20:e7:2b                                 down         false
192.168.1.130   network     LinkStatus   cilium_host       2         ether      veth        d6:55:56:79:b5:ae                                 up           true
192.168.1.130   network     LinkStatus   cilium_net        2         ether      veth        6e:e3:13:09:cd:9b                                 up           true
192.168.1.130   network     LinkStatus   cilium_vxlan      2         ether      vxlan       0e:4a:72:c5:9e:18                                 unknown      true
192.168.1.130   network     LinkStatus   dummy0            1         ether      dummy       7a:6c:86:48:c8:84                                 down         false
192.168.1.130   network     LinkStatus   eth0              2         ether                  1a:be:22:3a:17:71                                 up           true
192.168.1.130   network     LinkStatus   ip6tnl0           1         tunnel6    ip6tnl      00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
192.168.1.130   network     LinkStatus   kubespan          5         nohdr      wireguard                                                     unknown      false
192.168.1.130   network     LinkStatus   lo                1         loopback               00:00:00:00:00:00                                 unknown      true
192.168.1.130   network     LinkStatus   lxc9664702786c1   2         ether      veth        4a:f7:af:5d:84:94                                 up           true
192.168.1.130   network     LinkStatus   lxc_health        2         ether      veth        56:9d:a5:56:f2:24                                 up           true
192.168.1.130   network     LinkStatus   lxcbeeb839f25e9   2         ether      veth        8a:4e:90:03:6d:cc                                 up           true
192.168.1.130   network     LinkStatus   lxcfa82b568feba   3         ether      veth        86:5e:34:16:77:b1                                 up           true
192.168.1.130   network     LinkStatus   sit0              1         sit        sit         00:00:00:00                                       down         false
192.168.1.130   network     LinkStatus   teql0             1         void                                                                     down         false
192.168.1.130   network     LinkStatus   tunl0             1         ipip       ipip        00:00:00:00                                       down         false

192.168.1.130: user: warning: [2022-06-28T08:12:56.112129679Z]: [talos] task updateBootloader (1/1): done, 40.811566ms
192.168.1.130: user: warning: [2022-06-28T08:12:56.114331679Z]: [talos] phase bootloader (19/19): done, 45.131161ms
192.168.1.130: user: warning: [2022-06-28T08:12:56.116303679Z]: [talos] boot sequence: done: 3m8.924443116s
192.168.1.130: kern:    info: [2022-06-28T08:12:58.401451679Z]: IPv6: ADDRCONF(NETDEV_CHANGE): lxc_health: link becomes ready
192.168.1.130: kern:    info: [2022-06-28T08:12:58.702931679Z]: eth0: renamed from tmpaf272
192.168.1.130: kern:    info: [2022-06-28T08:12:58.722401679Z]: IPv6: ADDRCONF(NETDEV_CHANGE): lxcbeeb839f25e9: link becomes ready
192.168.1.130: kern:    info: [2022-06-28T08:12:58.795152679Z]: eth0: renamed from tmp17410
192.168.1.130: kern:    info: [2022-06-28T08:12:58.810886679Z]: IPv6: ADDRCONF(NETDEV_CHANGE): lxc9664702786c1: link becomes ready
192.168.1.130: kern:    info: [2022-06-28T08:12:59.014542679Z]: eth0: renamed from tmp35a1a
192.168.1.130: kern:    info: [2022-06-28T08:12:59.032927679Z]: IPv6: ADDRCONF(NETDEV_CHANGE): lxcfa82b568feba: link becomes ready
192.168.1.130: user: warning: [2022-06-28T08:13:08.011634679Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: Get \x5c"https://127.0.0.1:10250/pods/?timeout=30s\x5c": remote error: tls: internal error"}
192.168.1.130: user: warning: [2022-06-28T08:13:28.902243679Z]: [talos] service[etcd](Running): Health check failed: context deadline exceeded

bn@DESKTOP-IQ0K33P:~$ talosctl service -n 192.168.1.130 -e 192.168.1.130
NODE            SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
192.168.1.130   apid         Running   OK       4m16s ago     Health check successful
192.168.1.130   containerd   Running   OK       4m22s ago     Health check successful
192.168.1.130   cri          Running   OK       4m10s ago     Health check successful
192.168.1.130   etcd         Running   Fail     53s ago       Health check failed: context deadline exceeded
192.168.1.130   kubelet      Running   OK       3m45s ago     Health check successful
192.168.1.130   machined     Running   ?        4m35s ago     Service started as goroutine
192.168.1.130   trustd       Running   OK       4m10s ago     Health check successful
192.168.1.130   udevd        Running   OK       4m12s ago     Health check successful

bn@DESKTOP-IQ0K33P:~$ talosctl get links -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE         ID                VERSION   TYPE       KIND        HW ADDR                                           OPER STATE   LINK STATE
192.168.1.130   network     LinkStatus   bond0             1         ether      bond        da:8f:13:20:e7:2b                                 down         false
192.168.1.130   network     LinkStatus   cilium_host       2         ether      veth        d6:55:56:79:b5:ae                                 up           true
192.168.1.130   network     LinkStatus   cilium_net        2         ether      veth        6e:e3:13:09:cd:9b                                 up           true
192.168.1.130   network     LinkStatus   cilium_vxlan      2         ether      vxlan       0e:4a:72:c5:9e:18                                 unknown      true
192.168.1.130   network     LinkStatus   dummy0            1         ether      dummy       7a:6c:86:48:c8:84                                 down         false
192.168.1.130   network     LinkStatus   eth0              2         ether                  1a:be:22:3a:17:71                                 up           true
192.168.1.130   network     LinkStatus   ip6tnl0           1         tunnel6    ip6tnl      00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
192.168.1.130   network     LinkStatus   kubespan          6         nohdr      wireguard                                                     unknown      false
192.168.1.130   network     LinkStatus   lo                1         loopback               00:00:00:00:00:00                                 unknown      true
192.168.1.130   network     LinkStatus   lxc9b3fee020b0a   2         ether      veth        ce:e5:df:0a:0e:92                                 up           true
192.168.1.130   network     LinkStatus   lxc_health        2         ether      veth        56:9d:a5:56:f2:24                                 up           true
192.168.1.130   network     LinkStatus   lxcbdbd6b5f6f69   2         ether      veth        5e:94:a6:f8:79:91                                 up           true
192.168.1.130   network     LinkStatus   lxcd8f5571edd38   2         ether      veth        9e:32:98:ca:4e:ac                                 up           true
192.168.1.130   network     LinkStatus   sit0              1         sit        sit         00:00:00:00                                       down         false
192.168.1.130   network     LinkStatus   teql0             1         void                                                                     down         false
192.168.1.130   network     LinkStatus   tunl0             1         ipip       ipip        00:00:00:00                                       down         false
bn@DESKTOP-IQ0K33P:~$ talosctl get kubespanpeerstatuses -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE                 ID                                             VERSION   LABEL                 ENDPOINT              STATE   RX        TX
192.168.1.130   kubespan    KubeSpanPeerStatus   9ejFziqqvxQLUa9FRc0ypqtqRwgIXv1/Nwyi7M9ldmc=   11        master-2              192.168.1.131:51820   up      2159828   632106500
192.168.1.130   kubespan    KubeSpanPeerStatus   L5tgiicsrHrHEGZgDfwVPj+quiUf3WlpQAQdOljFBkw=   3         talos-192-168-1-135   192.168.1.135:51820   down    1332      1864
192.168.1.130   kubespan    KubeSpanPeerStatus   OTLkf5K4nyBm2JSHaarZ40rLxM6Gs1dTJ32gJjp+nCg=   14        master-3              192.168.1.132:51820   up      2691668   850649576
192.168.1.130   kubespan    KubeSpanPeerStatus   dKeOAxBY3mXeHn0T6WB8UmlxhctQkwa2XKDe8mU5Z0s=   6         talos-192-168-1-134   192.168.1.134:51820   up      3828      50560

bn@DESKTOP-IQ0K33P:~$ talosctl get kubespanendpoints -n 192.168.1.130 -e 192.168.1.130
NODE            NAMESPACE   TYPE               ID                                             VERSION   ENDPOINT              AFFILIATE ID
192.168.1.130   kubespan    KubeSpanEndpoint   9ejFziqqvxQLUa9FRc0ypqtqRwgIXv1/Nwyi7M9ldmc=   1         192.168.1.131:51820   qX7lzh06CC6K3XvOoceokJh6JkQw0xy4qHU0PMpN3WA
192.168.1.130   kubespan    KubeSpanEndpoint   OTLkf5K4nyBm2JSHaarZ40rLxM6Gs1dTJ32gJjp+nCg=   1         192.168.1.132:51820   RVULuYgB04tcDdRPrIXaY2H7vpK3bQHz5GxJQ5eZvmf
192.168.1.130   kubespan    KubeSpanEndpoint   dKeOAxBY3mXeHn0T6WB8UmlxhctQkwa2XKDe8mU5Z0s=   1         192.168.1.134:51820   UkXHZ8kzh2s7l068R2KIbQtj9kxtD1AEb3w9jVdwfkC

@smira (Member) commented Jun 29, 2022

@sauterp @Ulexus this is still an issue with Cilium, reproducing in QEMU:

$ sudo -E talosctl cluster create ... --config-patch '[{"op":"replace", "path":"/cluster/network/cni", "value": {"name":"none"}}]' --with-kubespan
# (while above is running)
$ talosctl -n 172.20.0.2 kubeconfig -f
$ helm install cilium cilium/cilium \
    --version 1.11.0 \
    --namespace kube-system \
    --set kubeProxyReplacement=strict \
    --set k8sServiceHost="172.20.0.1" \
    --set k8sServicePort="6443"
# once Cilium is up, networking is all gone

@Ulexus (Contributor) commented Jun 29, 2022

Some progress: first, I can definitely reproduce.

It seems that if KubeSpan is enabled after the Cilium agents come up, things work:

  • disable KubeSpan before a reboot, re-enable it after Cilium is back up: works (see the sketch below)
  • install Talos, then enable KubeSpan: works (however, a reboot will break it if KubeSpan isn't disabled again first)
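
A rough sketch of toggling KubeSpan on a running node with a machine config patch (the node IP is a placeholder; KubeSpan lives under machine.network.kubespan in the config):

talosctl --nodes <node-ip> patch machineconfig \
    --patch '[{"op": "replace", "path": "/machine/network/kubespan/enabled", "value": false}]'
# ...wait for the Cilium agents to come back up, then re-enable:
talosctl --nodes <node-ip> patch machineconfig \
    --patch '[{"op": "replace", "path": "/machine/network/kubespan/enabled", "value": true}]'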

In the failed state, at least the first time (and in some other circumstances), KubeSpan is up and running, all of the nftables rulesets look fine, and all of the route rules look fine... but Cilium intercepts and drops host-to-host traffic that would be flowing over KubeSpan.

Once a failure occurs, it is insufficient to merely disable KubeSpan. The nodes must then be rebooted: all the peers, it seems, because the problem is not on the receiving side but on the sending side (needs more investigation).
If this is not done, KubeSpan itself cannot re-establish the link, because Cilium intercepts and blocks even that.

It is also worth noting that I have seen this problem of Cilium interdicting and dropping host traffic before: cilium/cilium#11263. In that case, the issue was with handling of certain host-level IPv6 traffic. It does not appear to be related, except insofar as the host's traffic is dropped.

smira assigned smira and sauterp and unassigned Ulexus on Jul 11, 2022
@sauterp (Contributor) commented Jul 12, 2022

I ran another experiment today, where I installed Cilium with --set hostServices.enabled=true. At first etcd failed too, but then after 8 minutes it became healthy and stayed that way for at least 20 minutes. No idea whether that's connected.

sudo --preserve-env=HOME ~/bin/talosctl cluster create --config-patch '[{"op":"replace", "path":"/cluster/network/cni", "value": {"name":"none"}}]' --with-kubespan --provisioner=qemu --masters 3 --workers 2     --cidr=172.20.0.0/24 \
--registry-mirror docker.io=http://172.20.0.1:5000 \
--registry-mirror k8s.gcr.io=http://172.20.0.1:5001  \
--registry-mirror quay.io=http://172.20.0.1:5002 \
--registry-mirror gcr.io=http://172.20.0.1:5003 \
--registry-mirror ghcr.io=http://172.20.0.1:5004 \
--registry-mirror 127.0.0.1:5005=http://172.20.0.1:5005

talosctl -n 172.20.0.2 kubeconfig -f

helm install cilium cilium/cilium \
--version 1.11.0 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="172.20.0.1" \
--set k8sServicePort="6443" \
--set hostServices.enabled=true
sauterp@robert:~/s/g/s/t/talos|master⚡?
➤ talosctl service -n 172.20.0.2 -e 172.20.0.2
NODE         SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
172.20.0.2   apid         Running   OK       10m56s ago    Health check successful
172.20.0.2   containerd   Running   OK       8m57s ago     Health check successful
172.20.0.2   cri          Running   OK       11m0s ago     Health check successful
172.20.0.2   etcd         Running   Fail     8m48s ago     Health check failed: context deadline exceeded
172.20.0.2   kubelet      Running   OK       7m53s ago     Health check successful
172.20.0.2   machined     Running   ?        11m10s ago    Service started as goroutine
172.20.0.2   trustd       Running   OK       11m0s ago     Health check successful
172.20.0.2   udevd        Running   OK       11m1s ago     Health check successful
sauterp@robert:~/s/g/s/t/talos|master⚡?
➤ talosctl get kubespanidentities -n 172.20.0.2 -e 172.20.0.2
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.20.0.2:50000: connect: connection refused"
sauterp@robert:~/s/g/s/t/talos|master⚡?
➤ talosctl get links -n 172.20.0.2 -e 172.20.0.2
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.20.0.2:50000: connect: connection refused"
sauterp@robert:~/s/g/s/t/talos|master⚡?
➤ talosctl get addresses -n 172.20.0.2 -e 172.20.0.2
NODE         NAMESPACE   TYPE            ID                                                    VERSION   ADDRESS                                      LINK
172.20.0.2   network     AddressStatus   eth0/172.20.0.2/24                                    1         172.20.0.2/24                                eth0
172.20.0.2   network     AddressStatus   eth0/fe80::a052:9fff:fe38:9752/64                     2         fe80::a052:9fff:fe38:9752/64                 eth0
172.20.0.2   network     AddressStatus   kubespan/fd15:b185:7d9a:7502:a052:9fff:fe38:9752/64   1         fd15:b185:7d9a:7502:a052:9fff:fe38:9752/64   kubespan
172.20.0.2   network     AddressStatus   lo/127.0.0.1/8                                        1         127.0.0.1/8                                  lo
172.20.0.2   network     AddressStatus   lo/::1/128                                            1         ::1/128                                      lo
sauterp@robert:~/s/g/s/t/talos|master⚡?
➤ talosctl get links -n 172.20.0.2 -e 172.20.0.2
NODE         NAMESPACE   TYPE         ID         VERSION   TYPE       KIND        HW ADDR                                           OPER STATE   LINK STATE
172.20.0.2   network     LinkStatus   bond0      1         ether      bond        f6:cb:1c:9e:f5:f9                                 down         false
172.20.0.2   network     LinkStatus   dummy0     1         ether      dummy       9e:e3:03:5b:12:ec                                 down         false
172.20.0.2   network     LinkStatus   eth0       2         ether                  a2:52:9f:38:97:52                                 up           true
172.20.0.2   network     LinkStatus   ip6tnl0    1         tunnel6    ip6tnl      00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
172.20.0.2   network     LinkStatus   kubespan   3         nohdr      wireguard                                                     unknown      false
172.20.0.2   network     LinkStatus   lo         2         loopback               00:00:00:00:00:00                                 unknown      true
172.20.0.2   network     LinkStatus   sit0       1         sit        sit         00:00:00:00                                       down         false
172.20.0.2   network     LinkStatus   teql0      1         void                                                                     down         false
172.20.0.2   network     LinkStatus   tunl0      1         ipip       ipip        00:00:00:00                                       down         false
sauterp@robert:~/s/g/s/t/talos|master⚡?
➤ talosctl service -n 172.20.0.2
NODE         SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
172.20.0.2   apid         Running   OK       2m0s ago      Health check successful
172.20.0.2   containerd   Running   OK       2m6s ago      Health check successful
172.20.0.2   cri          Running   OK       2m4s ago      Health check successful
172.20.0.2   etcd         Running   OK       1m58s ago     Health check successful
172.20.0.2   kubelet      Running   OK       1m56s ago     Health check successful
172.20.0.2   machined     Running   ?        2m13s ago     Service started as goroutine
172.20.0.2   trustd       Running   OK       2m4s ago      Health check successful
172.20.0.2   udevd        Running   OK       2m5s ago      Health check successful

talos-default-master-1.log

@smira (Member) commented Jul 12, 2022

Looking at the system stats, I think there might be a routing loop of some sort:

[screenshot: system stats]

@sauterp (Contributor) commented Jul 13, 2022

I'm seeing spikes in CPU usage as well.

I tried to install Cilium with policyAuditMode enabled. My hope was to get into a state where Cilium doesn't block any traffic but tells us what it would actually block if its policies were enforced.

I did this for the installation:

helm install cilium cilium/cilium \
--version 1.11.0 \
--namespace kube-system \
--values - <<EOF
kubeProxyReplacement: strict
k8sServiceHost: "172.20.0.1"
k8sServicePort: 6443
policyEnforcementMode: always    # enforce network policies
policyAuditMode: true            # do not block traffic
externalIPs:
  enabled: true
nodePort:
  enabled: true
hostPort:
  enabled: true
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
EOF

Being new to Helm, I'm unsure whether this actually does what I think it does, but I get the same behaviour as before: etcd fails, then becomes healthy, and there is no network once Cilium is up.
If I was indeed able to enable policyAuditMode, that should mean the problem is not caused by some policy or other Cilium rule, and we should look somewhere else.
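
One way to sanity-check that the audit mode actually took effect (assuming the agent pods get far enough to exec into) is to dump the agent's runtime config, for example:

kubectl -n kube-system exec ds/cilium -- cilium config | grep -i audit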

@frezbo (Member) commented Jul 13, 2022

By default there are no network policies installed by Cilium, so it's nothing to do with network policies.

@smira (Member) commented Jul 13, 2022

My hypothesis is that Cilium manages to capture the redirected packet going out on the WireGuard interface and re-routes it once again, so it gets captured and re-routed, and so on. We need a way to tell Cilium that it should ignore/skip packets going out on the kubespan link.
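
One way to look for that loop, sketched here without having verified it on a broken node, is from inside a Cilium agent pod, since it runs in the host network namespace:

kubectl -n kube-system exec ds/cilium -- ip rule show                 # routing rules, including any fwmark-based ones
kubectl -n kube-system exec ds/cilium -- ip route show table all      # routes pointing at the kubespan/wireguard link
kubectl -n kube-system exec ds/cilium -- cilium monitor --type drop   # watch for host-to-host traffic being dropped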

@smira (Member) commented Jul 18, 2022

Confirmed the problem with Calico as well.

See also the original design doc: https://github.com/siderolabs/talos/pull/3577/files#diff-1a5562bea8cb1382b687cf6734e093790f45ddf40ce5cda6bcd83d4cff801663

@sauterp (Contributor) commented Jul 18, 2022

There is an interval, after Cilium is installed and the Cilium network interfaces come up, where there is enough time to run cilium sysdump --debug before we lose connectivity.
cilium-sysdump-20220719-012518.zip
I think this file contains a lot of interesting information, among other things the routing tables and the state of all the network interfaces. Maybe it's useful.

smira added a commit to smira/talos that referenced this issue Jul 19, 2022
Fixes siderolabs#4836

Firewall mark is `uint32` attached to the packet in the Linux kernel
(it's not transmitted on the wire). This is a shared value for all
networking software, so multiple components might attempt to set and
match on the firewall mark.

Cilium and Calico CNIs are using firewall marks internally, but they
touch only some bits of the firewall mark.

The way KubeSpan was implemented before this PR, it was doing direct
match on the firewall mark, and setting the whole `uint32`, so it comes
into conflict with any other networking component using firewall marks.

The other problem was that firewall mark 0x51820 (0x51821) was too
"wide" touching random bits of the 32-bit value for no good reason.

So this change contains two fixes:

* make firewall mark exactly a single bit (we use bits `0x20` and `0x40`
  now)
* match and mark packets with the mask (don't touch bits outside of the
  mask when setting the mark and ignore bits outside of the mask when
  matching on the mark).

This was tested successfully with both Cilium CNI (default config +
`ipam.mode=kubernetes`) and Calico CNI (default config).

One thing to note is that for KubeSpan and Talos it's important to make
sure that `podSubnets` in the machine config match CNI setting for
`podCIDRs`.

Signed-off-by: Andrey Smirnov <[email protected]>
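
To picture the fix in familiar iproute2/iptables terms (an illustration only; Talos programs these rules via netlink and nftables, and the routing table number and chain below are assumptions, not taken from the commit):

# Before: exact match on the whole 32-bit mark, and the mark set without a mask,
# clobbering bits that Cilium/Calico use for their own bookkeeping:
ip rule add not fwmark 0x51820 table 180
iptables -t mangle -A PREROUTING -j MARK --set-mark 0x51820

# After: a single dedicated bit, matched and set with a mask, leaving the
# CNI's mark bits untouched:
ip rule add not fwmark 0x20/0x20 table 180
iptables -t mangle -A PREROUTING -j MARK --set-xmark 0x20/0x20

The podSubnets/podCIDRs note above corresponds to cluster.network.podSubnets in the Talos machine config and, on the CNI side, to its IPAM configuration (e.g. ipam.mode=kubernetes for Cilium, as tested in the commit).
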
smira added a commit to smira/talos that referenced this issue Jul 19, 2022
@smira (Member) commented Jul 20, 2022

We'll get the fix backported to Talos 1.1.2.

smira added a commit to smira/talos that referenced this issue Jul 26, 2022
Fixes siderolabs#4836

(cherry picked from commit 644e803)
github-actions bot locked as resolved and limited conversation to collaborators on Jun 19, 2024