Kubespan and Cilium compatibility: etcd is failing #4836
Comments
Hi @Born2Bake, we tried to replicate this issue in a QEMU cluster with the latest version of Talos (1.1.0) and didn't see the problem you describe. Could you try this again with the latest version of Talos? If the problem persists, we'd be happy to have a live debugging session with you. We're available in our community Slack.
Could you try creating the cluster with VIP enabled? I did not have a lot of time for testing, but I just spun up a new cluster with KubeSpan enabled and Talos updated to 1.1.0, and cluster creation still fails in a similar way:
DMESG of master-1 - https://paste.opendev.org/show/bpUdwOTrDOZ7RrG84POi/
I have tested it without VIP enabled and got the same problem with etcd.
@sauterp @Ulexus this is still an issue with Cilium, reproducing in QEMU:
Some progress: first, I can definitely reproduce this. It seems that if KubeSpan is enabled after the Cilium agents come up, things work.
In the failed state, at least the first time (and in some other circumstances), KubeSpan is up and running, all of the nftables rulesets look fine, and all of the route rules look fine, but Cilium intercepts and drops host-to-host traffic which would be flowing over KubeSpan. Once a failure occurs, it is insufficient to merely disable KubeSpan. The nodes must then be rebooted:
It is also worth noting that I have seen this problem of Cilium interdicting and dropping host traffic before: cilium/cilium#11263. In that case, the issue was with handling of certain host-level IPv6 traffic. It does not appear to be related except insofar as the host's traffic is dropped.
I ran another experiment today, where I installed cilium with
I'm seeing spikes in CPU usage as well. I tried to install Cilium with policyAuditMode enabled. My hope was to get into a state where Cilium doesn't block any traffic but tells us what it would actually block if its policies were enforced. I did this for the installation:
Being new to Helm, I'm unsure if this actually does what I think it does, but I get the same behaviour as before: etcd fails, then becomes healthy, and there is no network when Cilium is up.
By default there are no network policies installed by Cilium, so this has nothing to do with network policies.
My hypothesis is that Cilium manages to capture the redirected packet going out on the WireGuard interface and re-routes it once again, so it gets captured and re-routed, and so on. We need a way to tell Cilium that it should ignore/skip packets going out on
Confirmed the problem with Calico as well. See also original design doc: https://github.com/siderolabs/talos/pull/3577/files#diff-1a5562bea8cb1382b687cf6734e093790f45ddf40ce5cda6bcd83d4cff801663
There is an interval, after Cilium is installed and the Cilium network interfaces come up, where there is enough time to run
Fixes siderolabs#4836

Firewall mark is `uint32` attached to the packet in the Linux kernel (it's not transmitted on the wire). This is a shared value for all networking software, so multiple components might attempt to set and match on the firewall mark.

Cilium and Calico CNIs are using firewall marks internally, but they touch only some bits of the firewall mark. The way KubeSpan was implemented before this PR, it was doing direct match on the firewall mark, and setting the whole `uint32`, so it comes into conflict with any other networking component using firewall marks.

The other problem was that firewall mark 0x51820 (0x51821) was too "wide" touching random bits of the 32-bit value for no good reason.

So this change contains two fixes:

* make firewall mark exactly a single bit (we use bits `0x20` and `0x40` now)
* match and mark packets with the mask (don't touch bits outside of the mask when setting the mark and ignore bits outside of the mask when matching on the mark).

This was tested successfully with both Cilium CNI (default config + `ipam.mode=kubernetes`) and Calico CNI (default config).

One thing to note is that for KubeSpan and Talos it's important to make sure that `podSubnets` in the machine config match CNI setting for `podCIDRs`.

Signed-off-by: Andrey Smirnov <[email protected]>
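The difference between the old and new behaviour described in this commit message can be illustrated with plain integer arithmetic. The following is only a sketch, not the KubeSpan implementation (which programs nftables rules): the values `0x51820`, `0x20`, and `0x40` come from the commit message, while the function names, the combined mask `0x60`, and the "other CNI" bit `0x200` are hypothetical and chosen purely for illustration.

```python
U32 = 0xFFFFFFFF

OLD_MARK = 0x51820            # pre-fix mark from the commit message (several bits set)
KUBESPAN_MARK = 0x20          # one of the single-bit marks named in the commit message
KUBESPAN_MASK = 0x20 | 0x40   # assumption: both KubeSpan bits treated as one mask
OTHER_CNI_BIT = 0x200         # hypothetical bit owned by another component (e.g. a CNI)


def set_whole(current: int, mark: int) -> int:
    """Old style: overwrite the entire 32-bit mark, discarding other components' bits."""
    return mark & U32


def match_exact(current: int, mark: int) -> bool:
    """Old style: match only if the entire 32-bit value is equal."""
    return current == mark


def set_masked(current: int, mark: int, mask: int) -> int:
    """New style: change only the bits inside the mask, keep everything else."""
    return ((current & ~mask) | (mark & mask)) & U32


def match_masked(current: int, mark: int, mask: int) -> bool:
    """New style: compare only the bits inside the mask."""
    return (current & mask) == (mark & mask)
```

With these helpers, `set_whole(OTHER_CNI_BIT, OLD_MARK)` wipes out the other component's bit, and `match_exact(OLD_MARK | OTHER_CNI_BIT, OLD_MARK)` fails as soon as that component sets its own bit. The masked variants leave `0x200` intact and still recognise the KubeSpan bit.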
We'll get the fix backported to Talos 1.1.2.
Fixes siderolabs#4836

Firewall mark is `uint32` attached to the packet in the Linux kernel (it's not transmitted on the wire). This is a shared value for all networking software, so multiple components might attempt to set and match on the firewall mark.

Cilium and Calico CNIs are using firewall marks internally, but they touch only some bits of the firewall mark. The way KubeSpan was implemented before this PR, it was doing direct match on the firewall mark, and setting the whole `uint32`, so it comes into conflict with any other networking component using firewall marks.

The other problem was that firewall mark 0x51820 (0x51821) was too "wide" touching random bits of the 32-bit value for no good reason.

So this change contains two fixes:

* make firewall mark exactly a single bit (we use bits `0x20` and `0x40` now)
* match and mark packets with the mask (don't touch bits outside of the mask when setting the mark and ignore bits outside of the mask when matching on the mark).

This was tested successfully with both Cilium CNI (default config + `ipam.mode=kubernetes`) and Calico CNI (default config).

One thing to note is that for KubeSpan and Talos it's important to make sure that `podSubnets` in the machine config match CNI setting for `podCIDRs`.

Signed-off-by: Andrey Smirnov <[email protected]>
(cherry picked from commit 644e803)
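The note about keeping `podSubnets` in the machine config aligned with the CNI's pod CIDRs can be checked mechanically before installing the CNI. This is a minimal sketch using only the Python standard library. It assumes you have already extracted both CIDR lists by hand; the function names and the example CIDR values are hypothetical and do not come from the configs linked in this issue.

```python
import ipaddress


def normalize(cidrs):
    """Parse CIDR strings into network objects so equivalent spellings compare equal."""
    return {ipaddress.ip_network(c, strict=False) for c in cidrs}


def pod_cidrs_match(machine_pod_subnets, cni_pod_cidrs):
    """Return True when the machine config and the CNI describe the same pod networks."""
    return normalize(machine_pod_subnets) == normalize(cni_pod_cidrs)


if __name__ == "__main__":
    # Example values only; substitute the CIDRs from your own machine config and CNI values.
    machine_config = ["10.244.0.0/16"]
    cni_config = ["10.0.0.0/8"]
    if not pod_cidrs_match(machine_config, cni_config):
        print("podSubnets and CNI pod CIDRs differ:", machine_config, cni_config)
```

Comparing parsed networks rather than raw strings avoids false mismatches when one side is written with host bits set or with different formatting.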
Bug Report
Control plane machine config - https://paste.opendev.org/show/812287/
Worker node machine config - https://paste.opendev.org/show/812288/
Step-by-step installation without a CNI (name set to none); etcd does not fail in this case - https://paste.opendev.org/show/812289/
Then we proceed with the Cilium installation -
After a couple of minutes, etcd starts failing:
Logs
Other logs - https://paste.opendev.org/show/812290/
Master logs are attached to this Issue.
master-1.zip
master-2.zip
master-3.zip
Description
Cilium works fine as the CNI on a Talos cluster without KubeSpan enabled. Once KubeSpan is enabled and you try to install Cilium, etcd on the masters starts failing. Likewise, the Cilium agents do not start. We have confirmed that without KubeSpan, Cilium works fine.
Environment
Bare metal / VMs