VXLAN: bad UDP checksums #8992

Closed
maxpain opened this issue Jun 16, 2022 · 8 comments · Fixed by #9249 or #9388
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@maxpain
Contributor

maxpain commented Jun 16, 2022

Environment:

  • Cloud provider or hardware configuration:
    Virtual machine

  • OS:

Linux 5.15.0-37-generic x86_64
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

  • Kubespray version:
    v2.19.0

  • Network plugin used:
    calico with vxlan

The problem:
When using calico with vxlan as the tunnel, there is no connectivity between containers on different nodes:

[screenshot]

Workaround:

sudo ethtool -K vxlan.calico tx-checksum-ip-generic off

or

featureDetectOverride: "ChecksumOffloadBroken=true"

But both workarounds disable checksum offload, which has a performance impact.
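The first workaround switches tx-checksum-ip-generic off on the vxlan.calico device. A minimal Python sketch for checking its current state, assuming ethtool is installed and that `ethtool -k <dev>` prints one "feature: on|off" pair per line (sometimes followed by "[fixed]"). The function names are hypothetical, not part of kubespray or Calico:

```python
from __future__ import annotations

import subprocess


def parse_ethtool_features(text: str) -> dict[str, str]:
    """Parse `ethtool -k <dev>` output into {feature: "on" | "off"}."""
    features: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        # Skip the "Features for <dev>:" header and any line without a "name: value" shape.
        if not sep or " " in name or not value.strip():
            continue
        features[name] = value.split()[0]  # "off [fixed]" -> "off"
    return features


def tx_generic_csum_enabled(dev: str = "vxlan.calico") -> bool:
    """True if tx-checksum-ip-generic is currently on for `dev` (runs ethtool)."""
    out = subprocess.run(["ethtool", "-k", dev], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_features(out).get("tx-checksum-ip-generic") == "on"
```

After running the ethtool workaround above, tx_generic_csum_enabled() would be expected to return False. Parsing is kept separate from the subprocess call so it can be checked against saved ethtool output from an affected node.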

Related issues:
projectcalico/calico#3145
projectcalico/calico#4865
flannel-io/flannel#1279
rancher/rke2#1541

@maxpain maxpain added the kind/bug Categorizes issue or PR as related to a bug. label Jun 16, 2022
@champtar
Contributor

This is usually caused by a combination of kernel version, driver, and firmware version. Could you give us the output of ethtool -i <eth0> (the interface name might not be eth0) and more details on the hypervisor?
Side note: with hardware offload, it's normal to see bad checksums on outgoing packets in tcpdump, so when investigating, only look at incoming packets (tcpdump -Q in ...).
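A minimal Python sketch of how the outer UDP checksum of a VXLAN packet is computed and checked (RFC 768 IPv4 pseudo-header plus the RFC 1071 one's-complement sum). With tx checksum offload on, the kernel only fills in a partial value and expects the device to complete it, which is why outgoing packets captured by tcpdump can look "bad"; on the receiving node the check below should pass. IPv4 only; the helper names, destination port 4789 (the IANA VXLAN port) and VNI 4096 (assumed here as Calico's default) are illustrative assumptions:

```python
import ipaddress
import struct

UDP_PROTO = 17
VXLAN_PORT = 4789   # IANA-assigned VXLAN port (assumption for this sketch)
DEFAULT_VNI = 4096  # assumed Calico default VNI, for illustration only


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 16-bit one's-complement sum; odd-length input is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header from RFC 768: src, dst, zero byte, protocol, UDP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, UDP_PROTO, udp_len))


def udp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field (bytes 6-7) is zero."""
    csum = ~ones_complement_sum16(pseudo_header(src, dst, len(segment)) + segment) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is sent as 0xFFFF; 0 on the wire means "no checksum"


def udp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver-side check, i.e. what matters for packets seen with tcpdump -Q in."""
    if segment[6:8] == b"\x00\x00":
        return True  # checksum not used (allowed for UDP over IPv4)
    return ones_complement_sum16(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


def build_vxlan_udp(src: str, dst: str, sport: int, inner_frame: bytes,
                    vni: int = DEFAULT_VNI) -> bytes:
    """Outer UDP header + 8-byte VXLAN header (RFC 7348) + inner Ethernet frame."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    vxlan = struct.pack("!II", 0x08 << 24, vni << 8)  # I flag set, 24-bit VNI
    payload = vxlan + inner_frame
    header = struct.pack("!HHHH", sport, VXLAN_PORT, 8 + len(payload), 0)
    csum = udp_checksum(src, dst, header + payload)
    return header[:6] + struct.pack("!H", csum) + payload


if __name__ == "__main__":
    seg = build_vxlan_udp("10.0.0.1", "10.0.0.2", 54321, b"inner ethernet frame")
    print(udp_checksum_ok("10.0.0.1", "10.0.0.2", seg))        # True
    corrupted = seg[:-1] + bytes([seg[-1] ^ 0x01])
    print(udp_checksum_ok("10.0.0.1", "10.0.0.2", corrupted))  # False
```

Verification sums everything including the transmitted checksum, so a correct packet always folds to 0xFFFF; a mismatch on an incoming packet means the checksum was never completed or the packet was altered on the way.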

@maxpain
Contributor Author

maxpain commented Jun 17, 2022

@champtar

driver: vmxnet3
version: 1.6.0.0-k-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

@maxpain
Contributor Author

maxpain commented Jun 17, 2022

> Side note, with hardware offload it's normal to have bad checksum on outgoing packets

Yes, but these packets are never actually sent.

@champtar
Contributor

It would be worth a shot to open a VMware ticket.

@cristicalin
Contributor

Agreed with @champtar: this looks more like a kernel/hypervisor issue. You can also reach out to the Project Calico folks with the details above, since kubespray only deploys calico and does not modify it in any way.

@DomHoney

DomHoney commented Jul 11, 2022

This issue suggests that a recent kernel driver update is causing the problem:
projectcalico/calico#4727
I missed champtar's comment in this thread and didn't properly look at the originally linked issues. Leaving this here as another reference.

I haven't confirmed via tcpdump, but I have been struggling with calico vxlan on Ubuntu 20.04 with the HWE 5.13 kernel. vxlan works fine on the standard 5.4 kernel.
Redeploying in IP-in-IP mode makes things work on the 5.13 kernel.

@yankay
Member

yankay commented Sep 2, 2022

I have the same issue on Rocky Linux 9.
Kernel version: 5.14.0-70.13.1.el9_0.x86_64

@maxpain
Contributor Author

maxpain commented Sep 2, 2022

I finally switched from Kubespray to Talos OS, and I no longer have any OS configuration problems.
