Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tunnel csum option configurable and default to false #4250

Merged
merged 1 commit into from
Oct 13, 2022

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Sep 27, 2022

For Linux kernel before Mar 2021, UDP checksum must be present to trigger GRO on the receiver for better performance of Geneve and VXLAN tunnels. The issue has been fixed in Linux kernel [1], thus computing UDP checksum is no longer necessary.

This patch exposes a configuration parameter TunnelCsum which determines whether to compute UDP encapsulation header (Geneve or VXLAN) checksums on outgoing packets and makes it default to false. It should only be set to true when Nodes run an unpatched Linux kernel and poor transfer performance is observed.

[1] torvalds/linux@89e5c58

Signed-off-by: Quan Tian [email protected]

I have confirmed the following distributions have included the above patch:

Making it default to false could avoid UDP checksum issues caused by checksum offload in some virtual and physical network adapters.

@codecov
Copy link

codecov bot commented Sep 27, 2022

Codecov Report

Merging #4250 (64a88b1) into main (0f77529) will decrease coverage by 2.08%.
The diff coverage is 62.54%.

❗ Current head 64a88b1 differs from pull request most recent head 0f8891b. Consider uploading reports for the commit 0f8891b to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4250      +/-   ##
==========================================
- Coverage   62.26%   60.17%   -2.09%     
==========================================
  Files         385      385              
  Lines       54501    55019     +518     
==========================================
- Hits        33933    33110     -823     
- Misses      18069    19433    +1364     
+ Partials     2499     2476      -23     
Flag Coverage Δ *Carryforward flag
e2e-tests 39.52% <49.68%> (?)
integration-tests 35.00% <44.49%> (+0.16%) ⬆️ Carriedforward from 0f8891b
kind-e2e-tests 41.58% <45.91%> (-6.59%) ⬇️ Carriedforward from 0f8891b
unit-tests 43.95% <26.24%> (+0.09%) ⬆️ Carriedforward from 0f8891b

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
multicluster/cmd/multicluster-controller/leader.go 0.00% <0.00%> (ø)
multicluster/cmd/multicluster-controller/member.go 0.00% <0.00%> (ø)
...ulticluster/cmd/multicluster-controller/options.go 9.30% <0.00%> (ø)
pkg/agent/agent_windows.go 0.57% <ø> (+0.57%) ⬆️
pkg/agent/config/node_config.go 96.00% <ø> (ø)
pkg/agent/openflow/pipeline.go 79.45% <0.00%> (-4.10%) ⬇️
...catesigningrequest/ipsec_csr_signing_controller.go 59.71% <0.00%> (-1.95%) ⬇️
pkg/controller/serviceexternalip/controller.go 59.78% <0.00%> (-10.50%) ⬇️
pkg/ovs/openflow/ofctrl_meter.go 44.77% <0.00%> (ø)
... and 93 more

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it default to false could avoid checksum issues caused by checksum offload in some virtual and physical network adapters.

Is that only observed when the Node network is itself an overlay? Otherwise, this checksum is the outermost transport checksum, so it's hard to imagine that there would be an issue there.

antoninbas
antoninbas previously approved these changes Sep 27, 2022
jianjuns
jianjuns previously approved these changes Sep 27, 2022
@tnqn
Copy link
Member Author

tnqn commented Sep 28, 2022

Making it default to false could avoid checksum issues caused by checksum offload in some virtual and physical network adapters.

Is that only observed when the Node network is itself an overlay? Otherwise, this checksum is the outermost transport checksum, so it's hard to imagine that there would be an issue there.

I remember some adapters didn't calculate UDP checksums correctly when tx-udp_tnl-segmentation and tx-udp_tnl-csum-segmentation were on, but I don't know if it's double encap or not, let me check with people who experienced that issue.

luolanzone
luolanzone previously approved these changes Sep 28, 2022
@tnqn
Copy link
Member Author

tnqn commented Sep 28, 2022

Making it default to false could avoid checksum issues caused by checksum offload in some virtual and physical network adapters.

Is that only observed when the Node network is itself an overlay? Otherwise, this checksum is the outermost transport checksum, so it's hard to imagine that there would be an issue there.

I remember some adapters didn't calculate UDP checksums correctly when tx-udp_tnl-segmentation and tx-udp_tnl-csum-segmentation were on, but I don't know if it's double encap or not, let me check with people who experienced that issue.

Just found that was bad inner checksums in single encap scenario and you were the one who troubleshooted the issue. So yes, AFAIK, the issue only happened with double encap.

For Linux kernel before Mar 2021, UDP checksum must be present to
trigger GRO on the receiver for better performance of Geneve and VXLAN
tunnels. The issue has been fixed in Linux kernel [1], thus computing
UDP checksum is no longer necessary.

This patch exposes a configuration parameter TunnelCsum which
determines whether to compute UDP encapsulation header (Geneve or
VXLAN) checksums on outgoing packets and makes it default to false. It
should only be set to true when Nodes run an unpatched Linux kernel and
poor transfer performance is observed.

[1] torvalds/linux@89e5c58

Signed-off-by: Quan Tian <[email protected]>
@tnqn
Copy link
Member Author

tnqn commented Sep 29, 2022

Latest change fixed windows unit test

@tnqn
Copy link
Member Author

tnqn commented Sep 29, 2022

/test-all

@tnqn tnqn added this to the Antrea v1.9 release milestone Oct 11, 2022
@tnqn
Copy link
Member Author

tnqn commented Oct 12, 2022

/test-all

@tnqn
Copy link
Member Author

tnqn commented Oct 12, 2022

@jianjuns @antoninbas @luolanzone may I get another approval?

@tnqn tnqn merged commit 543e515 into antrea-io:main Oct 13, 2022
@tnqn tnqn deleted the csum-config branch October 13, 2022 07:49
heanlan pushed a commit to heanlan/antrea that referenced this pull request Mar 29, 2023
…4250)

For Linux kernel before Mar 2021, UDP checksum must be present to
trigger GRO on the receiver for better performance of Geneve and VXLAN
tunnels. The issue has been fixed in Linux kernel [1], thus computing
UDP checksum is no longer necessary.

This patch exposes a configuration parameter TunnelCsum which
determines whether to compute UDP encapsulation header (Geneve or
VXLAN) checksums on outgoing packets and makes it default to false. It
should only be set to true when Nodes run an unpatched Linux kernel and
poor transfer performance is observed.

[1] torvalds/linux@89e5c58

Signed-off-by: Quan Tian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants