
fix: call appendMssRule function to resolve the MSS problem #429

Merged: 1 commit merged into kubeovn:master on Jul 30, 2020

Conversation

fafucoder
Contributor

Call the appendMssRule function to resolve the MSS problem.

@fafucoder
Contributor Author

Generally, the MTU of the interface is set to 1400. But in special cases, a special pod (docker in docker) introduces a docker0 interface into the pod, and the MTU of docker0 is 1500. The network application in the pod then calculates the TCP MSS from the MTU of docker0 and initiates communication with the other party. After the other party sends a response, the kernel protocol stack of the Linux host sends an ICMP destination-unreachable message to the other party indicating that IP fragmentation is needed; the other party does not support this, resulting in communication failure.
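
For context, this is roughly the kind of MSS clamp rule such a fix appends (a minimal sketch reconstructed from the iptables output quoted later in this thread, not kube-ovn's exact code; the interface name is a placeholder, and 1360 is simply 1400 minus 40 bytes of IP and TCP headers):

```sh
# Clamp the MSS of outgoing TCP SYN packets so peers never negotiate a
# segment size larger than the tunnel can carry (MSS = MTU - 40).
# "eth0" and "1360" are illustrative placeholders.
iptables -t mangle -A POSTROUTING -o eth0 -p tcp \
    --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360
```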

@oilbeater merged commit 3dbc2f8 into kubeovn:master on Jul 30, 2020
@oilbeater
Collaborator

Thanks! @fafucoder

@Orfheo

Orfheo commented Jul 6, 2024

I think I'm facing a "side effect" of this "fix" in my experimental K8s kube-ovn cluster.

I'm running the cluster on a rather weird, probably rare, network setup. Some nodes of the cluster sit on an InfiniBand Mellanox NIC (IB), while a set of other nodes use a standard Gbps Ethernet NIC, so the nodes have quite different MTUs: IB needs a large MTU=65520, while my non-jumbo Ethernet NICs have the standard MTU=1500.

To let them coexist, of course, the kube-ovn daemonset "kube-ovn-cni" is configured with the maximum MTU the nodes can share over the non-fragmenting UDP OVN Geneve tunnels: "--mtu=1432".
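
(For reference, a quick way to check which MTU the daemonset was given; the kube-system namespace is an assumption and may differ per install.)

```sh
# Print the --mtu argument passed to the kube-ovn-cni daemonset
# (namespace is an assumption; adjust to where kube-ovn is installed)
kubectl -n kube-system get ds kube-ovn-cni -o yaml | grep -- '--mtu'
```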

Things seemed to work nicely, the cluster looked healthy, and I didn't find any network problems, until I tested the host node IB bandwidth once again using qperf.

Quite weirdly, I got 180 Mb/sec on my "qperf tcp_bw" test instead of the expected 1.8 Gb/sec, which is what these IB nodes usually achieve under Ubuntu 20.04 with the standard kernel IB drivers. The UDP bandwidth, on the other hand, was the usual expected 3 Gb/sec.
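
(For anyone who wants to reproduce the numbers, a rough sketch of the qperf invocation; the hostname is a placeholder.)

```sh
# On the server-side IB node, start the qperf listener:
qperf

# On the client-side IB node, measure TCP and UDP bandwidth
# ("ib-node-1" stands in for the server's IB address):
qperf ib-node-1 tcp_bw udp_bw
```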

Even weirder, the bandwidth was "normal" until the kube-ovn daemon started; then it dropped to 10% of the expected value.

I did some investigation and was surprised to find this rule in the iptables of the IB nodes:

21537 1292K TCPMSS tcp -- * ibp2s0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x06/0x02 TCPMSS set 1392

in the POSTROUTING chain of the mangle table, for my IB NIC "ibp2s0". This rule, if I understand correctly, rewrites the MSS field of TCP SYN segments, forcing its value to MTU-40 = 1432-40 = 1392 (the tunnel MTU minus the 20-byte IP and 20-byte TCP headers).

That explains the bandwidth drop: the IB NIC can't work efficiently with such a small negotiated MSS value.
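
(The rule is easy to spot with a standard listing of the mangle table, counters included; nothing kube-ovn specific here.)

```sh
# List the mangle POSTROUTING chain with packet counters and rule numbers
iptables -t mangle -L POSTROUTING -n -v --line-numbers
```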

A simple hack on the node iptables:

Chain POSTROUTING (policy ACCEPT 20M packets, 9543M bytes)
pkts bytes target prot opt in out source destination
553M 2142G OVN-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kube-ovn postrouting rules */
1295K 184M RETURN all -- * ibp2s0 0.0.0.0/0 0.0.0.0/0
21537 1292K TCPMSS tcp -- * ibp2s0 0.0.0.0/0 0.0.0.0/0 tcp flags:0x06/0x02 TCPMSS set 1392

restored the IB NIC performance to its 1.8 Gb/sec value.
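
Concretely, the hack above boils down to inserting a RETURN rule ahead of the TCPMSS clamp so traffic leaving the IB NIC skips the rewrite; a sketch of the command, assuming the rule positions shown in the listing above:

```sh
# Insert a RETURN before the TCPMSS clamp for traffic leaving ibp2s0,
# so those packets skip the MSS rewrite. Position 2 matches the listing
# above and may differ on other nodes.
iptables -t mangle -I POSTROUTING 2 -o ibp2s0 -j RETURN
```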

I think I understand why this rule was added by this fix, almost four years ago, but it is definitely a "heavy" side effect; losing 90% of the available bandwidth is not a nice surprise.

I'm wondering if this rule could become a configurable option of the "kube-ovn-cni" daemon, like the MTU value, to let users choose the correct behavior for their environment.

If I got it correctly, leaving the TCP stack free to negotiate the best MSS value for each connected socket shouldn't create problems except for the peculiar docker0 trouble, which, I guess, arises in the quite important KIND kube-ovn development environment.
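
One possible middle ground, shown here only as a standard iptables option and not as something kube-ovn implements, would be to clamp the MSS to the path MTU instead of a fixed value; whether that would still cover the original docker-in-docker case would need verification:

```sh
# Clamp to the path MTU of the outgoing route instead of a fixed 1392,
# so a high-MTU interface like ibp2s0 is not forced down to tunnel size
# (illustrative only, not kube-ovn's current behavior)
iptables -t mangle -A POSTROUTING -o ibp2s0 -p tcp \
    --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
```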
