cli: enable jumbo frames for GCP VPCs #1146

Merged
Nirusu merged 1 commit into main from ref/gcp-jumbo-frames
Feb 6, 2023

@Nirusu (Contributor) commented Feb 3, 2023

Proposed change(s)

  • Raise MTU from 1460 to 8896 (jumbo frames)

This improves network throughput inside Kubernetes with Cilium by ~4x (from ~1 Gbit/s to ~4 Gbit/s).

Note that Terraform's documentation for the mtu field incorrectly states that the maximum MTU is 1500. I'll open a PR with them later to fix this.

netlify bot commented Feb 3, 2023

Deploy Preview for constellation-docs canceled.

Latest commit: 58d151d
Latest deploy log: https://app.netlify.com/sites/constellation-docs/deploys/63dd43da1ab1de000923ef36

@Nirusu added the "bug fix" label Feb 3, 2023
@m1ghtym0 (Member) commented Feb 3, 2023

From a benchmarking perspective, this looks good and makes sense.
I'm not a networking expert, though. Do jumbo frames have any other side effects that we should be aware of?

@malt3 (Contributor) commented Feb 3, 2023

> From a benchmarking perspective, this looks good and makes sense.
> I'm not a networking expert, though. Do jumbo frames have any other side effects that we should be aware of?

A theoretical side effect is packet loss if the environment does not actually offer this MTU. This is a very controlled environment, so I think the risk of someone changing the MTU on our VPCs is negligible.

@malt3 (Contributor) left a comment

I did not try this end to end but the change looks good to me.

@Nirusu (Contributor, Author) commented Feb 4, 2023

For more information, see here: https://cloud.google.com/vpc/docs/mtu
Generally, this shouldn't be a problem for internal traffic, as everything in our VPCs should support high MTUs. Only our machines are in there, after all.

For TCP internet traffic, Google Cloud performs MSS clamping, so TCP should behave fine.
For UDP internet traffic, Google rejects packets larger than 1600 bytes and sends back an ICMP "Fragmentation Needed" message.
Google Cloud does not support IP fragmentation, so the sender needs to adjust. Generally, any sane network stack should respond to this and lower the packet size.
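
A minimal sketch of the packet-size arithmetic behind these points, assuming IPv4 and minimum-size headers without options (20 bytes IP, 20 bytes TCP, 8 bytes UDP). The function names are made up for illustration, and the 1600-byte limit is simply the figure quoted above, not something re-checked against Google's documentation.

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header (no options)
TCP_HEADER = 20   # bytes, minimum TCP header (no options)
UDP_HEADER = 8    # bytes, fixed-size UDP header

# Assumption: limit for internet-bound UDP packets as quoted in this thread.
INTERNET_UDP_LIMIT = 1600


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits into one packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits into one unfragmented packet of size `mtu`."""
    return mtu - IPV4_HEADER - UDP_HEADER


def full_size_packet_rejected(mtu: int) -> bool:
    """Would a packet filling the whole MTU exceed the quoted internet limit?"""
    return mtu > INTERNET_UDP_LIMIT


for mtu in (1460, 8896):
    print(mtu, tcp_mss(mtu), max_udp_payload(mtu), full_size_packet_rejected(mtu))
```

With the old MTU of 1460, a full-size packet stays below the quoted limit, so the question of how senders react only comes up with the jumbo-frame setting.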

Usually the Linux UDP stack does Path MTU Discovery to avoid this... but you never know. I am also not super experienced with UDP deployments.

What I could see is that some badly programmed service running in our cluster could misbehave if it takes the MTU from a local network adapter and just spews out large UDP packets, without responding to PMTU discovery and reducing their size. That could break things.
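
For contrast, a well-behaved sender shrinks its datagrams when the stack reports that they are too large. A minimal sketch, assuming the socket is set up so that oversized sends fail with EMSGSIZE (on Linux this happens when the socket's PMTU discovery mode forces the don't-fragment flag). `send_with_backoff` and its parameters are hypothetical names, and `send` stands in for something like a connected UDP socket's `send` method.

```python
import errno
from typing import Callable, List


def send_with_backoff(
    send: Callable[[bytes], object],
    data: bytes,
    start_size: int,
    min_size: int = 512,
) -> List[int]:
    """Send `data` in chunks, halving the chunk size whenever EMSGSIZE is raised.

    Returns the sizes of the chunks that were sent successfully.
    """
    size = start_size
    sent_sizes: List[int] = []
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + size]
        try:
            send(chunk)
        except OSError as exc:
            # Give up on unrelated errors, or if we cannot shrink any further.
            if exc.errno != errno.EMSGSIZE or size <= min_size:
                raise
            size = max(min_size, size // 2)
            continue  # retry the same offset with a smaller chunk
        sent_sizes.append(len(chunk))
        offset += len(chunk)
    return sent_sizes
```

A service that ignores the error and keeps sending at the adapter's MTU is the failure mode described above.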

Unfortunately, I don't know of any good real-world UDP applications to test with. All UDP tests I ran that could send larger UDP packets (iperf & nc) behaved roughly the same with the default MTU and the high MTU. "The same" applies in both good and bad ways: it's still UDP ;)

All default cluster services seem to run fine. The sonobuoy tests pass fine. If there's an issue, it seems to be subtle and likely limited to UDP. But I don't see anything.

So for now I would say, let's merge it and test it a bit more with normal usage? The next release is still ~2 weeks away, so there's hopefully plenty of time to test this and spot unexpected breakage. Even if any breakage shows up later, you can turn down the MTU size without having to destroy the cluster: you just need to shut down all the machines, modify the VPC, and turn the cluster back on.

@Nirusu Nirusu merged commit 0331e2d into main Feb 6, 2023
@Nirusu Nirusu deleted the ref/gcp-jumbo-frames branch February 6, 2023 10:07
@Nirusu removed the "bug fix" label Feb 6, 2023

3 participants