Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomise outgoing port for connections in the SNAT iptables rule. #246

Merged
merged 16 commits into from
Jan 9, 2019

Conversation

taylorb-syd
Copy link
Contributor

Issue #, if available: N/A

Description of changes: This change introduces the --random flag to outgoing SNAT connections, and enables it by default.

When making a large number of outgoing connections to the same endpoint there is a known race condition with SNAT rules that means that two connections can be assigned the same outgoing port from the ephemeral range. This results in connection traffic (for both the client and server) being unable to track both connections because they will have the same Source IP Address, Source Port, Destination IP Address, and Destination Port, resulting in one process being "orphaned".

To prevent this edge case from occurring you can enable the --random flag within the SNAT rule which will select a random (unused) port from the ephemeral range rather than assigning them sequentially.

The impact for enabling this should be minimal for most customers, so this option is enabled by default in this pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@taylorb-syd
Copy link
Contributor Author

Realized that since kernel 3.14 there has been an option in the extensions called --random-fully that uses PRNG rather than hash based random allocation. Added prng as third option for the environment variable for people who need --random-fully

@miguelbernadi
Copy link

As the PR adds the AWS_VPC_K8S_CNI_RANDOMISESNAT variable, I think it would be better to specify the different possible string values instead of relying on an obscure meaning of true and false. I'd suggest using the following terms:

  • none: equal to false
  • hashrandom: equal to true
  • prng: equal to prng

I'm not aware of the reasoning behind the original naming scheme, but this should make clear what each option actually means and may also make the code a bit cleaner/less convoluted.

@taylorb-syd
Copy link
Contributor Author

taylorb-syd commented Jan 2, 2019

As the PR adds the AWS_VPC_K8S_CNI_RANDOMISESNAT variable, I think it would be better to specify the different possible string values instead of relying on an obscure meaning of true and false. I'd suggest using the following terms:

  • none: equal to false
  • hashrandom: equal to true
  • prng: equal to prng

I'm not aware of the reasoning behind the original naming scheme, but this should make clear what each option actually means and may also make the code a bit cleaner/less convoluted.

The first revision I wrote was tested on an old kernel that didn't support --random-fully, so I originally made it a true/false option. I then found that newer kernels, including Amazon Linux 2 (default for EKS) supports --random-fully and that there are benefits to using PRNG over the hash based random, especially with lots of calls to the same destination (say an API endpoint like S3).

I modified the code to therefore support prng as a "special case". Given that reasoning doesn't hold true for the actual library (i.e. there will be no backwards compatibly reasons) I have modified as per your suggestion.

Copy link
Contributor

@mogren mogren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing!

Could you please squash these changes to a single commit?

pkg/networkutils/network.go Outdated Show resolved Hide resolved
pkg/networkutils/network.go Outdated Show resolved Hide resolved
pkg/networkutils/network.go Outdated Show resolved Hide resolved
@nak3
Copy link
Contributor

nak3 commented Jan 5, 2019

I like this idea and believe that it is necessary. But I guess that updated SNAT rule is not available until the host was rebooted? I mean that if you change none to prng, iptables rule will be appended but previous none's rule still exists like this?:

iptables -t nat -A POSTROUTING -s x.x.x.x -j SNAT --to y.y.y.y
iptables -t nat -A POSTROUTING -s x.x.x.x -j SNAT --to y.y.y.y --random-fully

Even if the answer is yes, it might be better to leave it as a known limitation rather than fixing it, though.

@taylorb-syd
Copy link
Contributor Author

Sorry for the mess of commits, probably shouldn't be coding late at night. That's what Squash on Merge is for right?

I like this idea and believe that it is necessary. But I guess that updated SNAT rule is not available until the host was rebooted?

Right now, any changes to the network environment (including modification of this setting) will require an instance reboot to action. It might be possible to make it work with just a CNI plugin restart by looping through and inspecting all of the rules. I've written some code to this end, but I want to do some robust testing of it before I create a PR. This also means writing unit tests for the code, right now we don't have many for SNAT.

For example: if a customer adds or removes VPC CIDR to their VPC, the number of AWS-SNAT-CHAIN-%d chains created will change, as will the order of these chains. You can find my branch for testing here if you wanna help me: https://github.com/taylorb-syd/amazon-vpc-cni-k8s/tree/cleanupiptrnc

Gopkg.toml Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants