Fix misleading typo in SNAT explanation #53

Merged
merged 2 commits into from
Nov 6, 2019

tkornai commented Aug 26, 2019

My understanding is that the edited sentence only makes sense with the proposed change. A node does not know its public IP; private-to-public NAT happens on the Internet Gateway. Also, a node can have only a single public IP, a single primary private IP, and many secondary private IPs. So it doesn't make sense to talk about a 'primary public IP', but it is meaningful to say 'primary private IP'.

Unfortunately, the attached image is also misleading: it shows a private-to-public SNAT done by the CNI plugin, whereas it should show a SNAT from a secondary private address to the primary private address. The SNAT from the primary private address to the public address happens on the gateway; this could also be shown.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jaypipes left a comment

I'd suggest a different wording. See inline comment.

1. Save the file and exit your text editor\.

Not sure what was changed in this line?

Author

Nothing, I cannot even see a whitespace change. Probably a GitHub bug?

@@ -1,6 +1,6 @@
# External Source Network Address Translation \(SNAT\)<a name="external-snat"></a>

By default, the [Amazon VPC CNI plugin for Kubernetes](https://github.com/aws/amazon-vpc-cni-k8s) configures pods with source network address translation \(SNAT\) enabled\. This sets the return address for a packet to the primary public IP of the instance and allows for communication with the internet\. In this default configuration, when you use an internet gateway and a public address, the return packet is routed to the correct Amazon EC2 instance\.

Hmm. Yeah, this is confusing. I would actually suggest rewording this differently, like so:

"By default, the Amazon VPC CNI plugin for Kubernetes configures pods with source network address translation (SNAT) enabled. For pods running on worker nodes that are in a VPC with a private subnet, SNAT will set the return address for a packet to the IP of the subnet's associated Internet Gateway. For pods running on worker nodes that are in a VPC with a public subnet, SNAT will set the return address for a packet to the worker node's public IP address. This allows for communication from the pod to the Internet and ensures the return packet is routed to the correct Amazon EC2 instance"

Author

I definitely agree that covering the public subnet case and the private subnet case separately can make the documentation clearer. My understanding differs from what you explained above, so I'll try to quickly explain my version.

Let me first cover the case for worker nodes in a public subnet. I believe that an EC2 instance does not know its own public IP. I base this on the following explanation, taken from the official VPC Internet Gateway documentation (https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html):

Your instance is only aware of the private (internal) IP address space defined within the VPC and subnet. The internet gateway logically provides the one-to-one NAT on behalf of your instance, so that when traffic leaves your VPC subnet and goes to the internet, the reply address field is set to the public IPv4 address or Elastic IP address of your instance, and not its private IP address. Conversely, traffic that's destined for the public IPv4 address or Elastic IP address of your instance has its destination address translated into the instance's private IPv4 address before the traffic is delivered to the VPC.

If this assumption holds, then the CNI plugin cannot translate from a secondary private address to the public address, as it doesn't even know the public address. What it can do is translate from the secondary private address to the primary private address. This is required because the Internet Gateway can only map between the primary private address and the public address; it cannot map between secondary private addresses and the public IP. This is exactly why external SNAT cannot be enabled for public subnets.

The case of nodes in a private subnet is different as the NAT Gateway can route not only to the primary private IP but to the secondary private IPs as well. So it is possible to not do SNAT on the worker node itself, the NAT Gateway will take care of it.

Still, the SNAT that can take place on the node is always between secondary private IP and primary private IP, the public IP is never involved there.
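The public subnet case described above can be sketched as a toy model in Python. This is only an illustration under assumptions: the addresses, the `Packet` type, and the function names are hypothetical, and the Internet Gateway is reduced to a lookup table that holds a single mapping from the node's primary private IP to its public IP.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Hypothetical addresses, for illustration only.
POD_IP = "10.10.11.57"            # secondary private IP used by a pod
NODE_PRIMARY_IP = "10.10.11.230"  # primary private IP of the worker node
NODE_PUBLIC_IP = "203.0.113.10"   # public IP, known to the gateway only

# Assumed one-to-one mapping held by the gateway in this model.
IGW_NAT = {NODE_PRIMARY_IP: NODE_PUBLIC_IP}


def node_snat(pkt: Packet) -> Packet:
    """SNAT on the node: pod (secondary) IP -> node primary private IP."""
    return replace(pkt, src=NODE_PRIMARY_IP)


def igw_egress(pkt: Packet) -> Optional[Packet]:
    """Gateway step: rewrite the private source to its public IP, if mapped."""
    public = IGW_NAT.get(pkt.src)
    return replace(pkt, src=public) if public else None


outbound = Packet(src=POD_IP, dst="198.51.100.7")
with_snat = igw_egress(node_snat(outbound))  # source becomes the public IP
without_snat = igw_egress(outbound)          # no mapping for the pod IP
print(with_snat)
print(without_snat)
```

In this model the packet that skips the node-level SNAT has no public counterpart at the gateway, which is the situation the comment argues makes node-level SNAT necessary in public subnets.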

I think the explainer image should also be updated (maybe even have one for the public case and one for the private case). Do you know if there is a supported way to update images in PRs?

Author

Any thoughts on the above?

@tkornai my apologies. still mulling this over. want to chat with @anguslees about this further. I don't believe there is a way of updating the images in a PR through Github. You'd need to git commit a new image.

Author

Thanks, @jaypipes for getting back to me. I'm happy to propose updated images in case I can get access to the source of the originals and if you can point me to the diagramming tool used.

@mogren, how about this text:

"By default, the Amazon VPC CNI plugin for Kubernetes configures pods with source network address translation (SNAT) enabled for traffic that leaves the VPC. Communication within the VPC (such as pod to pod) is direct and SNAT does not occur.

For pods running on worker nodes that are in a VPC with a private subnet, SNAT will set the return address for a packet to the IP of the subnet's associated Internet Gateway. For pods running on worker nodes that are in a VPC with a public subnet, SNAT will set the return address for a packet to the worker node's public IP address. This allows for communication from the pod to the Internet and ensures the return packet is routed to the correct Amazon EC2 instance."

@nrdlngr LGTM!

Author

@nrdlngr My understanding is that private subnets are always accessing the internet through NAT gateways and never through Internet Gateways. I base this on the official documentation quoted below.

An internet gateway serves two purposes: to provide a target in your VPC route tables for internet-routable traffic, and to perform network address translation (NAT) for instances that have been assigned public IPv4 addresses.

source: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html

You can use a network address translation (NAT) gateway to enable instances in a private subnet to connect to the internet or other AWS services, but prevent the internet from initiating a connection with those instances. For more information about NAT, see NAT.

source: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html

I also quoted the official documentation in my previous reply about how EC2 instances are not aware of their public IP, so SNAT on the node can only happen between private IPs. That is, secondary private IPs get changed to the primary private IP of the node.

It is true that instances in private subnets use NAT gateways for communication to the internet, but that must also go through an Internet Gateway. Without an Internet Gateway, nothing in your VPC can connect to the internet.

Also, the CNI uses the EC2 DescribeInstances API call to get information about the instance it is running on, so it doesn't matter what the host knows or doesn't know about its IP addresses. It gets permissions for this call from the worker node's instance IAM role.

Author

Hi @nrdlngr, sorry for being so persistent about this issue, but I think the information in the documentation is still wrong. I believe your two statements above are mostly true, but they do not address my primary concern.

My understanding is that inside the VPC all routing is based on private IPs. Translation from private IP to public IP only happens on the InternetGateway. This is confirmed by the official documentation:

To enable communication over the internet for IPv4, your instance must have a public IPv4 address or an Elastic IP address that's associated with a private IPv4 address on your instance. Your instance is only aware of the private (internal) IP address space defined within the VPC and subnet. The internet gateway logically provides the one-to-one NAT on behalf of your instance, so that when traffic leaves your VPC subnet and goes to the internet, the reply address field is set to the public IPv4 address or Elastic IP address of your instance, and not its private IP address. Conversely, traffic that's destined for the public IPv4 address or Elastic IP address of your instance has its destination address translated into the instance's private IPv4 address before the traffic is delivered to the VPC.

Hence the VPC CNI plugin should never set the source IP to a public IP.

If my understanding is incorrect, please help me understand where I'm mistaken.

nrdlngr commented Nov 1, 2019

I've updated this doc to use the text suggested by @jaypipes and approved by @mogren.

Closing this issue. Thanks for helping us to improve the clarity of this topic!

@nrdlngr nrdlngr closed this Nov 1, 2019
tkornai (Author) commented Nov 3, 2019

I think the documentation is still wrong and this PR was closed with an invalid resolution. Please find my explanation above.

tkornai (Author) commented Nov 3, 2019

For pods running on worker nodes in a public subnet, SNAT will set the return address for a packet to the worker node's public IP address. This allows for communication from the pod to the Internet and ensures the return packet is routed to the correct Amazon EC2 instance.

In that case the InternetGateway would receive the response packets, but it wouldn't be able to figure out where to forward them inside the VPC.

What actually happens is that the primary private IP is set as the source IP (by the vpc-cni plugin), and the InternetGateway will swap it with the node's public IP. When the return packet is received for that public IP it will do the inverse transformation for the primary private IP.

mogren commented Nov 3, 2019

@tkornai Thanks for being persistent about this. You are right that when SNAT is enabled, which is the default, the CNI does this by NAT:ing the pod IP with the primary private IP of eth0. The relevant output from sudo iptables -nL -t nat on a worker node:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
AWS-SNAT-CHAIN-0  all  --  0.0.0.0/0            0.0.0.0/0            /* AWS SNAT CHAIN */

Chain AWS-SNAT-CHAIN-0 (1 references)
target     prot opt source               destination
AWS-SNAT-CHAIN-1  all  --  0.0.0.0/0           !10.10.0.0/16         /* AWS SNAT CHAIN */

Chain AWS-SNAT-CHAIN-1 (1 references)
target     prot opt source               destination
SNAT       all  --  0.0.0.0/0            0.0.0.0/0            /* AWS, SNAT */ ADDRTYPE match dst-type !LOCAL to:10.10.11.230 random

Where 10.10.11.230 is the private IP of the node, fetched from the EC2 API.
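The effect of these chains on a packet's source address can be approximated with a short Python sketch. The VPC CIDR and the SNAT target are taken from the output above; the function name and the example pod and destination addresses are hypothetical, and the `ADDRTYPE match dst-type !LOCAL` check and the `random` port option are deliberately left out, so this is a simplification rather than a faithful reimplementation of the rules.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.10.0.0/16")  # from AWS-SNAT-CHAIN-0
NODE_PRIMARY_IP = "10.10.11.230"                 # SNAT target in AWS-SNAT-CHAIN-1


def source_after_postrouting(pod_ip: str, dst_ip: str) -> str:
    """Return the source address a pod's packet leaves the node with."""
    if ipaddress.ip_address(dst_ip) in VPC_CIDR:
        # Destination is inside the VPC: the !10.10.0.0/16 match fails,
        # so the packet keeps the pod IP as its source.
        return pod_ip
    # Destination is outside the VPC: source is rewritten to the node IP.
    return NODE_PRIMARY_IP


print(source_after_postrouting("10.10.11.57", "10.10.20.5"))
print(source_after_postrouting("10.10.11.57", "198.51.100.7"))
```

Traffic that stays inside the VPC keeps the pod IP, while traffic leaving the VPC carries the node's primary private IP.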

jaypipes commented Nov 4, 2019

@nrdlngr if you'd like to include the additional wording from @tkornai that adds some extra colour to the explanation about SNAT for EC2 worker node instances with a public IP (exposed via the Internet Gateway), that would be fine with me.

Specifically, Thomas stated this (in a couple places, reworded for clarity/brevity):

"Inside the VPC all routing is based on private IPs. Translation from private IP to public IP only happens on the Internet Gateway. The Internet Gateway performs SNAT by translating the EC2 worker node instance's primary private IP address to its public IP address for traffic transmitted from a container workload on the EC2 worker node to the Internet."

@nrdlngr nrdlngr reopened this Nov 6, 2019
@nrdlngr nrdlngr merged commit 3b2e07a into awsdocs:master Nov 6, 2019
nrdlngr commented Nov 6, 2019

Obviously through our conversations, more than your suggestion was added to this topic. But your original suggestion was correct, and I apologize that it took this long to go through. Thank you for working with us to improve our documentation.

I removed the graphic that was inaccurate, and I've created an internal work item to fix the graphic and get it back in.

Thanks!

tkornai (Author) commented Nov 7, 2019

@nrdlngr thanks for following this question through with me. I think this is a fairly complex topic, which is why the documentation should be as clear and accurate as possible.

For traffic that leaves the VPC, the CNI sets the source address for a packet to the primary private IP of the worker node's eth0 interface. When this traffic reaches the Internet Gateway, SNAT works differently in private and public subnets:

  • For pods running on worker nodes in a private subnet, the Internet Gateway translates the node's primary private IP address to the public IP address of the Internet Gateway.
  • For pods running on worker nodes in a public subnet, the Internet Gateway translates the node's primary private IP address to the node's public IP address.

I think the public subnet case is correct in its current form. For the case of the private subnet I think it would help to make clear that SNAT happens by default 3 times:

  1. The VPC CNI plugin sets the packet's source IP from the pod's IP to the node's primary private IP.
  2. The NAT gateway swaps the node's primary private IP to the NAT gateway's private IP.
  3. The InternetGateway swaps the NAT gateway's private IP to the NAT gateway's public IP (which is an ElasticIP actually).

When the external SNAT flag is turned on the first SNAT is skipped.
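The three steps, and the effect of the external SNAT flag, can be sketched in Python as a list of the source addresses a packet carries after each hop. All addresses are hypothetical, and the `external_snat` parameter is just a stand-in for the flag mentioned above, not the plugin's actual configuration interface.

```python
from typing import List

# Hypothetical addresses, for illustration only.
POD_IP = "10.10.11.57"              # pod (secondary private) IP
NODE_PRIMARY_IP = "10.10.11.230"    # node primary private IP
NAT_GW_PRIVATE_IP = "10.10.1.20"    # NAT gateway private IP
NAT_GW_ELASTIC_IP = "203.0.113.50"  # NAT gateway public (Elastic) IP


def source_hops(external_snat: bool) -> List[str]:
    """List the packet's source address after each translation step."""
    hops = [POD_IP]
    if not external_snat:
        hops.append(NODE_PRIMARY_IP)  # 1. VPC CNI plugin on the node
    hops.append(NAT_GW_PRIVATE_IP)    # 2. NAT gateway
    hops.append(NAT_GW_ELASTIC_IP)    # 3. Internet Gateway
    return hops


print(source_hops(external_snat=False))
print(source_hops(external_snat=True))
```

With the flag off the source address changes three times; with it on, the NAT gateway sees the pod IP directly and only two translations remain.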

It is worth mentioning that network address translation will likely affect port numbers as well, not just IPs. The InternetGateway is an exception: it only changes IPs and does not need to touch ports. (The NAT gateway maps multiple private IPs to the same public IP, so port collisions are likely to happen. The InternetGateway does a one-to-one mapping from private IPs to public IPs, so there are no port collisions.)
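The difference between the two kinds of translation can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class and function names, the addresses, and in particular the sequential port allocation, which says nothing about how a real NAT gateway picks ports.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ManyToOneNat:
    """Many private endpoints share one address, so ports must be remapped."""

    def __init__(self, shared_ip: str, first_port: int = 1024) -> None:
        self.shared_ip = shared_ip
        self.next_port = first_port
        self.table: Dict[Endpoint, Endpoint] = {}

    def translate(self, src: Endpoint) -> Endpoint:
        # Give each new private endpoint its own port on the shared address.
        if src not in self.table:
            self.table[src] = (self.shared_ip, self.next_port)
            self.next_port += 1
        return self.table[src]


def one_to_one_nat(src: Endpoint, mapping: Dict[str, str]) -> Endpoint:
    """Each private IP has its own public IP, so the port can stay as it is."""
    ip, port = src
    return (mapping[ip], port)


nat = ManyToOneNat("10.10.1.20")
a = nat.translate(("10.10.11.230", 40000))
b = nat.translate(("10.10.12.17", 40000))  # same source port, other node
c = one_to_one_nat(("10.10.11.230", 40000), {"10.10.11.230": "203.0.113.10"})
print(a, b, c)
```

Two nodes using the same source port end up on different ports behind the shared address, while the one-to-one mapping leaves the port untouched.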

For public subnets SNAT must happen on the node by the VPC CNI plugin, otherwise the InternetGateway would receive a packet with a secondary private IP (the pod's IP) which does not have a public IP pair. This is why the source IP has to be changed to the node's primary private IP, as that IP has a corresponding public IP mapped to it.
