Use SNAT instead of MASQUERADE to source NAT outbound IPVS traffic #668

Merged


lucasmundim
Contributor

This disambiguates the masqueraded source IP when a container with hostNetwork=true and dnsPolicy=ClusterFirstWithHostNet, on a host with multiple interfaces or an IP alias on the loopback interface, tries to access a pod behind an IPVS service.

It also deletes the old rule so it won't break when upgrading.

Fixes #549 and Fixes #667
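
The PR diff itself is not reproduced in this conversation, so the following Go sketch only illustrates the shape of the change described above. It assumes the old rule looks like the IPVS MASQUERADE entry in the `iptables` listing quoted later in the thread (including its empty comment match). `ipvsNATRules`, the pod CIDR, and the node IP are hypothetical names and values chosen for illustration, not kube-router code.

```go
package main

import "fmt"

// ipvsNATRules is a hypothetical helper (not kube-router code). It returns
// the argument list of the old MASQUERADE rule, which an upgrade would
// delete, and of the SNAT rule that would replace it.
func ipvsNATRules(podCIDR, nodeIP string) (oldRule, newRule []string) {
	// Match part shared by both rules: IPVS traffic in the ORIGINAL
	// direction, forwarded with the MASQ method, not pod-to-pod.
	match := []string{
		"!", "-s", podCIDR, "!", "-d", podCIDR,
		"-m", "ipvs", "--ipvs", "--vdir", "ORIGINAL", "--vmethod", "MASQ",
		"-m", "comment", "--comment", "",
	}
	// Copy match before appending so the two rules do not share a backing array.
	oldRule = append(append([]string{}, match...), "-j", "MASQUERADE")
	newRule = append(append([]string{}, match...), "-j", "SNAT", "--to-source", nodeIP)
	return oldRule, newRule
}

func main() {
	// Example values only; 192.0.2.10 is a documentation address.
	oldRule, newRule := ipvsNATRules("10.0.58.0/24", "192.0.2.10")
	fmt.Printf("delete from nat/POSTROUTING: %q\n", oldRule)
	fmt.Printf("append to nat/POSTROUTING:   %q\n", newRule)
}
```

The only difference between the two rules is the target: MASQUERADE lets the kernel pick the source address, while SNAT pins it to the address given in `--to-source`.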

@murali-reddy
Member

murali-reddy commented Mar 13, 2019

@lucasmundim please rebase

I still wonder why the existing rule to use a source IP (as mentioned in #549 (comment)) is not taking effect.

IMO, when no masquerading is used, we will still have a problem if the wrong source IP is used, right?

@lucasmundim
Contributor Author

> @lucasmundim please rebase

Done.

> I still wonder why the existing rule to use a source IP (as mentioned in #549 (comment)) is not taking effect.

I don't know whether this is expected or unwanted behavior when using IPVS. But this PR fixes the problem, at least in my case.

> IMO, when no masquerading is used, we will still have a problem if the wrong source IP is used, right?

Hmm, maybe. I haven't tested this, as the environment where I run it right now would not support running without masquerading. We plan to set up BGP external peering on our cluster so that we have routable podCIDRs throughout our company's network. I'll share any problems and/or solutions we find.

@lucasmundim
Contributor Author

Just rebased. Not sure why the Travis CI build failed.

@trnl

trnl commented Jun 6, 2019

@lucasmundim, we are facing the same issue. Are you using your custom fork now?

> I still wonder why the existing rule to use a source IP (as mentioned in #549 (comment)) is not taking effect.

This is indeed what's happening: we get a different source IP than the one specified in the local route table.

@lucasmundim
Contributor Author

lucasmundim commented Jun 8, 2019

> @lucasmundim, we are facing the same issue. Are you using your custom fork now?

@trnl Yes, we are. We've been using it in our clusters, including the production one, since Feb 2019.

@trnl

trnl commented Jun 12, 2019

@lucasmundim since neither @murali-reddy nor anyone else from the cloudnative team is responding, I have a feeling that we need to fork.

Are you just using your branch from https://github.com/lucasmundim/kube-router/tree/snat-outbound-ipvs-traffic? How about rebasing it on top of 0.3.1?

@murali-reddy
Member

@trnl sorry for the late response. I still need to figure out the concern (#668 (comment)) that I had before merging.

This should go into 0.4; there are a couple of features lined up for that release. I am hoping to get a release out in the next 2-3 weeks.

@arminbuerkle

I'm using @lucasmundim's branch in 2 of our clusters and it works just fine for pod-to-pod networking and IPVS.

However, I just hit a similar problem when I tried to use kube-router for cross-cluster networking:

Basically, I have 2 clusters that share a private subnet on eth1; eth0 is used for the public internet.

I managed to get pod routes propagated with route reflectors and BGP peering, but when I tried to ping from pod 1 in cluster 1 to pod 2 in cluster 2, the traffic got masqueraded with the IP of eth0 instead of eth1. It does reach the destination host, but with the public IP instead of the private one on eth1.

I believe the problem is in pod_egress.go:

// set up MASQUERADE rule so that egress traffic from the pods gets masqueraded to node's IP

var (
	podEgressArgs4 = []string{"-m", "set", "--match-set", podSubnetsIPSetName, "src",
		"-m", "set", "!", "--match-set", podSubnetsIPSetName, "dst",
		"-m", "set", "!", "--match-set", nodeAddrsIPSetName, "dst",
		"-j", "MASQUERADE"}
...

Adding a SNAT rule before the pod egress rules similar to this solves my problem:
iptables -t nat -I POSTROUTING 1 -d 172.16.0.0/16 -j SNAT --to-source ${NODE_IP}

I suspect all podEgressArgs* rules could be changed to SNAT?
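
As a rough illustration of that suggestion, here is a minimal Go sketch of what an SNAT variant of the IPv4 pod egress arguments might look like. `podEgressSNATArgs` is a hypothetical helper, not kube-router code; the set names are taken from the `iptables` listing elsewhere in this thread, and the node IP in `main` is a made-up example.

```go
package main

import (
	"fmt"
	"net"
)

// Set names as they appear in the iptables listing in this thread.
const (
	podSubnetsIPSetName = "kube-router-pod-subnets"
	nodeAddrsIPSetName  = "kube-router-node-ips"
)

// podEgressSNATArgs is a hypothetical helper (not kube-router code). It
// mirrors the matches of podEgressArgs4 but ends in an explicit SNAT
// target, so the source address no longer depends on the outgoing interface.
func podEgressSNATArgs(nodeIP string) ([]string, error) {
	ip := net.ParseIP(nodeIP)
	if ip == nil || ip.To4() == nil {
		return nil, fmt.Errorf("not a valid IPv4 address: %q", nodeIP)
	}
	return []string{
		"-m", "set", "--match-set", podSubnetsIPSetName, "src",
		"-m", "set", "!", "--match-set", podSubnetsIPSetName, "dst",
		"-m", "set", "!", "--match-set", nodeAddrsIPSetName, "dst",
		"-j", "SNAT", "--to-source", ip.String(),
	}, nil
}

func main() {
	// 10.0.0.5 stands in for the node's private (eth1) address.
	args, err := podEgressSNATArgs("10.0.0.5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", args)
}
```

Validating the address up front keeps a malformed node IP from ever reaching an iptables invocation.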

@trnl

trnl commented Aug 22, 2019

@murali-reddy,

> @trnl sorry for the late response. I still need to figure out the concern (#668 (comment)) that I had before merging.

After some discussion with @kirill-korzun I think we can explain what's happening here.

According to the [documentation](http://linux-ip.net/html/tools-ip-route.html):

> The final characteristic available to us in each line of the local routing table output is the src keyword. This is treated as a hint to the kernel about what IP address to select for a source address on outgoing packets on this interface.

So for a line like the following:

bash-4.2# ip route | grep 130
10.0.130.0/24 dev tun-17223125164 proto 17 src 172.0.0.4

This basically means that we hint to the kernel which IP to use when sending traffic over this interface, but only for traffic originating on the VM itself. However, if the traffic enters through the port and is then adjusted according to the IPVS rules, I believe the kernel does not take this hint into account and just takes the first interface.
Maybe the MASQUERADE rule also selects the default source IP (the first one), as the tun- devices are just internal?

Chain POSTROUTING (policy ACCEPT 34382 packets, 2069K bytes)
 pkts bytes target     prot opt in     out     source               destination
 107M 6959M KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
    0     0 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0
5768K  345M MASQUERADE  all  --  *      *      !10.0.58.0/24        !10.0.58.0/24         vdir ORIGINAL vmethod MASQ /*  */
1011K   61M MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0            match-set kube-router-pod-subnets src ! match-set kube-router-pod-subnets dst ! match-set kube-router-node-ips dst

In any case, I think it makes sense to merge this pull request, as it explicitly sets SNAT to use the same source IP as the one configured for the tunnels.
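
The `src` hint quoted above can be read straight from the routing-table output, which makes it easy to compare against the address a NAT rule ends up using. The Go sketch below is only an illustration under that assumption: `routeSrcHint` is a hypothetical helper, not part of kube-router, and it parses a single line of `ip route` output rather than querying the kernel.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// routeSrcHint returns the address that follows the "src" keyword in one
// line of `ip route` output. The boolean is false when the line has no
// src field or the value is not a valid IP address.
// Hypothetical helper for illustration; not kube-router code.
func routeSrcHint(line string) (net.IP, bool) {
	fields := strings.Fields(line)
	for i := 0; i+1 < len(fields); i++ {
		if fields[i] == "src" {
			ip := net.ParseIP(fields[i+1])
			return ip, ip != nil
		}
	}
	return nil, false
}

func main() {
	// The route line quoted earlier in this comment.
	line := "10.0.130.0/24 dev tun-17223125164 proto 17 src 172.0.0.4"
	if ip, ok := routeSrcHint(line); ok {
		fmt.Println("src hint:", ip) // prints "src hint: 172.0.0.4"
	}
}
```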

@Onlinehead

I just want to confirm that I have a similar problem with CentOS 7 (3.10.0-957.27.2.el7.x86_64) and 2 interfaces (ext and int).
Adding the SNAT rule iptables -t nat -I POSTROUTING 1 -d 172.16.0.0/16 -j SNAT --to-source ${INT_IP} fixed the problem.
It would be great if this PR were merged and released.

@trnl

trnl commented Sep 3, 2019

@murali-reddy, is there any feedback?

@lucasmundim
Contributor Author

@murali-reddy, I've just rebased on top of v0.4.0-rc3, and now travis-ci is green (thanks to #823).

I wonder if you still have any concerns about merging it.

@murali-reddy
Member

@lucasmundim thanks for rebasing. Apologies for not giving this PR enough attention. I will review it and give it a try in the coming week.

@murali-reddy
Member

murali-reddy commented Feb 10, 2020

> The final characteristic available to us in each line of the local routing table output is the src keyword. This is treated as a hint to the kernel about what IP address to select for a source address on outgoing packets on this interface.

@trnl thanks for pointing to the kernel documentation and for your explanation.

> In any case, I think it makes sense to merge this pull request, as it explicitly sets SNAT to use the same source IP as the one configured for the tunnels.

Makes sense.

The PR is working as expected (SNAT instead of MASQUERADE). I will test some more to see whether there are any other cases where it breaks. So far it looks good.

@murali-reddy murali-reddy merged commit 13421da into cloudnativelabs:master Feb 16, 2020
@murali-reddy
Member

Thanks @lucasmundim for the PR and for your patience :) The code changes look good to me, and I did not find any issues in testing.
