Antrea IPAM picks overlapping pod IP addresses #119

Closed
varunmar opened this issue Nov 20, 2019 · 7 comments · Fixed by #228
@varunmar

Describe the bug
When installing Antrea on an existing cluster, the pod IPs allocated by Antrea may conflict with the IPs of previously created pods (such as kube-dns).

To Reproduce
Create a GKE cluster. Install Antrea. Create enough pods so that some of them land on nodes that already have existing kube-system pods.
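
For concreteness, a rough reproduction sketch (the cluster name, node count, and test deployment below are illustrative assumptions, not taken from this report):

# create a small GKE cluster (name and size are placeholders)
gcloud container clusters create antrea-test --num-nodes=2
# deploy Antrea from the manifest referenced below in this report
kubectl apply -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml
# schedule enough pods that some land on nodes already running kube-system pods
kubectl create deployment test-pods --image=nginx
kubectl scale deployment test-pods --replicas=4
# compare the new pod IPs with the pre-existing kube-system pod IPs
kubectl get pods -o wide --all-namespaces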

Expected
Pods are allocated IPs that do not overlap existing pod IPs.

On a 2-node GKE cluster, with Antrea deployed using the vmware-tanzu/antrea/master/build/yamls/antrea.yml file and kube-dns pods already existing on the nodes, adding a few iperf pods shows this -

k get pods -o wide --all-namespaces

NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default iperf-client 1/1 Running 0 2m37s 10.44.0.2 gke-antrea-test-default-pool-41d51a90-ltw0
default iperf-server-cbdb86575-qrl5p 1/1 Running 0 2m40s 10.44.1.2 gke-antrea-test-default-pool-41d51a90-5bvq

kube-system kube-dns-79868f54c5-h9c4m 4/4 Running 0 9m31s 10.44.0.2 gke-antrea-test-default-pool-41d51a90-ltw0
kube-system kube-dns-79868f54c5-km95f 4/4 Running 0 8m52s 10.44.1.2 gke-antrea-test-default-pool-41d51a90-5bvq

This cluster is running Kubernetes v1.13.11.

varunmar added the bug label Nov 20, 2019
@jianjuns
Contributor

Thanks for reporting this!
When you replace the existing CNI with Antrea, do you expect the existing Pods (created before Antrea is deployed) to continue to have network connectivity? For now Antrea cannot guarantee that.
Or do you mean that even if the existing Pods lose connectivity, you would still want Antrea to avoid allocating IPs that conflict with those existing Pods, even though the conflict does not impact new Pods' connectivity?

@tnqn
Member

tnqn commented Nov 20, 2019

I see that GKE currently supports the GKE native CNI and Calico, so the existing Pods should get their IPs from one of them. Even if the IPs don't conflict, I don't think it's guaranteed that Pods whose networks were created by different CNIs can communicate, as Pods are connected via different approaches (Linux bridge, routing mode, Open vSwitch bridge), and the gateway interfaces created by each CNI in the default namespace might have IP conflicts too. If a GKE cluster can start in a state with no CNI enabled, like kubeadm init, or provide a mechanism to start with other CNIs, it should work.
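
One way to see this on a node is to compare the addresses of the gateway/bridge interfaces each CNI creates. A minimal sketch; the interface names are assumptions (cbr0 or cni0 for the pre-existing CNI, antrea-gw0 for Antrea) and vary by CNI and version, since they are not given in this thread:

# interface names below are assumptions and vary by CNI and version
ip -4 addr show cbr0        # bridge left by the pre-existing CNI, if any
ip -4 addr show antrea-gw0  # Antrea's gateway port on the OVS bridge
ip route                    # look for overlapping pod CIDR routes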

@varunmar
Author

Ah interesting point about the gateway interfaces. Yes, that would also be unfortunate - we were just lucky that the default GKE CNI doesn't create any gateway devices.
I wouldn't expect existing pods to be able to communicate with the newly created ones, but I would have expected the old ones to keep communicating with whatever they had connectivity to before (the external world, for example), while all the new pods would be managed by Antrea's CNI.

But you're right - that's a confusing situation, and I can't imagine any production clusters wanting to be in that mode. One easy way to fix this is to just have all the pods restart after the Antrea CNI is installed. Then, when they all get recreated they can join the Antrea overlay.

Thanks!

@antoninbas
Contributor

Should we take the action item to add some documentation to the getting started guide about deploying Antrea in a cluster which already uses a different CNI plugin?

We could suggest the following steps: 1) delete existing CNI, 2) apply Antrea's yaml, 3) drain / uncordon each node one by one.
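
A minimal sketch of those steps, assuming kubectl access to the cluster; the manifest path and node names are placeholders:

# 1) delete the existing CNI (manifest is a placeholder)
kubectl delete -f <previous-cni-manifest>.yml
# 2) apply Antrea's yaml
kubectl apply -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml
# 3) drain and uncordon each node one by one so its pods are recreated through Antrea
kubectl drain <node-name> --ignore-daemonsets
kubectl uncordon <node-name>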

@antoninbas
Contributor

Similarly we may want to make sure that Antrea cleans up after itself properly (deleting the gw interface, etc.) when it is deleted from a cluster :/
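
For reference, a hedged sketch of what such a cleanup might touch, assuming Antrea's default names (br-int for the OVS bridge, antrea-gw0 for the gateway interface, 10-antrea.conflist for the CNI config) and that ovs-vsctl is available on the node; none of these names are confirmed in this thread:

kubectl delete -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml
# then, on each node:
ovs-vsctl del-br br-int                  # removes the OVS bridge and its gateway port
ip link delete antrea-gw0 2>/dev/null    # in case the gateway interface persists
rm -f /etc/cni/net.d/10-antrea.conflist  # CNI config file name is an assumption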

@jianjuns
Contributor

Makes sense to me. We could have a cleanup DaemonSet do the cleanup.

@antoninbas
Contributor

Assigning this to me. Discussed at the 12/04/2019 Antrea community meeting. The only action item is to document how to deploy Antrea on a cluster which already has a CNI / running Pods.

antoninbas self-assigned this Dec 4, 2019
antoninbas added this to the Antrea v0.2.0 release milestone Dec 5, 2019
antoninbas added a commit to antoninbas/antrea that referenced this issue Dec 14, 2019
antoninbas added a commit to antoninbas/antrea that referenced this issue Dec 16, 2019