Antrea IPAM picks overlapping pod IP addresses #119

Closed
varunmar opened this issue Nov 20, 2019 · 7 comments · Fixed by #228
@varunmar

Describe the bug
When installing Antrea on an existing cluster, the pod IPs allocated by Antrea may conflict with the IPs of previously created pods (such as kube-dns).

To Reproduce
Create a GKE cluster. Install Antrea. Create enough pods so that some of them land on nodes that already have existing kube-system pods.
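
For concreteness, a rough reproduction sketch (the cluster name, node count, and test deployment below are illustrative assumptions, not taken from this report):

# create a small GKE cluster (name and size are placeholders)
gcloud container clusters create antrea-test --num-nodes=2
# deploy Antrea from the manifest referenced below in this report
kubectl apply -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml
# schedule enough pods that some land on nodes already running kube-system pods
kubectl create deployment test-pods --image=nginx
kubectl scale deployment test-pods --replicas=4
# compare the new pod IPs with the pre-existing kube-system pod IPs
kubectl get pods -o wide --all-namespaces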

Expected
Pods are allocated IPs that do not overlap existing pod IPs.

On a 2-node GKE cluster, with Antrea deployed using the vmware-tanzu/antrea/master/build/yamls/antrea.yml file and kube-dns pods already existing on the nodes, adding a few iperf pods shows this -

k get pods -o wide --all-namespaces

NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default iperf-client 1/1 Running 0 2m37s 10.44.0.2 gke-antrea-test-default-pool-41d51a90-ltw0
default iperf-server-cbdb86575-qrl5p 1/1 Running 0 2m40s 10.44.1.2 gke-antrea-test-default-pool-41d51a90-5bvq

kube-system kube-dns-79868f54c5-h9c4m 4/4 Running 0 9m31s 10.44.0.2 gke-antrea-test-default-pool-41d51a90-ltw0
kube-system kube-dns-79868f54c5-km95f 4/4 Running 0 8m52s 10.44.1.2 gke-antrea-test-default-pool-41d51a90-5bvq

This cluster is running Kubernetes v1.13.11.

varunmar added the bug label Nov 20, 2019
@jianjuns
Contributor

Thanks for reporting this!
When you replace the existing CNI with Antrea, do you expect the existing Pods (created before Antrea is deployed) to continue to have network connectivity? For now Antrea cannot guarantee that.
Or do you mean that even if the existing Pods lose connectivity, you would still want Antrea to avoid allocating IPs that conflict with those existing Pods, even though the conflict does not impact new Pods' connectivity?

@tnqn
Member

tnqn commented Nov 20, 2019

I see that GKE currently supports the GKE native CNI and Calico, so the existing Pods should get their IPs from one of them. Even if the IPs don't conflict, I don't think it's guaranteed that Pods whose networks were created by different CNIs can communicate, as Pods are connected via different approaches (Linux bridge, routing mode, Open vSwitch bridge), and the gateway interfaces created by each CNI in the default namespace might have IP conflicts too. If a GKE cluster can start in a state with no CNI enabled, like kubeadm init, or provide a mechanism to start with other CNIs, it should work.
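
One way to see this on a node is to compare the addresses of the gateway/bridge interfaces each CNI creates. A minimal sketch; the interface names are assumptions (cbr0 or cni0 for the pre-existing CNI, antrea-gw0 for Antrea) and vary by CNI and version, since they are not given in this thread:

# interface names below are assumptions and vary by CNI and version
ip -4 addr show cbr0        # bridge left by the pre-existing CNI, if any
ip -4 addr show antrea-gw0  # Antrea's gateway port on the OVS bridge
ip route                    # look for overlapping pod CIDR routes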

@varunmar
Author

Ah interesting point about the gateway interfaces. Yes, that would also be unfortunate - we were just lucky that the default GKE CNI doesn't create any gateway devices.
I wouldn't expect existing pods to be able to communicate with the newly created ones, but I would have expected the old ones to keep communicating with whatever they had connectivity to before (the external world, for example), while all the new pods would be managed by Antrea's CNI.

But you're right - that's a confusing situation, and I can't imagine any production clusters wanting to be in that mode. One easy way to fix this is to just have all the pods restart after the Antrea CNI is installed. Then, when they all get recreated they can join the Antrea overlay.

Thanks!

@antoninbas
Contributor

Should we take the action item to add some documentation to the getting started guide about deploying Antrea in a cluster which already uses a different CNI plugin?

We could suggest the following steps: 1) delete existing CNI, 2) apply Antrea's yaml, 3) drain / uncordon each node one by one.
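
A minimal sketch of those steps, assuming kubectl access to the cluster; the manifest path and node names are placeholders:

# 1) delete the existing CNI (manifest is a placeholder)
kubectl delete -f <previous-cni-manifest>.yml
# 2) apply Antrea's yaml
kubectl apply -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml
# 3) drain and uncordon each node one by one so its pods are recreated through Antrea
kubectl drain <node-name> --ignore-daemonsets
kubectl uncordon <node-name>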

@antoninbas
Contributor

Similarly we may want to make sure that Antrea cleans up after itself properly (deleting the gw interface, etc.) when it is deleted from a cluster :/
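
For reference, a hedged sketch of what such a cleanup might touch, assuming Antrea's default names (br-int for the OVS bridge, antrea-gw0 for the gateway interface, 10-antrea.conflist for the CNI config) and that ovs-vsctl is available on the node; none of these names are confirmed in this thread:

kubectl delete -f https://raw.githubusercontent.com/vmware-tanzu/antrea/master/build/yamls/antrea.yml
# then, on each node:
ovs-vsctl del-br br-int                  # removes the OVS bridge and its gateway port
ip link delete antrea-gw0 2>/dev/null    # in case the gateway interface persists
rm -f /etc/cni/net.d/10-antrea.conflist  # CNI config file name is an assumption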

@jianjuns
Contributor

Makes sense to me. We could have a cleanup DaemonSet do the cleanup.

@antoninbas
Contributor

Assigning this to me. Discussed at the 12/04/2019 Antrea community meeting. The only action item is to document how to deploy Antrea on a cluster which already has a CNI / running Pods.

antoninbas self-assigned this Dec 4, 2019
antoninbas added this to the Antrea v0.2.0 release milestone Dec 5, 2019
antoninbas added a commit to antoninbas/antrea that referenced this issue Dec 14, 2019
antoninbas added a commit to antoninbas/antrea that referenced this issue Dec 16, 2019