IP may leak after agent restart #4326

Closed
wenyingd opened this issue Oct 21, 2022 · 10 comments · Fixed by #5660
Assignees: antoninbas
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

Comments

@wenyingd (Contributor) commented Oct 21, 2022

Describe the bug

In the existing reconcile logic, antrea-agent removes OVS ports, OpenFlow entries, and interface store items for Pods that were deleted while the agent was down. However, the logic does not call IPAM to release the allocated IPs, which may lead to a leak of IP resources.
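
For illustration, a minimal Go sketch of the gap described above; all names (PodInterface, IPAMDriver, reconcileDeletedPods, ovsDelete) are hypothetical and do not correspond to Antrea's actual types or functions:

// Hypothetical sketch of the reconcile gap; names do not match Antrea's code.
package main

import "log"

// PodInterface is a stale interface store entry for a Pod that was deleted
// while the agent was down.
type PodInterface struct {
    PodNamespace, PodName, ContainerID, IP string
}

// IPAMDriver abstracts the IPAM plugin (e.g. host-local or AntreaIPAM).
type IPAMDriver interface {
    Release(containerID string) error
}

// reconcileDeletedPods cleans up node-local state for stale Pod interfaces.
func reconcileDeletedPods(stale []PodInterface, ovsDelete func(containerID string) error, ipam IPAMDriver) {
    for _, itf := range stale {
        // Existing behavior: remove the OVS port, OpenFlow entries, and the
        // interface store item.
        if err := ovsDelete(itf.ContainerID); err != nil {
            log.Printf("OVS cleanup failed for %s/%s: %v", itf.PodNamespace, itf.PodName, err)
            continue
        }
        // Missing step described in this issue: without this call the
        // allocated IP is never returned to IPAM and leaks.
        if err := ipam.Release(itf.ContainerID); err != nil {
            log.Printf("Failed to release IP %s: %v", itf.IP, err)
        }
    }
}

func main() {
    reconcileDeletedPods(nil, func(string) error { return nil }, nil)
}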

To Reproduce

Expected

Actual behavior

Versions:

Antrea: main branch

Additional context

wenyingd added the kind/bug label on Oct 21, 2022.
@github-actions (bot)

This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 90 days.

github-actions bot added the lifecycle/stale label on Jan 20, 2023.
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 21, 2023.
tnqn removed the lifecycle/stale label on May 31, 2023.
tnqn reopened this on May 31, 2023.
@github-actions (bot)

This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 90 days.

github-actions bot added the lifecycle/stale label on Aug 30, 2023.
antoninbas removed the lifecycle/stale label on Oct 30, 2023.
@antoninbas (Contributor)

I was taking a look at this old issue and I tried to reproduce the IP leakage.
Based on my tests and on the kubelet code, it seems very difficult to reproduce this situation, even with the following steps:

  1. schedule a workload Pod on nodeX
  2. remove the antrea-agent Pod from nodeX by editing the DaemonSet's affinity
  3. force delete the workload Pod with kubectl
  4. wait for a few minutes
  5. kill the kubelet service (it will restart immediately)
  6. revert the change to the antrea-agent DaemonSet

After a few seconds (less than a minute), the new antrea-agent Pod receives a CmdDel command for the old workload Pod, and the IP is released.

It seems that the kubelet garbage collector is taking care of removing the old sandbox. When I increase kubelet log verbosity, I see:

Oct 30 22:20:02 kind-worker kubelet[37935]: I1030 22:20:02.300464   37935 kuberuntime_gc.go:175] "Removing sandbox" sandboxID="4eb3e402b4d2342278fdf12e5d3b33a206363f4ec93af271020a8736604a8442"
Oct 30 22:20:02 kind-worker kubelet[37935]: E1030 22:20:02.322027   37935 kuberuntime_gc.go:180] "Failed to stop sandbox before removing" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"4eb3e402b4d2342278fdf12e5d3b33a206363f4ec93af271020a8736604a8442\": plugin type=\"antrea\" failed (delete): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /var/run/antrea/cni.sock: connect: connection refused\"" sandboxID="4eb3e402b4d2342278fdf12e5d3b33a206363f4ec93af271020a8736604a8442"

kubelet is able to list all sandboxes from the container runtime, and the container runtime I am using (containerd v1.7.1) will not remove the sandbox until the network has been deleted:

root@kind-worker:/# crictl pods
POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
5a64643d69ed8       32 minutes ago      NotReady            toolbox-rhc2g       default             0                   (default)
4eb3e402b4d23       34 minutes ago      NotReady            toolbox-qfnfr       default             1                   (default)
b49dac301f1fe       2 hours ago         Ready               kube-proxy-kz7zx    kube-system         0                   (default)

It is possible that some older versions of containerd were not that robust.

@tnqn do you think we should have a more defensive posture in Antrea and add logic to proactively release IPs on agent (re)start?

@tnqn (Member) commented Oct 31, 2023

@antoninbas I have tried the following K8s versions deployed via kind and haven't managed to reproduce the IP leak with the steps you described:

  • v1.26.6
  • v1.24.15
  • v1.23.17
  • v1.22.17
  • v1.21.14

So I feel this case may have been taken care of in the normal workflow, but it's hard to say it's handled correctly in abnormal cases. For example, containerd previously had this issue: containerd/containerd#5569, and I remember you were investigating another leak with newer containerd versions in the last few weeks, for which we don't know the root cause yet? And there could be other CRIs which implement the cleanup differently. Therefore, I tend to agree that, instead of relying on kubelet/container runtime to be bug free regarding CNI cleanup, it's more robust to perform a conservative garbage collection on agent start. We should log such cleanup explicitly, as we don't actually expect it unless there is a bug in the container runtime or Antrea.

@antoninbas (Contributor)

I remember you were investigating another leak with newer containerd versions in the last few weeks, for which we don't know the root cause yet?

The person who reported the issue confirmed that when using a more recent version of containerd, the issue cannot be reproduced anymore (perhaps because of this fix: containerd/containerd#5904). With an older containerd version, they were able to reproduce the issue consistently.

Therefore, I tend to agree that, instead of relying on kubelet/container runtime to be bug free regarding CNI cleanup, it's more robust to perform a conservative garbage collection on agent start.

Sounds good to me. I can take a stab at it.
It probably won't be possible to validate that code with e2e testing, but we can have a unit test.

@antoninbas antoninbas self-assigned this Oct 31, 2023
@antoninbas (Contributor)

@tnqn I wrote some PoC code to see if I could come up with a viable approach: fbfa402

However, it feels very much like an anti-pattern: we have to craft the CNI arguments from "scratch" in order to invoke the IPAM plugin. It can work (both when delegating to host-local IPAM and when using AntreaIPAM), but it feels like a hack.

The code I wrote uses the interfaceStore (populated from the persisted OVSDB) to determine for which containers we should do "IPAM garbage collection". An alternative approach would be to maintain and persist a CNI result cache, which would be updated whenever we invoke the IPAM plugin (ideally, AntreaIPAM too). We can use libcni for this. With this approach, we can use the cached results to easily generate DEL commands for Pods that no longer exist. This may be a cleaner approach (I need to check if it works), but it's probably a bit more disruptive.
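
For illustration only, here is a rough sketch of what "crafting the CNI arguments from scratch" could look like when delegating to host-local: exec the plugin binary directly with a DEL command, passing a network configuration on stdin and the standard CNI_* environment variables. This is not the PoC code from fbfa402; the config contents, paths, and names below are assumptions made for the example.

// Illustrative sketch only: craft a CNI DEL for the host-local IPAM plugin
// for a stale container. Paths and the config JSON are assumptions.
package main

import (
    "bytes"
    "fmt"
    "os/exec"
)

func hostLocalDel(containerID, ifName string) error {
    // Minimal network config passed on stdin; a real implementation would
    // reuse the IPAM section from the CNI config present on the Node.
    netConf := []byte(`{
  "cniVersion": "0.4.0",
  "name": "antrea",
  "ipam": {"type": "host-local", "ranges": [[{"subnet": "10.10.0.0/24"}]]}
}`)

    cmd := exec.Command("/opt/cni/bin/host-local")
    cmd.Stdin = bytes.NewReader(netConf)
    // Per the CNI contract, the command and identifiers are passed through
    // environment variables.
    cmd.Env = []string{
        "CNI_COMMAND=DEL",
        "CNI_CONTAINERID=" + containerID,
        "CNI_NETNS=", // the netns is usually gone for a deleted Pod
        "CNI_IFNAME=" + ifName,
        "CNI_PATH=/opt/cni/bin",
    }
    if out, err := cmd.CombinedOutput(); err != nil {
        return fmt.Errorf("host-local DEL failed: %v, output: %s", err, out)
    }
    return nil
}

func main() {
    if err := hostLocalDel("4eb3e402b4d23", "eth0"); err != nil {
        fmt.Println(err)
    }
}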

Going back to the relevance of this issue, it does feel inconsistent to treat interfaces and IP addresses differently. Either we assume that the container runtime (+ kubelet) will always eventually invoke CNI DEL, in which case that part of the reconciliation logic is not really needed at all. Or we assume that a resource leak is possible under some conditions, and we have to take care of all resources consistently.

@tnqn (Member) commented Nov 1, 2023

I remember you were investigating another leak with newer containerd versions in the last few weeks, for which we don't know the root cause yet?

The person who reported the issue confirmed that when using a more recent version of containerd, the issue cannot be reproduced anymore (perhaps because of this fix: containerd/containerd#5904). With an older containerd version, they were able to reproduce the issue consistently.

I took a look at the issue and the PR; they seem to ensure that CNI will be called repeatedly until it returns success. But I remember that in our case there was no CNI invocation, so it looks different. Do you know with which version the issue can be reproduced and with which it cannot? According to the enhancements that have been made in containerd, it does seem CNI cleanup is ensured now.

However, it feels very much like an anti-pattern: we have to craft the CNI arguments from "scratch" in order to invoke the IPAM plugin. It can work (both when delegating to host-local IPAM and when using AntreaIPAM), but it feels like a hack.

I was thinking of something more hacky: accessing the IPAM dir and deleting IPs that shouldn't exist. I remember we offered a script for users to do the cleanup when they encountered the IP leak: https://github.com/antrea-io/antrea/blob/main/hack/gc-host-local.sh.

Going back to the relevance of this issue, it does feel inconsistent to treat interfaces and IP addresses differently. Either we assume that the container runtime (+ kubelet) will always eventually invoke CNI DEL, in which case that part of the reconciliation logic is not really needed at all. Or we assume that a resource leak is possible under some conditions, and we have to take care of all resources consistently.

Agree. When we added this logic, kubelet and the container runtime were not that robust, although the logic itself is not complete. But looking at the related issues in containerd and their fixes, I'm also wondering whether it's still worth making the logic more complex to cover this case.

If we want to keep the logic for now, how about just using the simple (but hacky) approach to release known leaked IPs bypassing the IPAM plugin, without adding more cache/storage? It will be very specific to the host-local plugin and could break if host-local changes its implementation, but it's the only case we have encountered in production, and this is a workaround anyway. If IP leaks still happen, the right direction is still patching and upgrading the container runtime.

@antoninbas (Contributor)

I took a look at the issue and the PR; they seem to ensure that CNI will be called repeatedly until it returns success. But I remember that in our case there was no CNI invocation, so it looks different. Do you know with which version the issue can be reproduced and with which it cannot? According to the enhancements that have been made in containerd, it does seem CNI cleanup is ensured now.

Yes, that PR seemed like a bit of a stretch, so maybe it was some other patch.
The issue could be reproduced consistently with containerd 1.6.6, but could not be reproduced with containerd 1.6.18.
Once it became apparent that the issue was caused by containerd, not Antrea, and that it could not be reproduced with a more recent version of containerd, I didn't try to identify the exact patch.
We also have this in Antrea now, which avoids that issue as well: #5548

If we want to keep the logic for now, how about just using the simple (but hacky) approach to release known leaked IPs bypassing the IPAM plugin, without adding more cache/storage?

Hmm, I was thinking it would be nice to have a unified approach which also works with AntreaIPAM. Don't you think the same issue can happen there if there is a missing CNI DEL? I.e., the IPPool CR has leaked IPs.

@tnqn (Member) commented Nov 2, 2023

If we want to keep the logic for now, how about just using the simple (but hacky) approach to release known leaked IPs bypassing the IPAM plugin, without adding more cache/storage?

Hmm, I was thinking it would be nice to have a unified approach which also works with AntreaIPAM. Don't you think the same issue can happen there if there is a missing CNI DEL? I.e., the IPPool CR has leaked IPs.

I remember it has been taken care of by antrea-controller:

// Inspect all IPPools for stale IP Address entries.
// This may happen if controller was down during StatefulSet delete event.
// If such entry is found, enqueue cleanup event for this StatefulSet.
func (c *AntreaIPAMController) cleanupStaleAddresses() {
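
To make the idea in this excerpt concrete, here is an illustrative-only sketch of a stale-address scan over an IPPool; the types and field names are hypothetical and do not reflect Antrea's actual CRD API:

// Illustrative sketch of scanning an IP pool for addresses whose owning
// StatefulSet no longer exists; types are hypothetical, not Antrea's API.
package main

import "fmt"

type ipAllocation struct {
    IP               string
    OwnerStatefulSet string // "namespace/name", empty if not StatefulSet-owned
}

type ipPool struct {
    Name        string
    Allocations []ipAllocation
}

// staleStatefulSets returns StatefulSets that still own IPs in the pool but
// no longer exist, so a cleanup event can be enqueued for each of them.
func staleStatefulSets(pool ipPool, exists func(name string) bool) []string {
    var stale []string
    seen := map[string]bool{}
    for _, a := range pool.Allocations {
        owner := a.OwnerStatefulSet
        if owner == "" || seen[owner] {
            continue
        }
        seen[owner] = true
        if !exists(owner) {
            stale = append(stale, owner)
        }
    }
    return stale
}

func main() {
    pool := ipPool{Name: "pool1", Allocations: []ipAllocation{{IP: "10.2.0.5", OwnerStatefulSet: "default/db"}}}
    fmt.Println(staleStatefulSets(pool, func(string) bool { return false })) // [default/db]
}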

@antoninbas
Copy link
Contributor

OK, thanks for the link. If AntreaIPAM has its own GC, then I think it's acceptable to have a solution which is specific to host-local.

@antoninbas antoninbas added this to the Antrea v1.15 release milestone Nov 2, 2023
antoninbas added commits to antoninbas/antrea that referenced this issue on Nov 3, Nov 4, Nov 13, and Nov 14, 2023, with the same commit message as the merged commit below.
antoninbas added a commit that referenced this issue Nov 15, 2023
During CNIServer reconciliation, we perform host-local IPAM garbage
collection (GC) by comparing the set of IPs allocated to local Pods with
the set of IPs currently reserved by the plugin. We release any IP
reserved by the plugin that is not in use by a local Pod. The purpose is
to avoid leaking IP addresses when there is a bug in the container
runtime, which has happened in the past.

Two key design choices were made:
* We do not invoke CNI DEL to release IPs, instead we access the
  host-local data which is persisted on the Node, and modify it as
  needed.
* We do not rely on the interface store (as persisted to OVSDB) to
  determine the set of IPs that may have been leaked. In case of an
  Antrea bug, it could be possible (although unlikely) for an IP to
  still be allocated by host-local but be missing from the interface
  store. Instead, we list all allocated IPs from the host-local data (an
  allocated IP corresponds to one disk file).

This approach is essentially the same as our existing script:
https://github.com/antrea-io/antrea/blob/main/hack/gc-host-local.sh

Fixes #4326

Signed-off-by: Antonin Bas <[email protected]>
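
As a rough illustration of the GC described in this commit message (not the actual Antrea implementation), and assuming the host-local plugin keeps one reservation file per allocated IP under a per-network data directory such as /var/lib/cni/networks/antrea, a GC pass on agent start could look like this sketch:

// Sketch of host-local IPAM GC on agent start; not Antrea's actual code.
// Assumption: host-local keeps one file per allocated IP under dataDir,
// named after the IP (e.g. /var/lib/cni/networks/antrea/10.10.0.5).
package main

import (
    "log"
    "net"
    "os"
    "path/filepath"
)

// gcHostLocalIPs removes reservation files for IPs that are not in use by
// any local Pod. podIPs is the set of IPs currently assigned to local Pods.
func gcHostLocalIPs(dataDir string, podIPs map[string]bool) error {
    entries, err := os.ReadDir(dataDir)
    if err != nil {
        return err
    }
    for _, e := range entries {
        if e.IsDir() || net.ParseIP(e.Name()) == nil {
            // Skip subdirectories and bookkeeping files (e.g. lock files);
            // only names that parse as IP addresses are reservations.
            continue
        }
        if podIPs[e.Name()] {
            continue // IP is in use by a local Pod, keep it
        }
        // Log explicitly: a leak is not expected unless there is a bug in
        // the container runtime or in Antrea.
        log.Printf("Releasing leaked host-local IP %s", e.Name())
        if err := os.Remove(filepath.Join(dataDir, e.Name())); err != nil {
            log.Printf("Failed to remove reservation file for %s: %v", e.Name(), err)
        }
    }
    return nil
}

func main() {
    inUse := map[string]bool{"10.10.0.5": true} // example set of local Pod IPs
    if err := gcHostLocalIPs("/var/lib/cni/networks/antrea", inUse); err != nil {
        log.Fatal(err)
    }
}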