-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete OVS port and flows before releasing Pod IP #5788
Conversation
Pod IP is recorded as OVS port attribute and used in Pod specific flows. If releasing Pod IP before deleting OVS port and flows, it could happen that multiple Pod ports and flows reference to the same Pod IP as the same time in some corner cases and lead to corrupted network. This commit ensures all resources that reference to Pod IP are deleted before releasing the IP. Signed-off-by: Quan Tian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
klog.InfoS("Deleted interfaces for container", "container", cniConfig.ContainerId) | ||
|
||
// Release IP to IPAM driver | ||
if err := ipam.ExecIPAMDelete(cniConfig.CniCmdArgs, cniConfig.K8sArgs, cniConfig.IPAM.Type, infraContainer); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one point I was trying to make earlier (in the other PR) is that the main reason we have observed (AFAIK) for a CNI DEL to fail is OVS cleanup. As long as s.podConfigurator.removeInterfaces
keeps failing, we will not be releasing the Pod IP address, so it will not be available for new Pods.
However, it's probably fair to assume that if s.podConfigurator.removeInterfaces
keeps failing, then s.podConfigurator.configureInterfaces
cannot also succeed consistently, so Pod creation is impacted anyway. So releasing the IP address first would not help in this case.
The observation here is that the pool of IPs is much smaller than the pool of OVS resources (e.g. OF ports). But I don't know if it is that relevant. Typically you want to release things in the reverse order compared to allocation, and your point about OVS flows referencing the IP address makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, creating resource should be more demanding than releasing resources. Releasing IPs first wouldn't help much in theory. Besides:
- The issue didn't happen only when there is something wrong but could also happen when Pod A is being created and Pod B is being deleted, and there is no other available IP.
- We should avoid creating a case that multiple Pods share the same IP as IP is the identifier in datapath, it may lead to misbehavior to Service and NetworkPolicy. If there is a case cleaning up OVS resources keeps failing, releasing the IP to another Pod could make another Pod be applied this Pod's Policy and used as an irrelavant Service's endpoint wrongly.
/test-all |
Pod IP is recorded as OVS port attribute and used in Pod specific flows. If releasing Pod IP before deleting OVS port and flows, it could happen that multiple Pod ports and flows reference to the same Pod IP as the same time in some corner cases and lead to corrupted network.
This commit ensures all resources that reference to Pod IP are deleted before releasing the IP.
Thanks @antoninbas for suggesting this solution.