calico CNI command never terminates, which causes a dockershim server handler to get stuck forever #2098
Comments
Yeah, sounds like we should be handling this better. Do we know which call is hanging in this specific instance?
I've gathered some corresponding logs. The issue is triggered by removing a workload.
kubelet:
Full kubelet log:
calico-node:
Full calico-node log:
dmesg:
I hope it'll be useful.
If the plugin (either networking or IPAM) takes more than 30 seconds then panic. Fixes projectcalico/calico#2098
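As a rough illustration of the "panic after 30 seconds" idea in the commit message above, here is a minimal Go sketch of a watchdog timer. The names `withWatchdog` and `pluginTimeout` are hypothetical and are not taken from the calico code base; the actual fix may be structured differently.

```go
package main

import (
	"fmt"
	"time"
)

// pluginTimeout mirrors the 30-second limit mentioned in the commit message.
// The constant name is an assumption made for this sketch.
const pluginTimeout = 30 * time.Second

// withWatchdog runs work and calls onTimeout (from a separate goroutine)
// if work has not returned within limit. The timer is stopped as soon as
// work returns, so onTimeout does not fire for calls that finish in time.
func withWatchdog(limit time.Duration, onTimeout func(), work func() error) error {
	timer := time.AfterFunc(limit, onTimeout)
	defer timer.Stop()
	return work()
}

func main() {
	err := withWatchdog(pluginTimeout, func() {
		// An unrecovered panic in any goroutine terminates the process,
		// so a hung plugin call cannot block its caller forever.
		panic(fmt.Sprintf("plugin did not finish within %s", pluginTimeout))
	}, func() error {
		// Stand-in for the real networking or IPAM call.
		time.Sleep(10 * time.Millisecond)
		return nil
	})
	fmt.Println("plugin finished, err =", err)
}
```

Panicking is a blunt instrument, but it guarantees the process exits even when the hung call cannot be cancelled cooperatively.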
For context, this is based on an already closed issue:
#1109
If this is redundant, feel free to close it.
Please take a look at this comment, which is crucial:
kubernetes/kubernetes#45419 (comment)
Summary:
The bug is caused by a calico CNI command that never terminates, which causes a dockershim server handler to get stuck forever. As a result, the PodSandboxStatus() RPC calls for the bad pod always time out, which makes the PLEG unhealthy.
Expected Behavior
The Calico CNI command times out and terminates with success or failure.
Current Behavior
The Calico CNI command never terminates.
This bug could probably also be fixed in CNI (or kubelet): containernetworking/cni#568
But fixing it directly in calico would increase the resiliency of the system.
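For comparison, a caller-side fix (in CNI or kubelet) could bound the plugin invocation with a deadline. The following is only a sketch under assumptions: `runWithDeadline` is a hypothetical helper, and `sleep 5` stands in for a plugin binary that hangs. It uses the standard library's `exec.CommandContext`, which kills the child process once the context is done.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithDeadline runs a command and kills it if it exceeds limit.
// It reports whether the deadline was hit, plus any error from the command.
func runWithDeadline(limit time.Duration, name string, args ...string) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()

	// CommandContext kills the process once ctx is done.
	err := exec.CommandContext(ctx, name, args...).Run()
	timedOut := ctx.Err() == context.DeadlineExceeded
	return timedOut, err
}

func main() {
	// "sleep 5" is a hypothetical stand-in for a plugin binary that hangs.
	timedOut, err := runWithDeadline(200*time.Millisecond, "sleep", "5")
	fmt.Println("timed out:", timedOut, "err:", err)
}
```

With this approach the caller gets control back even if the plugin itself has no timeout, which is why the two fixes complement each other rather than compete.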