Skip to content
This repository has been archived by the owner on Mar 26, 2018. It is now read-only.

Error messages when pod is deleted #32

Closed
chrislovecnm opened this issue Oct 9, 2016 · 18 comments
Closed

Error messages when pod is deleted #32

chrislovecnm opened this issue Oct 9, 2016 · 18 comments

Comments

@chrislovecnm
Copy link

chrislovecnm commented Oct 9, 2016

I am seeing a couple of quirky things. I have installed weave-kube, and working on getting it running using kops - kubernetes/kops#621

Running K8s 1.4 with the latest weave yaml. AWS vpc cloud hosted environment.

It is working since Cassandra nodes are able to communicate with each other via gossip. Able to delete and expand nodes.

First

Oct  9 04:09:44 ip-172-20-111-165 kubelet[999]: E1009 04:09:44.019978     999 kubelet.go:2095] Failed killing the pod "cassandra-5": failed to "TeardownNetwork" for "cassandra-5_default" with TeardownNetworkError: "Failed to teardown network for pod \"b4f4f534-8dd5-11e6-a44d-02c1f7bba931\" using network plugins \"cni\": no such file or directory"

I am also getting errors that the default cni directory does not exist as well.

Thoughts?

cc: @kris-nova

@bboreham
Copy link
Contributor

Could you post a bit more of the logs please? I'm not clear what files are not existing.

The standard configuration puts binaries in /opt/cni/bin and configuration in /etc/cni/net.d. If the former directory is found unwritable (e.g. on Google ContainerVM) then it uses /home/kubernetes/bin

@chrislovecnm
Copy link
Author

From a node that is tearing down a pod:

syslog:Oct 13 22:44:21 ip-172-20-38-108 kubelet[850]: E1013 22:44:21.684951     850 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d
syslog:Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.162043     850 cni.go:273] Error deleting network: no such file or directory
syslog:Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.162481     850 docker_manager.go:1419] Failed to teardown network for pod "ac46745c-919d-11e6-80cb-022fd0fb199d" using network plugins "cni": no such file or directory
syslog:Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.166057     850 kubelet.go:2095] Failed killing the pod "cassandra-3": failed to "TeardownNetwork" for "cassandra-3_default" with TeardownNetworkError: "Failed to teardown network for pod \"ac46745c-919d-11e6-80cb-022fd0fb199d\" using network plugins \"cni\": no such file or directory"

Weave is up and pods can communicate between each other, so I am guessing that it is working. I can recreate this, so please let me know what you need.

@chrislovecnm
Copy link
Author

Here is more of the full log

ct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.091365     850 kubelet.go:2331] SyncLoop (PLEG): "cassandra-3_default(ac46745c-919d-11e6-80cb-022fd0fb199d)", event: &pleg.PodLifecycleEvent{ID:"ac46745c-919d-11e6-80cb-022fd0fb199d", Type:"ContainerDied", Data:"8a23a0d1194b221b6726846d923742363094e4e933b99a6ec4030999ca47c03b"}
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.092064     850 kubelet.go:2331] SyncLoop (PLEG): "cassandra-3_default(ac46745c-919d-11e6-80cb-022fd0fb199d)", event: &pleg.PodLifecycleEvent{ID:"ac46745c-919d-11e6-80cb-022fd0fb199d", Type:"ContainerDied", Data:"210f865d0ee7067119ad97b7584541553b815f0bf2ef400f89072c36b85f5c7e"}
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.135933     850 kubelet.go:2312] SyncLoop (DELETE, "api"): "cassandra-3_default(ac46745c-919d-11e6-80cb-022fd0fb199d)"
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.151514     850 kubelet.go:2306] SyncLoop (REMOVE, "api"): "cassandra-3_default(ac46745c-919d-11e6-80cb-022fd0fb199d)"
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.152062     850 kubelet.go:2092] Killing unwanted pod "cassandra-3"
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.152532     850 docker_manager.go:1492] Killing container "8a23a0d1194b221b6726846d923742363094e4e933b99a6ec4030999ca47c03b cassandra default/cassandra-3" with 0 second grace period
Oct 13 23:44:57 ip-172-20-38-108 docker[945]: time="2016-10-13T23:44:57.153455647Z" level=error msg="Handler for POST /containers/8a23a0d1194b221b6726846d923742363094e4e933b99a6ec4030999ca47c03b/stop returned error: Container 8a23a0d1194b221b6726846d923742363094e4e933b99a6ec4030999ca47c03b is already stopped"
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.154042     850 docker_manager.go:1531] Container "8a23a0d1194b221b6726846d923742363094e4e933b99a6ec4030999ca47c03b cassandra default/cassandra-3" exited after 1.057256ms
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: W1013 23:44:57.154479     850 docker_manager.go:1537] No ref for pod '"8a23a0d1194b221b6726846d923742363094e4e933b99a6ec4030999ca47c03b cassandra default/cassandra-3"'
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.162043     850 cni.go:273] Error deleting network: no such file or directory
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.162481     850 docker_manager.go:1419] Failed to teardown network for pod "ac46745c-919d-11e6-80cb-022fd0fb199d" using network plugins "cni": no such file or directory
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.163941     850 docker_manager.go:1492] Killing container "210f865d0ee7067119ad97b7584541553b815f0bf2ef400f89072c36b85f5c7e default/cassandra-3" with 0 second grace period
Oct 13 23:44:57 ip-172-20-38-108 docker[945]: time="2016-10-13T23:44:57.164717002Z" level=error msg="Handler for POST /containers/210f865d0ee7067119ad97b7584541553b815f0bf2ef400f89072c36b85f5c7e/stop returned error: Container 210f865d0ee7067119ad97b7584541553b815f0bf2ef400f89072c36b85f5c7e is already stopped"

@bboreham
Copy link
Contributor

There is an hour between "No networks found in /etc/cni/net.d" and the next message, so maybe that comes from when the cluster was being set up? That would be normal.

I'm still mystified about the "no such file or directory". If you are able to re-run kubelet with -v=4 we would see a bit more of what it thinks it is doing.

@chrislovecnm
Copy link
Author

chrislovecnm commented Oct 14, 2016

Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.162043     850 cni.go:273] Error deleting network: no such file or directory
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: E1013 23:44:57.162481     850 docker_manager.go:1419] Failed to teardown network for pod "ac46745c-919d-11e6-80cb-022fd0fb199d" using network plugins "cni": no such file or directory
Oct 13 23:44:57 ip-172-20-38-108 kubelet[850]: I1013 23:44:57.163941     850 docker_manager.go:1492] Killing container "210f865d0ee7067119ad97b7584541553b815f0bf2ef400f89072c36b85f5c7e default/cassandra-3" with 0 second grace period

The above is happening right after I delete the pod. Let me dig through the code and turn up the logging.

@chrislovecnm
Copy link
Author

@bboreham here is the block of code that is being called, and the culprit is cninet.DelNetwork(netconf, rt)

func (network *cniNetwork) deleteFromNetwork(podName string, podNamespace string, podInfraContainerID kubecontainer.ContainerID, podNetnsPath string) error {
    rt, err := buildCNIRuntimeConf(podName, podNamespace, podInfraContainerID, podNetnsPath)
    if err != nil {
        glog.Errorf("Error deleting network: %v", err)
        return err
    }

    netconf, cninet := network.NetworkConfig, network.CNIConfig
    glog.V(4).Infof("About to run with conf.Network.Type=%v", netconf.Network.Type)
    err = cninet.DelNetwork(netconf, rt)
    if err != nil {
        glog.Errorf("Error deleting network: %v", err)
        return err
    }
    return nil
}

@chrislovecnm
Copy link
Author

chrislovecnm commented Oct 17, 2016

Digging more through the code I am seeing that we are executing the plugin, and the error seems to be coming from execing the plugin.

https://github.com/containernetworking/cni/blob/v0.3/pkg/invoke/exec.go#L59

Going to try to turn up logging on weave-net first.

@chrislovecnm chrislovecnm changed the title Log messages when running - a couple of questions Log messages when running - unable to delete network Oct 17, 2016
@chrislovecnm chrislovecnm changed the title Log messages when running - unable to delete network Error messages when pod is deleted Oct 17, 2016
@chrislovecnm
Copy link
Author

So when weave-net is being called by kubelet, I think it is failing.

execve("/opt/cni/bin/weave-net", ["/opt/cni/bin/weave-net"], ["MAIL=/var/mail/root", "USER=root", "SHLVL=1", "HOME=/root", "LOGNAME=root", "_=./run-kubelet.sh", "TERM=xterm-256color", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "LANG=en_US.UTF-8", "SHELL=/bin/bash", "PWD=/root", "CNI_COMMAND=DEL", "CNI_CONTAINERID=83de8a02619b125eca81360b96d8304168824bba3d047e2cc4c543e74b9dab41", "CNI_NETNS=/proc/0/ns/net", "CNI_ARGS=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=busybox0;K8S_POD_INFRA_CONTAINER_ID=83de8a02619b125eca81360b96d8304168824bba3d047e2cc4c543e74b9dab41", "CNI_IFNAME=eth0", "CNI_PATH=/opt/cni/bin:/opt/weave-net/bin"]) = 0

@bboreham
Copy link
Contributor

OK, but why is it returning "No such file or directory" ? Is the file there? Is there a missing dynamic library? (run ldd /opt/cni/bin/weave-net)

@chrislovecnm
Copy link
Author

chrislovecnm commented Oct 18, 2016

No you guys made it as a static executable.

root@ip-172-20-114-111:/opt/cni/bin# ldd weave-net
    not a dynamic executable

@chrislovecnm
Copy link
Author

To add into:

admin@ip-172-20-108-48:~$ uname -a
Linux ip-172-20-108-48 4.4.23-k8s #1 SMP Fri Sep 30 13:51:02 UTC 2016 x86_64 GNU/Linux
admin@ip-172-20-108-48:~$ cat /etc/debian_version
8.6

@bboreham
Copy link
Contributor

Aha!

execve( ... "CNI_NETNS=/proc/0/ns/net", ...

There is, most likely "no such file or directory" 🙂

So, I think the weave-net binary should do a better job of reporting what went wrong, but given that evidence it seems Kubernetes is asking weave-net to do something weird, and weave-net is unable to comply.

@errordeveloper
Copy link
Contributor

@chrislovecnm could out please provide exact version of kubelet you are using? (Just to be sure).

@chrislovecnm
Copy link
Author

admin@ip-172-20-108-48:~$ kubelet --version
Kubernetes v1.4.0

@bboreham
Copy link
Contributor

Suspected to be caused by kubernetes/kubernetes#20379

@chrislovecnm
Copy link
Author

@bboreham you want me to close this?

@bboreham
Copy link
Contributor

If you're satisfied, feel free to close.

@bboreham
Copy link
Contributor

I improved the error messages at weaveworks/weave@b38a55c

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants