weave-npc not adding running pod's IP to existing entries during transition from pending to running #3610
Comments
I had to delete the pod to resolve the issue. It fixed the issue because the pod received a new IP address from weave that was different from the previous IP.
Thanks for the detailed report. The way the code works is that it compares the previous version of the pod data, so if you could dig those out from your log, or just post the entire log as an attachment, this would help in understanding what happened. In particular, I don't think …
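For readers unfamiliar with the mechanism being referred to, here is a minimal sketch of the "compare previous and current pod data" approach. This is not weave-npc's actual code; the handler onUpdatePod and helper trackable are hypothetical names used only for illustration, built on the standard Kubernetes Pod type.

// Hypothetical sketch, not weave-npc's real implementation: an update handler
// that compares the old and new pod objects and reacts only to changes in
// whether the pod's IP should be tracked.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// trackable reports whether the pod has an IP and is in the Running phase;
// only then should its address appear in the policy controller's entries.
func trackable(pod *corev1.Pod) bool {
	return pod != nil && pod.Status.PodIP != "" && pod.Status.Phase == corev1.PodRunning
}

// onUpdatePod compares the previous and current pod data. A Running->Pending
// transition removes the entry; a later Pending->Running transition with the
// same IP is expected to add it back.
func onUpdatePod(oldPod, newPod *corev1.Pod) {
	wasTracked, isTracked := trackable(oldPod), trackable(newPod)
	switch {
	case wasTracked && !isTracked:
		fmt.Printf("deleting entry %s\n", oldPod.Status.PodIP)
	case !wasTracked && isTracked:
		fmt.Printf("adding entry %s\n", newPod.Status.PodIP)
	}
}

func main() {
	running := &corev1.Pod{Status: corev1.PodStatus{PodIP: "100.106.0.4", Phase: corev1.PodRunning}}
	pending := &corev1.Pod{Status: corev1.PodStatus{Phase: corev1.PodPending}}

	onUpdatePod(running, pending) // Running -> Pending: entry removed
	onUpdatePod(pending, running) // Pending -> Running: entry should be re-added
}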
OK, digging that up.
@bboreham, I will close this since I think I jumped to conclusions. The end result was that a pod was unreachable within the weave-net overlay, but I would like to do more investigation to figure out why. I don't think it's because of status changes.
@bboreham Just to follow up and give a reason why I'm closing this: I had two nodes in our cluster where every pod on the node was unreachable. It wasn't just one pod, which is what I thought at first. From the logs, it looks like I had a MAC conflict? I'm running in AWS, so it's as if two EC2 instances were sending out frames with the same MAC on the LAN interface. I have seen these errors since I initially upgraded weave to version 2.5.1 on the EC2 nodes:
ERRO: 2019/03/04 15:03:19.562815 Captured frame from MAC (02:a7:12:86:21:25) to (ea:a5:0a:71:d9:c2) associated with another peer ce:87:81:de:c9:70(ip-10-202-20-71.ec2.internal)
and the weave-net log is saturated with these messages up until the current time:
ERRO: 2019/03/08 04:53:00.829664 Captured frame from MAC (0e:8c:b4:52:64:c3) to (26:a3:81:ae:25:48) associated with another peer 12:23:6c:24:3a:20(ip-10-202-20-112.ec2.internal)
I am just destroying the EC2 instances for now.
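As a rough reading of what that error indicates (an assumption on my part, with made-up names, not weave's real implementation): the router remembers which peer each source MAC was last learned from, and it reports a conflict when a captured frame arrives from a MAC it already associates with a different peer.

// Hypothetical illustration of the "Captured frame from MAC ... associated
// with another peer" condition; macTable and capture are invented names.
package main

import "fmt"

type macTable map[string]string // source MAC -> peer it was learned from

// capture records which peer a source MAC belongs to and reports a conflict
// when the same MAC later appears as belonging to a different peer.
func (t macTable) capture(srcMAC, peer string) {
	if owner, ok := t[srcMAC]; ok && owner != peer {
		fmt.Printf("ERRO: Captured frame from MAC (%s) associated with another peer %s\n", srcMAC, owner)
		return
	}
	t[srcMAC] = peer
}

func main() {
	t := macTable{}
	t.capture("02:a7:12:86:21:25", "ip-10-202-20-71.ec2.internal")
	// The same MAC seen again for a different peer triggers the error:
	t.capture("02:a7:12:86:21:25", "ip-10-202-20-134.ec2.internal")
}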
Some discussion at #2877 of the "Captured frame from MAC" message.
What you expected to happen?
When the pod transitioned to running, the IP address should be recognized by weave and added to its collection of running IP addresses (update routes, update network policy structures)
What happened?
The IP address was not reachable from any pod within the weave network.
How to reproduce it?
I saw a pod transition to pending, which in our case causes an autoscaling event. This event (the switch to pending) caused all of the weave-npc controllers to remove the IP from their metadata structures; I saw these events from every weave-npc in the daemonset:
INFO: 2019/03/08 16:15:21.767275 deleting entry 100.106.0.4 from weave-%.n=[2]4bcU)@h+_HH}[kHl}x of 0a901e63-3533-11e9-8abb-0a2c58af28b4
But when the pod was scheduled and the status switched back to running, I did not see weave add the entries back. The IP address stayed the same when it was running again, yet the pod is now not reachable.
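For context, the kind of bookkeeping involved is roughly that each network-policy selector owns a set of pod IPs (the "deleting entry 100.106.0.4 from weave-..." line above removes the IP from one such set), so a pod that returns to running with the same IP has to be re-added to every selector set it matches or it stays unreachable. The sketch below is an assumption with invented names (selectorSets, addPodIP, removePodIP), not weave-npc's real data structures.

// Hypothetical bookkeeping sketch: selector name -> set of matching pod IPs.
package main

import "fmt"

type selectorSets map[string]map[string]struct{}

// addPodIP puts the IP into the selector's set; if this step is skipped on a
// Pending->Running transition, the pod keeps its old IP but no policy rule
// matches it, which is the unreachable state reported here.
func (s selectorSets) addPodIP(selector, ip string) {
	if s[selector] == nil {
		s[selector] = map[string]struct{}{}
	}
	s[selector][ip] = struct{}{}
}

// removePodIP drops the IP when the pod stops being trackable (e.g. Pending).
func (s selectorSets) removePodIP(selector, ip string) {
	delete(s[selector], ip)
}

func main() {
	sets := selectorSets{}
	sets.addPodIP("app=kibana", "100.106.0.4")
	sets.removePodIP("app=kibana", "100.106.0.4") // pod went back to Pending
	sets.addPodIP("app=kibana", "100.106.0.4")    // expected again on Running
	fmt.Println(sets)
}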
Anything else we need to know?
AWS cloud provider. Using weave-net version 2.5.1.
Versions:
k8s version 1.11.7
$ weave version 2.5.1
$ docker version
$ uname -a
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T19:45:23Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.6", GitCommit:"b1d75deca493a24a2f87eb1efde1a569e52fc8d9", GitTreeState:"clean", BuildDate:"2018-12-16T04:30:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Logs:
The logged event that caused the removal:
DEBU: 2019/03/08 16:15:21.779114 EVENT UpdatePod {"metadata":{"creationTimestamp":"2019-02-20T17:14:54Z","deletionGracePeriodSeconds":30,"deletionTimestamp":"2019-03-08T16:15:20Z","generateName":"kibana-68db7c85b4-","labels":{"app":"kibana","es-proxy-client":"true","internalcert":"wildcard","pod-template-hash":"2486374160","wildcardcertmanager":"aware"},"name":"kibana-68db7c85b4-9hcbn","namespace":"qa-logging","resourceVersion":"248625274","selfLink":"/api/v1/namespaces/qa-logging/pods/kibana-68db7c85b4-9hcbn","uid":"0a901e63-3533-11e9-8abb-0a2c58af28b4"},"spec":{"containers":[{"image":"docker.elastic.co/kibana/kibana-oss:6.4.2","imagePullPolicy":"IfNotPresent","name":"kibana","ports":[{"containerPort":5601,"name":"ui","protocol":"TCP"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","nodeName":"ip-10-202-20-134.ec2.internal","priority":0,"restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"serviceAccount":"default","serviceAccountName":"default","terminationGracePeriodSeconds":30},"status":{"conditions":[{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"Initialized"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:18:46Z","status":"True","type":"Ready"},{"lastProbeTime":null,"lastTransitionTime":null,"status":"True","type":"ContainersReady"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"PodScheduled"}],"hostIP":"10.202.20.134","phase":"Running","podIP":"100.106.0.4","qosClass":"Burstable","startTime":"2019-02-20T17:14:54Z"}} {"metadata":{"creationTimestamp":"2019-02-20T17:14:54Z","deletionGracePeriodSeconds":30,"deletionTimestamp":"2019-03-08T16:15:20Z","generateName":"kibana-68db7c85b4-","labels":{"app":"kibana","es-proxy-client":"true","internalcert":"wildcard","pod-template-hash":"2486374160","wildcardcertmanager":"aware"},"name":"kibana-68db7c85b4-9hcbn","namespace":"qa-logging","resourceVersion":"248625512","selfLink":"/api/v1/namespaces/qa-logging/pods/kibana-68db7c85b4-9hcbn","uid":"0a901e63-3533-11e9-8abb-0a2c58af28b4"},"spec":{"containers":[{"image":"docker.elastic.co/kibana/kibana-oss:6.4.2","imagePullPolicy":"IfNotPresent","name":"kibana","ports":[{"containerPort":5601,"name":"ui","protocol":"TCP"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","nodeName":"ip-10-202-20-134.ec2.internal","priority":0,"restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"serviceAccount":"default","serviceAccountName":"default","terminationGracePeriodSeconds":30},"status":{"conditions":[{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"Initialized"},{"lastProbeTime":null,"lastTransitionTime":"2019-03-08T16:15:21Z","message":"containers with unready status: [kibana]","reason":"ContainersNotReady","status":"False","type":"Ready"},{"lastProbeTime":null,"lastTransitionTime":null,"message":"containers with unready status: [kibana]","reason":"ContainersNotReady","status":"False","type":"ContainersReady"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"PodScheduled"}],"hostIP":"10.202.20.134","phase":"Pending","qosClass":"Burstable","startTime":"2019-02-20T17:14:54Z"}}
The logged event that did not cause the IP to be added back:
DEBU: 2019/03/08 16:14:50.177228 EVENT UpdatePod {"metadata":{"creationTimestamp":"2019-02-20T17:14:54Z","generateName":"kibana-68db7c85b4-","labels":{"app":"kibana","es-proxy-client":"true","internalcert":"wildcard","pod-template-hash":"2486374160","wildcardcertmanager":"aware"},"name":"kibana-68db7c85b4-9hcbn","namespace":"qa-logging","resourceVersion":"239696997","selfLink":"/api/v1/namespaces/qa-logging/pods/kibana-68db7c85b4-9hcbn","uid":"0a901e63-3533-11e9-8abb-0a2c58af28b4"},"spec":{"containers":[{"image":"docker.elastic.co/kibana/kibana-oss:6.4.2","imagePullPolicy":"IfNotPresent","name":"kibana","ports":[{"containerPort":5601,"name":"ui","protocol":"TCP"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","nodeName":"ip-10-202-20-134.ec2.internal","priority":0,"restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"serviceAccount":"default","serviceAccountName":"default","terminationGracePeriodSeconds":30},"status":{"conditions":[{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"Initialized"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:18:46Z","status":"True","type":"Ready"},{"lastProbeTime":null,"lastTransitionTime":null,"status":"True","type":"ContainersReady"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"PodScheduled"}],"hostIP":"10.202.20.134","phase":"Running","podIP":"100.106.0.4","qosClass":"Burstable","startTime":"2019-02-20T17:14:54Z"}} {"metadata":{"creationTimestamp":"2019-02-20T17:14:54Z","deletionGracePeriodSeconds":30,"deletionTimestamp":"2019-03-08T16:15:20Z","generateName":"kibana-68db7c85b4-","labels":{"app":"kibana","es-proxy-client":"true","internalcert":"wildcard","pod-template-hash":"2486374160","wildcardcertmanager":"aware"},"name":"kibana-68db7c85b4-9hcbn","namespace":"qa-logging","resourceVersion":"248625274","selfLink":"/api/v1/namespaces/qa-logging/pods/kibana-68db7c85b4-9hcbn","uid":"0a901e63-3533-11e9-8abb-0a2c58af28b4"},"spec":{"containers":[{"image":"docker.elastic.co/kibana/kibana-oss:6.4.2","imagePullPolicy":"IfNotPresent","name":"kibana","ports":[{"containerPort":5601,"name":"ui","protocol":"TCP"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File"}],"dnsPolicy":"ClusterFirst","nodeName":"ip-10-202-20-134.ec2.internal","priority":0,"restartPolicy":"Always","schedulerName":"default-scheduler","securityContext":{},"serviceAccount":"default","serviceAccountName":"default","terminationGracePeriodSeconds":30},"status":{"conditions":[{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"Initialized"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:18:46Z","status":"True","type":"Ready"},{"lastProbeTime":null,"lastTransitionTime":null,"status":"True","type":"ContainersReady"},{"lastProbeTime":null,"lastTransitionTime":"2019-02-20T17:14:54Z","status":"True","type":"PodScheduled"}],"hostIP":"10.202.20.134","phase":"Running","podIP":"100.106.0.4","qosClass":"Burstable","startTime":"2019-02-20T17:14:54Z"}}
Network: