weave losing connections to other nodes with error: Multiple connections (Kubernetes CNI) #3619
Comments
I see two issues. First, there are heartbeat misses; under normal circumstances this should not happen.
Second, looks like
Could you please enable DEBUG logging and share the logs when you encounter this issue again?
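For reference, debug logging can be enabled on the weave-net DaemonSet by passing `--log-level=debug` through the `EXTRA_ARGS` environment variable. A sketch, assuming the standard weave-net manifest (container name `weave` in the `kube-system` namespace):

```yaml
# Sketch: add to the "weave" container spec in the weave-net DaemonSet.
# EXTRA_ARGS is forwarded to the weaver process; --log-level=debug
# turns on debug logging.
containers:
  - name: weave
    env:
      - name: EXTRA_ARGS
        value: "--log-level=debug"
```

After editing, delete the weave pods so the DaemonSet recreates them with the new setting.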
It happened again, so here are the weave outputs in failure state:
Will enable debug logging now too. Probably noteworthy: there is already heavy UDP traffic going on in the cluster. Is there maybe a parameter to do the heartbeat over TCP?
It's over UDP only. Also, weave considers it a heartbeat failure only after six consecutive misses (hard-coded). DEBUG logs should help to see what's going on.
weave.log
I think we're hitting the same issue; the symptoms look identical. We have nodes with these log chunks repeating over and over:
We've experienced the same error a couple of times now, each time hitting a node that hosts a Traefik load balancer instance. We're seeing the same pattern of this block repeating:
We're seeing the same log messages as @mikebryant. In each case, rebooting/terminating the node resolves the issue, but restarting Weave on the node without doing that does not. The symptoms: Weave Net appears otherwise healthy, except that all connections are reported as "sleeve" (all good nodes report "fastdp") and all packets that would transit the Weave network are dropped with "no route to host". iptables rules appear normal and consistent with the good nodes. I've attached reports generated by both the unhealthy node and a healthy node for comparison; hopefully they will help shed some light on this. I also enabled debug logging, so hopefully I'll have more concrete information to report if this happens again.
Hello, is there a workaround, or a version that is not affected by this bug?
Common theme across the shared logs is
@jntakpe We don't know the root cause yet, so we can't suggest a workaround.
We faced the same issue too. The logs repeated over and over:
Any workaround for this? I'm using weave 2.5.2 on Kubernetes 1.11.10 and seeing these errors continuously. In addition, I'm also seeing these errors
I'm using the default MTU value of 8912. Does this mean my MTU configuration is incorrect? Do I need to update it?
Please share your logs so we can reason about why you are seeing this error. Note that the original issue is a case where no working forwarders were found and there were heartbeat timeouts, etc. Please see if you have similar symptoms; otherwise open a new issue with the relevant logs.
Regarding MTU, are you using
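On MTU sizing generally: the fastdp overlay encapsulates packets, so `WEAVE_MTU` must be at most the underlay MTU minus the encapsulation overhead. A rough sketch of the arithmetic, assuming an overhead of 84 bytes (an assumption for illustration; check the Weave Net MTU documentation for the exact figure for your setup):

```shell
# Sketch: compute the largest safe WEAVE_MTU for a given underlay MTU.
# OVERHEAD=84 is an assumption; consult the Weave Net docs for the real value.
UNDERLAY_MTU=9000   # e.g. a jumbo-frame network
OVERHEAD=84
WEAVE_MTU=$((UNDERLAY_MTU - OVERHEAD))
echo "$WEAVE_MTU"   # prints 8916
```

If `WEAVE_MTU` is set larger than the underlay can carry after encapsulation, packets get fragmented or dropped, which can look like the connectivity symptoms in this thread.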
@murali-reddy sure, let me share the log lines. Also, I'm using fastdp mode only.
weave status
weave connections status
What OS are you guys using? We've encountered tons of
See #3: https://www.weave.works/blog/running-a-weave-network-on-coreos/
Since I'm using Kops, adding a drop-in to install this file:
Seems to have helped.
[edit] Made the example drop-in more explicit
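For context, the fix described in that blog post amounts to making sure the `openvswitch` kernel module (which fastdp relies on) is loaded at boot. A minimal sketch of such a file, assuming the standard systemd modules-load.d mechanism:

```
# /etc/modules-load.d/openvswitch.conf
# Load the Open vSwitch datapath module used by Weave's fastdp.
openvswitch
```

On CoreOS/Flatcar this file is typically installed via an Ignition config or a systemd drop-in, since /etc is managed declaratively there.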
@mars64 thanks for the input, but this resolution is for
I'm saying that I encountered many
Re: MTU, please see @murali-reddy's comment on your other issue.
Have the same issue. Logs:
After the last issue I reported, I enabled DEBUG logging, and it happened again. Please check the attached log. As usual, the issue was fixed by restarting all weave pods.
Same issue on K8s 1.16 and CoreOS stable (2303.3.0). It correlated exactly with memory pressure caused by a pod running on the affected nodes. Weave started logging what others have posted here when the node hit around 150MB to 0MB free memory, as reported by the Prometheus node-exporter. So far I have correlated 3 incidents with memory pressure. Is anyone else seeing this correlation?
This is the same as my issue, I think. Any help is appreciated.
The original trail is long dead, and it's too confusing to respond to multiple threads of conversation in a GitHub issue; please open a new issue rather than commenting on this one. The new issue template will request the info that is essential to debug. Note that "multiple connections" is not really a problem, just a transient condition that gets reported in passing. People commenting here were having other issues.
I faced the same issue. If you see the message "IP allocation was seeded by different peers", it means that some Weave Net peers were initialized into one cluster and some into another; Weave Net cannot operate in this state.
What you expected to happen?
Inter-node cluster-internal traffic to work
What happened?
At random times one nodes' pod network becomes unreachable/can't connect to other nodes' pod-network.
Nodes internal traffic still works
Deleting the pod fixes the issue temporarily
Anything else we need to know?
Baremetal deployment with 3 nodes (1 master, 2 workers), metallb in L2 mode, and `WEAVE_MTU` set to `1500` and `NO_MASQ_LOCAL` set to `1`
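These settings are environment variables on the weave container. As a sketch (names per the standard weave-net DaemonSet manifest), they would look like:

```yaml
# Sketch: env entries on the "weave" container of the weave-net DaemonSet.
env:
  - name: WEAVE_MTU
    value: "1500"
  - name: NO_MASQ_LOCAL
    value: "1"
```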
Versions:
Logs:
Error occurred around 08:36 server time
weave status in working state:
I will try to get the weave outputs during the failure state, but I didn't have the weave script installed at the time; I looked up weave troubleshooting and had to get the issue fixed ASAP.