Creating this based on #2877 (comment)
What you expected to happen?
When a new node joins the cluster, any existing node should not become unroutable.
What happened?
Today we got one more unroutable alert for one of the Kubernetes nodes (10.2.20.238). We saw that the node became unroutable just after a new node, 10.2.20.227, joined the cluster.
When I say healthy or routable, I mean that curl node_ip:node_port/endpoint has started working.
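For reference, the routability check is just a curl against the service's NodePort on the node in question; a minimal sketch, where the port 30080 and the path /endpoint are placeholders for our service's actual NodePort and health path:
$ curl -m 5 http://10.2.20.238:30080/endpoint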
Events
I0227 05:39:49 > 10.2.20.227 node add event
I0227 05:40:23 > 10.2.20.238 node unhealthy event 👎 (continuously unhealthy till 05:51:54)
I0227 05:50:25 > 10.2.20.227 became healthy for the first time(routable) 👍
I0227 05:51:54 > 10.2.20.238 got healthy 👍
I0227 06:01:15 > 10.2.20.227 node delete event
How to reproduce it?
Not sure. Maybe if the node with this IP joins the current network again, this can be reproduced. I am keeping an eye on it and will update here when I see a pattern or am able to reproduce it.
Anything else we need to know?
Cluster created with kops 1.15.0.
Versions:
$ weave version
2.6.0
$ docker version
18.06.3-ce
$ uname -a
Linux ip-10-2-21-229 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
MTU Setting
admin@ip-10-2-20-238:~$ sudo ifconfig| grep -i MTU | grep -v veth
datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8912
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
vxlan-6784: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65485
weave: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8912
Logs:
Weave logs of 10.2.20.238, which got unhealthy: https://gist.github.com/alok87/5b99d5b07b01306c5f1f34c3eb0f1025
If you check the weave log of 10.2.20.238 above, it is filled with Captured frame from MAC errors after 10.2.20.227 joined the cluster, and 10.2.20.238 was continuously unhealthy after that. You can see there were around 541 such errors just for the 227 node.
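For reference, a count like that can be pulled from the weave container logs with something along these lines (the weave-net pod name is a placeholder for the pod running on 10.2.20.238):
$ kubectl -n kube-system logs <weave-net-pod> -c weave | grep -c 'Captured frame from MAC'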
Also seeing the same error message with Weave 2.8.1; we are not seeing this behavior in our clusters still running Weave 2.7.0.
EDIT: I believe our problems were mostly because we upgraded to Weave 2.8 without using the new DaemonSet that was introduced, so we were using the DaemonSet for v2.7 with the 2.8 image of Weave.
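Not from the original report, but a quick way to confirm which Weave image a DaemonSet is actually running is to read it out of the pod template (assuming the standard weave-net DaemonSet name in kube-system):
$ kubectl -n kube-system get ds weave-net -o jsonpath='{.spec.template.spec.containers[*].image}'
If that prints a 2.8.x image while the rest of the manifest was last applied from the 2.7 release, you are in the mismatched state described above.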