"dial tcp: i/o timeout" issues with FastDP and Kubernetes Services. #3605
I was able to isolate the logs better and am getting:
12:cc:f5:35:61:28 is the MAC address of the weave bridge.
OpenShift and Weave Net are implemented in completely different ways; there is no reason to suppose that a similar symptom in one would be related to another.
This can happen in both harmful and harmless cases; see #2808. From the Weave logs, this is a sign of something definitely wrong:
This means that several heartbeats were missed, yet those nodes continue to appear in later log messages. These connections seem to be working better (including some overlap with the above set):
Is there any correlation between the pods that work or don't work and those two sets of nodes above? I understand from Slack that your symptoms are intermittent, so it would be useful to know whether there is any correlation between when the problem hit and the timestamps in the logs.
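One way to answer the timing question is to line up the moments at which pods hit the timeout with the timestamps of Weave log messages. The sketch below assumes both have already been extracted into Python datetime values; the function name events_near_failures, the 30-second window and the sample data are hypothetical and not taken from the linked logs.

```python
from datetime import datetime, timedelta


def events_near_failures(failures, events, window=timedelta(seconds=30)):
    """Pair each failure time with the log events that fall within +/- window.

    failures: list of datetime values at which a pod hit the i/o timeout.
    events:   list of (datetime, message) tuples extracted from the Weave logs.
    Returns a list of (failure_time, [nearby events]) tuples.
    """
    paired = []
    for failed_at in failures:
        # abs() of a timedelta is always non-negative, so events before and
        # after the failure are both considered.
        nearby = [(ts, msg) for ts, msg in events if abs(ts - failed_at) <= window]
        paired.append((failed_at, nearby))
    return paired


# Hypothetical sample data for illustration only.
failures = [datetime(2019, 1, 8, 12, 0, 10)]
events = [
    (datetime(2019, 1, 8, 12, 0, 0), "example event A"),
    (datetime(2019, 1, 8, 12, 5, 0), "example event B"),
]
for failed_at, nearby in events_near_failures(failures, events):
    print(failed_at, [msg for _, msg in nearby])
```

A fixed window is the simplest choice; if the node clocks are skewed, widening the window is safer than trying to align the clocks after the fact.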
What you expected to happen?
When a pod comes up, it should be able to reach any internal service consistently (kubernetes.default.svc for this issue).
What happened?
Getting
error: couldn't get deployment devicebroker-12: Get https://172.30.0.1:443/api/v1/namespaces/v14-rapyuta-core/replicationcontrollers/devicebroker-12: dial tcp 172.30.0.1:443: i/o timeout
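Because the failing step is a plain TCP dial timing out, the symptom can be exercised without the OpenShift deployer by repeatedly opening TCP connections to the service IP from inside a pod and counting the outcomes. This is a minimal sketch, assuming Python 3 is available in a debug pod; the function name probe and the attempt count, timeout and pause values are arbitrary choices, not values from this report.

```python
import socket
import time


def probe(host, port, attempts=20, timeout=3.0, pause=0.5):
    """Open `attempts` TCP connections to host:port and tally the outcomes.

    Only the TCP handshake is tested (no TLS or HTTP), which is the step
    that fails in "dial tcp ...: i/o timeout".
    """
    results = {"ok": 0, "timeout": 0, "error": 0}
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results["ok"] += 1
        except socket.timeout:
            # Must be caught before OSError, of which it is a subclass.
            results["timeout"] += 1
        except OSError:
            # e.g. connection refused or no route to host.
            results["error"] += 1
        time.sleep(pause)
    return results
```

For example, probe("172.30.0.1", 443) run from a pod on an affected node; a non-zero timeout count alongside otherwise successful attempts would match the intermittent behavior described here.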
How to reproduce it?
It happens with OpenShift deployer pods that bring up new pods when a deployment is rolled out. It is a little sporadic, but it can be reproduced easily.
Anything else we need to know?
This looks like an issue that occurred in OpenShift SDN and was fixed by changing a flow:
openshift/origin#5796
The fix (?) https://github.com/openshift/openshift-sdn/pull/236/files
Since Weave also uses OVS, could it be related?
Network policies are enabled for some namespaces in the cluster.
Versions:
OpenShift 3.9 running on Microsoft Azure.
OpenShift version
oc v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Uname
Linux oc-master-0 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Weave
/home/weave # ./weave --local status
PeerDiscovery: enabled
Targets: 9
Connections: 18 (17 established, 1 failed)
Peers: 18 (with 306 established connections)
TrustedSubnets: none
DefaultSubnet: 10.32.0.0/16
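The status lines above can be sanity-checked with a little arithmetic. Assuming the peer-level figure sums each peer's own list of established connections (an assumption about how weave status counts them), a full mesh of N peers reports N(N-1) connections, and 18 × 17 = 306 matches the value shown. The 17 established local connections likewise equal N - 1, so at the moment this status was captured the mesh looked complete, and the single failed connection is in addition to those. The hypothetical helper below parses only the exact line shapes shown above.

```python
import re

# The two lines copied from the status output above.
STATUS = """\
Connections: 18 (17 established, 1 failed)
Peers: 18 (with 306 established connections)
"""


def mesh_check(status):
    """Compare reported connection counts with a full-mesh expectation.

    Assumes a full mesh means every peer holds one connection to every
    other peer, i.e. peers * (peers - 1) in total and peers - 1 locally.
    """
    conn = re.search(r"Connections: (\d+) \((\d+) established, (\d+) failed\)", status)
    peer = re.search(r"Peers: (\d+) \(with (\d+) established connections\)", status)
    if conn is None or peer is None:
        raise ValueError("status text does not match the expected lines")
    peers = int(peer.group(1))
    return {
        "peers": peers,
        "expected_full_mesh": peers * (peers - 1),
        "reported": int(peer.group(2)),
        "local_established": int(conn.group(2)),
        "local_expected": peers - 1,
        "local_failed": int(conn.group(3)),
    }


print(mesh_check(STATUS))
```

Capturing this status repeatedly, and especially around a failure, would show whether the mesh ever drops below the full count while pods are timing out.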
Docker
Client:
Version: 1.13.1
API version: 1.26
Package version: docker-1.13.1-88.git07f3374.el7.centos.x86_64
Go version: go1.9.4
Git commit: 07f3374/1.13.1
Built: Fri Dec 7 16:13:51 2018
OS/Arch: linux/amd64
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: docker-1.13.1-88.git07f3374.el7.centos.x86_64
Go version: go1.9.4
Git commit: 07f3374/1.13.1
Built: Fri Dec 7 16:13:51 2018
OS/Arch: linux/amd64
Experimental: false
Logs:
There are a lot of "Vetoed installation of hairpin flow" messages. Complete logs for weave are at https://gist.github.com/HackToHell/5ffc79ca73f0bbf83c7697857fb34395
Startup logs for kubelet at https://gist.github.com/HackToHell/97605144a51a5850c8828ef5f45cb745
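To see whether the hairpin-flow veto messages cluster around the failures, they can be bucketed per minute. This is a minimal sketch that assumes each Weave log line carries a timestamp of the form YYYY/MM/DD HH:MM:SS somewhere in the line; that format is an assumption to check against the real logs, and the function name is hypothetical.

```python
import re
from collections import Counter

NEEDLE = "Vetoed installation of hairpin flow"
# Assumed timestamp shape; only the date plus hours:minutes is captured.
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2})")


def hairpin_vetoes_per_minute(lines):
    """Count lines containing NEEDLE, grouped by the minute they were logged."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = STAMP.search(line)
        counts[match.group(1) if match else "no-timestamp"] += 1
    return counts
```

Any iterable of lines works, so an open log file can be passed directly, e.g. hairpin_vetoes_per_minute(open("weave.log")) with a hypothetical file name, and the sorted result compared against the times the deployer pods failed.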
Network:
Routes
default via 10.2.0.1 dev eth0 proto dhcp metric 100
10.2.0.0/16 dev eth0 proto kernel scope link src 10.2.0.8 metric 100
10.32.0.0/16 dev weave proto kernel scope link src 10.32.16.0
168.63.129.16 via 10.2.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 10.2.0.1 dev eth0 proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
Addrs
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.2.0.8/16 brd 10.2.255.255 scope global noprefixroute eth0\ valid_lft forever preferred_lft forever
3: docker0 inet 172.17.0.1/16 scope global docker0\ valid_lft forever preferred_lft forever
6: weave inet 10.32.16.0/16 brd 10.32.255.255 scope global weave\ valid_lft forever preferred_lft forever
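The routes and addresses above put the host, Weave and Docker bridge networks in separate ranges. A quick way to double-check that, including the service network the failing 172.30.0.1 address belongs to, is Python's ipaddress module. The service CIDR 172.30.0.0/16 is an assumption (it is OpenShift's usual default, but the cluster's actual setting is not shown in this report).

```python
import ipaddress
from itertools import combinations

NETWORKS = {
    "host (eth0)": "10.2.0.0/16",
    "weave": "10.32.0.0/16",
    "docker0": "172.17.0.0/16",
    # Assumption: default OpenShift service network, not confirmed above.
    "services (assumed)": "172.30.0.0/16",
}


def overlaps(networks):
    """Return every pair of named networks whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [(a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])]


print(overlaps(NETWORKS))
print(ipaddress.ip_address("172.30.0.1") in ipaddress.ip_network(NETWORKS["services (assumed)"]))
```

An empty overlap list rules out address-range collisions as the cause, which points the investigation back at the datapath and the peer connections discussed in the comments.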
IP Tables Save
https://gist.github.com/HackToHell/73a249b0ab9818703905d976d41ab262
Output of ip a