This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave reconnect occasionally fails because of missing xfrm policies. #3666

Closed
hpdvanwyk opened this issue Jul 23, 2019 · 1 comment · Fixed by #3669

hpdvanwyk commented Jul 23, 2019

What did you expect to happen?

Weave should always reconnect after a network failure

What happened?

Weave occasionally fails to reconnect properly after a network failure. The failed node can receive ESP traffic but does not transmit any.

The output of ip xfrm policy list is empty on the failed node.

How to reproduce it?

Disconnect the node's physical/virtual network until heartbeat failure occurs, then reconnect. The failure only happens very occasionally.
I used the following script to reproduce it on a Proxmox server:

#!/bin/bash

while true; do
        # disconnect network
        qm set 102 --net1 model=virtio,bridge=vmbr1,macaddr=62:40:98:FF:02:72,link_down=1
        sleep 55
        # reconnect network
        qm set 102 --net1 model=virtio,bridge=vmbr1,macaddr=62:40:98:FF:02:72,link_down=0
        sleep 10

        # check if weave still works
        if ssh [email protected] ping 10.42.128.0 -c1; then
                date
                echo pass
        else
                date
                echo broken
                break
        fi
done

This usually triggers it within 30 minutes.

Anything else we need to know?

This was reproduced on a 2-node Proxmox cluster running RancherOS 1.5.2. We first discovered it on a 3-node VMware cluster, where it was triggered by excessive etcd load that eventually caused network timeouts.

Versions:

/home/weave # ./weave --local version
weave 2.5.2

$ docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.4
 Git commit:        d7080c1
 Built:             Wed Feb 20 02:24:22 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.3-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       d7080c1
  Built:            Wed Feb 20 02:25:33 2019
  OS/Arch:          linux/amd64
  Experimental:     false

$ uname -a
Linux hostyname0 4.14.122-rancher #1 SMP Tue May 28 01:50:21 UTC 2019 x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}

Logs:

# kubectl does not like a disconnected 2 node cluster
$ docker logs -f k8s_weave_weave-net-qjp2s_kube-system_1fb6abfd-ac85-11e9-93f0-52051287a9b1_1
DEBU: 2019/07/23 12:18:16.094346 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] Removing fastdp from the list of forwarders
INFO: 2019/07/23 12:18:16.094356 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] using sleeve
DEBU: 2019/07/23 12:18:16.097699 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: sending Heartbeat to peer
INFO: 2019/07/23 12:18:16.590789 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/23 12:18:16.590831 ->[192.168.128.10:49925|56:ff:d7:91:ac:16(192.168.128.10)]: connection shutting down due to error: no working forwarders to 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:16.590892 ->[192.168.128.10:49925|56:ff:d7:91:ac:16(192.168.128.10)]: connection deleted
INFO: 2019/07/23 12:18:16.590914 Removed unreachable peer 56:ff:d7:91:ac:16(192.168.128.10)
DEBU: 2019/07/23 12:18:16.590931 [allocator 02:df:bc:77:e3:17] PeerGone: peer 56:ff:d7:91:ac:16
DEBU: 2019/07/23 12:18:16.590944 InvalidateRoutes
INFO: 2019/07/23 12:18:16.591542 ->[192.168.128.10:6783] attempting connection
INFO: 2019/07/23 12:18:16.591547 Destroying IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:16.591638 ipsec: destroy: in 192.168.128.10 -> 192.168.128.11 0x69072fb9
INFO: 2019/07/23 12:18:16.596619 ipsec: destroy: out 192.168.128.11 -> 192.168.128.10 0xe63c713a
DEBU: 2019/07/23 12:18:16.624675 fastdp: broadcast{1 false} {12:5b:3d:b5:ed:8f ff:ff:ff:ff:ff:ff}
DEBU: 2019/07/23 12:18:16.624752 Creating ODP flow FlowSpec{keys: [EthernetFlowKey{src: 12:5b:3d:b5:ed:8f, dst: ff:ff:ff:ff:ff:ff} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 0}]}
DEBU: 2019/07/23 12:18:16.688656 fastdp: broadcast{1 false} {c6:0e:09:de:81:b0 ff:ff:ff:ff:ff:ff}
DEBU: 2019/07/23 12:18:16.688738 Creating ODP flow FlowSpec{keys: [EthernetFlowKey{src: c6:0e:09:de:81:b0, dst: ff:ff:ff:ff:ff:ff} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 0}]}
DEBU: 2019/07/23 12:18:16.981197 fastdp: broadcast{1 false} {02:df:bc:77:e3:17 ff:ff:ff:ff:ff:ff}
DEBU: 2019/07/23 12:18:16.981272 Creating ODP flow FlowSpec{keys: [EthernetFlowKey{src: 02:df:bc:77:e3:17, dst: ff:ff:ff:ff:ff:ff} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 0}]}
DEBU: 2019/07/23 12:18:23.379129 [http] GET /status
INFO: 2019/07/23 12:18:23.646676 ->[192.168.128.10:57051] connection accepted
DEBU: 2019/07/23 12:18:23.647625 InvalidateShortIDs
INFO: 2019/07/23 12:18:23.647753 ->[192.168.128.10:57051|56:ff:d7:91:ac:16(192.168.128.10)]: connection ready; using protocol version 2
INFO: 2019/07/23 12:18:23.647811 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] using fastdp
INFO: 2019/07/23 12:18:23.647834 ->[192.168.128.10:57051|56:ff:d7:91:ac:16(192.168.128.10)]: connection added (new peer)
DEBU: 2019/07/23 12:18:23.647858 InvalidateRoutes
INFO: 2019/07/23 12:18:23.647888 Setting up IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:23.647952 ipsec: InitSALocal: 192.168.128.10 -> 192.168.128.11 :6784 0x9d587dc0
DEBU: 2019/07/23 12:18:23.654523 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: confirmed
DEBU: 2019/07/23 12:18:23.654682 sleeve ->[<nil>|56:ff:d7:91:ac:16(192.168.128.10)]: Confirm
DEBU: 2019/07/23 12:18:23.654771 sleeve ->[<nil>|56:ff:d7:91:ac:16(192.168.128.10)]: confirmed
DEBU: 2019/07/23 12:18:23.655030 InvalidateRoutes
DEBU: 2019/07/23 12:18:23.655178 [allocator 02:df:bc:77:e3:17]: Allocator.OnGossip: 663 bytes
INFO: 2019/07/23 12:18:23.655386 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: IPSec init SA remote
INFO: 2019/07/23 12:18:23.655404 ipsec: InitSARemote: 192.168.128.11 -> 192.168.128.10 :6784 0x9d93291b
DEBU: 2019/07/23 12:18:23.655738 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: sending Heartbeat to peer
DEBU: 2019/07/23 12:18:23.655744 sleeve ->[<nil>|56:ff:d7:91:ac:16(192.168.128.10)]: handleHeartbeat
DEBU: 2019/07/23 12:18:23.655864 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: sendHeartbeat
DEBU: 2019/07/23 12:18:23.656409 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: handleHeartbeatAck
INFO: 2019/07/23 12:18:23.656449 ->[192.168.128.10:57051|56:ff:d7:91:ac:16(192.168.128.10)]: connection fully established
DEBU: 2019/07/23 12:18:23.656503 InvalidateRoutes
DEBU: 2019/07/23 12:18:23.656621 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleHeartbeatAck
DEBU: 2019/07/23 12:18:23.656632 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: sendFragTest
INFO: 2019/07/23 12:18:23.656899 EMSGSIZE on send, expecting PMTU update (IP packet was 60052 bytes, payload was 60044 bytes)
DEBU: 2019/07/23 12:18:23.656914 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: sendMTUTest: mtu candidate 1414
DEBU: 2019/07/23 12:18:23.657469 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleFragTestAck
DEBU: 2019/07/23 12:18:23.657503 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleMTUTestAck: for mtu candidate 1414
DEBU: 2019/07/23 12:18:23.657512 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: searchMTU: 1414 1415
INFO: 2019/07/23 12:18:23.657519 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: Effective MTU verified at 1414
DEBU: 2019/07/23 12:18:23.660454 InvalidateRoutes
DEBU: 2019/07/23 12:18:23.660832 ODP miss: map[15:BlobFlowKey{type: 15, key: 00000200, mask: ffffffff} 20:BlobFlowKey{type: 20, key: 00000000, mask: ffffffff} 2:BlobFlowKey{type: 2, key: 00000000, mask: ffffffff} 3:InPortFlowKey{vport: 2} 22:UnknownFlowKey{type: 22, key: 00000000, mask: exact} 25:UnknownFlowKey{type: 25, key: 00000000000000000000000000000000, mask: exact} 16:TunnelFlowKey{id: 00000000007d8bd8, ipv4src: 192.168.128.10, ipv4dst: 192.168.128.11, ttl: 64, tpsrc: 36901, tpdst: 6784} 23:UnknownFlowKey{type: 23, key: 0000, mask: exact} 24:UnknownFlowKey{type: 24, key: 00000000, mask: exact} 4:EthernetFlowKey{src: 00:00:00:00:00:00, dst: 00:00:00:00:00:00} 19:BlobFlowKey{type: 19, key: 00000000, mask: ffffffff}] on port 2
DEBU: 2019/07/23 12:18:23.660860 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: handleVxlanSpecialPacket
DEBU: 2019/07/23 12:18:23.660928 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: Ack Heartbeat from peer
INFO: 2019/07/23 12:18:23.665420 ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: connection ready; using protocol version 2
INFO: 2019/07/23 12:18:23.665831 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] using fastdp
INFO: 2019/07/23 12:18:23.665858 ->[192.168.128.10:57051|56:ff:d7:91:ac:16(192.168.128.10)]: connection deleted
DEBU: 2019/07/23 12:18:23.665872 InvalidateRoutes
INFO: 2019/07/23 12:18:23.665911 ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: connection added (new peer)
DEBU: 2019/07/23 12:18:23.665939 InvalidateRoutes
INFO: 2019/07/23 12:18:23.665973 Setting up IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:23.666047 ipsec: InitSALocal: 192.168.128.10 -> 192.168.128.11 :6784 0x52564f41
INFO: 2019/07/23 12:18:23.667593 ->[192.168.128.10:57051|56:ff:d7:91:ac:16(192.168.128.10)]: connection shutting down due to error: Multiple connections to 56:ff:d7:91:ac:16(192.168.128.10) added to 02:df:bc:77:e3:17(192.168.128.11)
INFO: 2019/07/23 12:18:23.667662 Destroying IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
DEBU: 2019/07/23 12:18:23.670150 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: confirmed
DEBU: 2019/07/23 12:18:23.670178 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: Confirm
DEBU: 2019/07/23 12:18:23.670342 InvalidateRoutes
DEBU: 2019/07/23 12:18:23.670472 [allocator 02:df:bc:77:e3:17]: Allocator.OnGossip: 663 bytes
INFO: 2019/07/23 12:18:23.670575 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: IPSec init SA remote
INFO: 2019/07/23 12:18:23.670587 ipsec: InitSARemote: 192.168.128.11 -> 192.168.128.10 :6784 0xc81a2449
INFO: 2019/07/23 12:18:23.670713 ipsec: destroy: in 192.168.128.10 -> 192.168.128.11 0x9d587dc0
DEBU: 2019/07/23 12:18:23.670938 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: confirmed
DEBU: 2019/07/23 12:18:23.670951 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: sendHeartbeat
DEBU: 2019/07/23 12:18:23.670982 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: sending Heartbeat to peer
DEBU: 2019/07/23 12:18:23.671563 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleHeartbeat
DEBU: 2019/07/23 12:18:23.671602 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleHeartbeatAck
DEBU: 2019/07/23 12:18:23.671636 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: sendFragTest
INFO: 2019/07/23 12:18:23.671891 EMSGSIZE on send, expecting PMTU update (IP packet was 60052 bytes, payload was 60044 bytes)
DEBU: 2019/07/23 12:18:23.671906 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: sendMTUTest: mtu candidate 1414
INFO: 2019/07/23 12:18:23.671968 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] using sleeve
INFO: 2019/07/23 12:18:23.671984 ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: connection fully established
DEBU: 2019/07/23 12:18:23.672000 InvalidateRoutes
DEBU: 2019/07/23 12:18:23.672227 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleFragTestAck
DEBU: 2019/07/23 12:18:23.672264 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: handleMTUTestAck: for mtu candidate 1414
DEBU: 2019/07/23 12:18:23.672273 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: searchMTU: 1414 1415
INFO: 2019/07/23 12:18:23.672281 sleeve ->[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: Effective MTU verified at 1414
DEBU: 2019/07/23 12:18:23.677814 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: handleHeartbeatAck
INFO: 2019/07/23 12:18:23.677827 ipsec: destroy: out 192.168.128.11 -> 192.168.128.10 0x9d93291b
INFO: 2019/07/23 12:18:23.677855 overlay_switch ->[56:ff:d7:91:ac:16(192.168.128.10)] using fastdp

I suspect the relevant bits are:

INFO: 2019/07/23 12:18:23.670587 ipsec: InitSARemote: 192.168.128.11 -> 192.168.128.10 :6784 0xc81a2449
INFO: 2019/07/23 12:18:23.677827 ipsec: destroy: out 192.168.128.11 -> 192.168.128.10 0x9d93291b

After this I get:

node0:

$ sudo ip xfrm policy list
src 192.168.128.10/32 dst 192.168.128.11/32 proto udp 
        dir out priority 0 ptype main 
        mark 0x20000/0x20000
        tmpl src 192.168.128.10 dst 192.168.128.11
                proto esp spi 0x939f9099 reqid 0 mode transport

node1:

$ sudo ip xfrm policy list

Looking at how the netlink package used by weave builds xfrm policy selectors (https://github.com/vishvananda/netlink/blob/b1e98597921720cee4147d6476110fc8fc56072d/xfrm_policy_linux.go#L8), it does not appear to use the SPI at all. That means the netlink.XfrmPolicyDel() call made by ipsec.Destroy() will match any xfrm policy with the same local and remote IPs and delete it, regardless of which connection installed it.
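To illustrate, here is a minimal Go sketch (hypothetical code, not taken from weave; it only assumes the field and constant names of github.com/vishvananda/netlink) showing two outbound policies for the same peer pair that differ only in their template SPI, and why deleting the stale one removes whichever policy is currently installed:

package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

// hostNet wraps an address as a /32 network for the policy selector.
func hostNet(ip net.IP) *net.IPNet {
	return &net.IPNet{IP: ip, Mask: net.CIDRMask(32, 32)}
}

// outPolicy builds an outbound transport-mode ESP policy between local and
// remote whose template carries the given SPI, mirroring the shape of the
// "ip xfrm policy list" output above (proto udp, mark 0x20000/0x20000).
func outPolicy(local, remote net.IP, spi int) *netlink.XfrmPolicy {
	return &netlink.XfrmPolicy{
		Src:   hostNet(local),
		Dst:   hostNet(remote),
		Proto: netlink.XFRM_PROTO_UDP,
		Dir:   netlink.XFRM_DIR_OUT,
		Mark:  &netlink.XfrmMark{Value: 0x20000, Mask: 0x20000},
		Tmpls: []netlink.XfrmPolicyTmpl{{
			Src:   local,
			Dst:   remote,
			Proto: netlink.XFRM_PROTO_ESP,
			Mode:  netlink.XFRM_MODE_TRANSPORT,
			Spi:   spi,
		}},
	}
}

func main() {
	local := net.ParseIP("192.168.128.11")
	remote := net.ParseIP("192.168.128.10")

	stale := outPolicy(local, remote, 0x9d93291b) // policy of the torn-down connection
	fresh := outPolicy(local, remote, 0xc81a2449) // policy of the new connection

	if err := netlink.XfrmPolicyAdd(fresh); err != nil {
		log.Fatal(err)
	}
	// The kernel matches the delete on the selector (src/dst/proto/dir/mark),
	// not on the template SPI, so this removes the freshly added policy even
	// though "stale" names a different SPI.
	if err := netlink.XfrmPolicyDel(stale); err != nil {
		log.Fatal(err)
	}
}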

The failed node also never notices that fastdp has failed, because it keeps receiving heartbeats. The other node falls back to sleeve mode because it never receives any. In this state no communication between the nodes is possible over weave.

@murali-reddy
Contributor

Thanks @hpdvanwyk for reporting the issue.

INFO: 2019/07/23 12:18:23.647888 Setting up IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:23.647952 ipsec: InitSALocal: 192.168.128.10 -> 192.168.128.11 :6784 0x9d587dc0
INFO: 2019/07/23 12:18:23.655404 ipsec: InitSARemote: 192.168.128.11 -> 192.168.128.10 :6784 0x9d93291b
INFO: 2019/07/23 12:18:23.665973 Setting up IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:23.666047 ipsec: InitSALocal: 192.168.128.10 -> 192.168.128.11 :6784 0x52564f41
INFO: 2019/07/23 12:18:23.667662 Destroying IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/23 12:18:23.670587 ipsec: InitSARemote: 192.168.128.11 -> 192.168.128.10 :6784 0xc81a2449
INFO: 2019/07/23 12:18:23.670713 ipsec: destroy: in 192.168.128.10 -> 192.168.128.11 0x9d587dc0
INFO: 2019/07/23 12:18:23.677827 ipsec: destroy: out 192.168.128.11 -> 192.168.128.10 0x9d93291b

It appears to me that this is caused by IPsec actions triggered by concurrent connection setup and teardown, which is why it only happens occasionally.

From your logs, the last ipsec: destroy should only destroy the xfrm policy with SPI 0x9d93291b, but it ends up removing the policy for 0xc81a2449 as well.
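One possible guard (a sketch only, assuming the same github.com/vishvananda/netlink imports as the sketch above and that netlink.XfrmPolicyGet is available; I have not checked what #3669 actually implements) would be to look up the installed policy and skip the delete when its template SPI no longer matches the connection being torn down:

// deletePolicyIfSPIMatches removes the policy for p's selector only if the
// installed policy's template still carries the given SPI.
func deletePolicyIfSPIMatches(p *netlink.XfrmPolicy, spi int) error {
	installed, err := netlink.XfrmPolicyGet(p)
	if err != nil {
		// No policy installed for this selector (or lookup failed); nothing to delete.
		return nil
	}
	for _, tmpl := range installed.Tmpls {
		if tmpl.Spi == spi {
			return netlink.XfrmPolicyDel(installed)
		}
	}
	// The selector now belongs to a newer connection with a different SPI; leave it alone.
	return nil
}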
