This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

networking broken for containers with moved MAC #2436

Open · rade opened this issue Jul 10, 2016 · 11 comments


rade commented Jul 10, 2016

When a MAC previously associated with a container on one host is subsequently associated with a container on a different host, the latter container has no network connectivity.

host1$ weave launch
host2$ weave launch $HOST1
host1$ A=$(weave run 10.0.0.1/24 -ti alpine /bin/sh)
host1$ B=$(weave run 10.0.0.2/24 -ti alpine /bin/sh)
host2$ C=$(weave run 10.0.0.3/24 --privileged -ti alpine /bin/sh)
host2$ docker exec $C ping -nq -W 1 -c 1 10.0.0.2
PING 10.0.0.2 (10.0.0.2): 56 data bytes
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.947/0.947/0.947 ms
host1$ docker exec $A ip link show ethwe | sed -n -e 's|^ *link/ether \([0-9a-f:]*\).*|\1|p'
aa:c5:15:f8:a3:e8
host1$ docker rm -f $A
host2$ docker exec $C ip link set ethwe address aa:c5:15:f8:a3:e8
host2$ docker exec $C ping -nq -W 1 -c 1 10.0.0.2
PING 10.0.0.2 (10.0.0.2): 56 data bytes
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

docker logs weave on host2 shows

ERRO: 2016/07/10 08:07:41.346535 Captured frame from MAC (aa:c5:15:f8:a3:e8) associated with another peer 0e:2a:9b:8e:23:fd(host1)

Running weave report confirms that there is a corresponding MAC cache entry.

@bboreham was pondering whether we can make a better decision by inspecting the destination MAC, i.e. only drop the frame if the source MAC appears to be from a different peer and the destination MAC is for a local peer.

Which, by my reading of the code, simply means turning the error into a warning and carrying on.

We'd need to think carefully in what situations that might create a loop.
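
To make the proposal concrete, here is a toy model of that decision as a standalone Go program (my own sketch, not the actual weave router code; macOwners, localPeer and decide are placeholder names): drop the frame only when the source MAC is claimed by another peer and the destination MAC maps to a local peer; otherwise warn, assume the MAC has moved, and carry on.

package main

import "fmt"

type peer string

// localPeer stands in for the peer on which the frame was captured.
const localPeer peer = "host2"

// macOwners stands in for the router's MAC cache (MAC -> owning peer).
var macOwners = map[string]peer{
	"aa:c5:15:f8:a3:e8": "host1", // MAC last seen on a container on host1
	"7a:11:22:33:44:55": "host2",
}

// decide models the relaxed check: only drop when both conditions hold.
func decide(srcMAC, dstMAC string) string {
	srcOwner, srcKnown := macOwners[srcMAC]
	dstOwner, dstKnown := macOwners[dstMAC]
	if srcKnown && srcOwner != localPeer {
		if dstKnown && dstOwner == localPeer {
			return "drop: source MAC claimed by another peer and destination is local"
		}
		return "warn and forward: assume the MAC has moved to this peer"
	}
	return "forward"
}

func main() {
	fmt.Println(decide("aa:c5:15:f8:a3:e8", "7a:11:22:33:44:55"))
	fmt.Println(decide("aa:c5:15:f8:a3:e8", "de:ad:be:ef:00:01"))
}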


rade commented Jul 10, 2016

Hmm. Curiously, making that suggested change didn't fix the problem for me. Something weird is going on...

@rade rade self-assigned this Jul 10, 2016

rade commented Jul 10, 2016

There's a bug in the MAC cache: MacCache.AddForced never returns a conflict peer. That means the conflictPeer != nil branch in NetworkRouter.handleForwardedPacket is not reached, which in turn means we are not invalidating routes when MACs move.

I was invoking the same AddForced in the revised handleCapturedPacket code.

Curiously fixing that bug still does not make things work, even though the logs clearly indicate that both peers detect the moved MAC.
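
For reference, a minimal, self-contained sketch of the fix being described (simplified types of my own; the real code is MacCache.AddForced and NetworkRouter.handleForwardedPacket): the forced insert should report the peer the MAC was previously associated with, so the caller can invalidate routes when a MAC moves.

package main

import "fmt"

// macCache is a simplified stand-in for the router's MAC cache.
type macCache struct {
	owners map[string]string // MAC -> owning peer
}

// addForced associates mac with peer and returns the previously associated
// peer if the MAC just moved, which is what the real AddForced fails to do.
func (c *macCache) addForced(mac, peer string) (conflict string, moved bool) {
	prev, ok := c.owners[mac]
	c.owners[mac] = peer
	if ok && prev != peer {
		return prev, true
	}
	return "", false
}

func main() {
	cache := &macCache{owners: map[string]string{}}
	cache.addForced("aa:c5:15:f8:a3:e8", "host1")
	if prev, moved := cache.addForced("aa:c5:15:f8:a3:e8", "host2"); moved {
		// In the router this is where the conflictPeer != nil branch would
		// invalidate routes to the old peer.
		fmt.Printf("MAC moved from %s; invalidate routes\n", prev)
	}
}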


rade commented Jul 10, 2016

Curiously fixing that bug still does not make things work.

It does when I run with WEAVE_NO_FASTDP=1.


rade commented Jul 10, 2016

when running with fastdp, after pinging from B to C

host1$ docker exec $B ping -nq -W 1 -c 1 10.0.0.3

(i.e. the opposite direction of the failing ping), the ping from C to B subsequently starts working.

Running tcpdump on 'weave' on both machines shows that during the first C-to-B ping the ARP request gets through to B and a reply is sent, but the reply never appears on C.

@jakolehm

What is the probability of getting the same MAC address on a different host? Just wondering, because we hit this bug pretty easily.


rade commented Jul 11, 2016

What is the probability of getting the same MAC address on a different host?

It's 46 random bits; so pretty low.
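
For a rough sense of scale (my own back-of-the-envelope arithmetic, not from this thread): with 46 random bits, the birthday-style probability that any two of n containers share a MAC is roughly n(n-1)/2 divided by 2^46.

package main

import "fmt"

func main() {
	// Approximate collision probability for n randomly generated MACs
	// drawn uniformly from 2^46 possibilities (46 random bits).
	const space = float64(1 << 46)
	for _, n := range []float64{100, 1000, 10000} {
		p := n * (n - 1) / 2 / space
		fmt.Printf("n=%6.0f  p ~ %.2e\n", n, p)
	}
}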

we hit this bug pretty easily.

I filed this bug separately from #2433 because I have no evidence to suggest that the symptoms there are explained by this.

@rade rade removed their assignment Jul 11, 2016
@awh awh assigned rade Jul 12, 2016
@rade rade removed their assignment Jul 12, 2016

awh commented Jul 14, 2016

Curiously fixing that bug still does not make things work.

I believe I have determined what is happening here:

  1. On the first ping container A's MAC address is learned by fastdp's internal bridge implementation on host1 as being associated with vport 1 (the port to the intermediary weave bridge netdev). Associated flow rules are installed.
  2. When the ping is reattempted after the MAC is moved from A to C, C redoes ARP for B (presumably force setting C's MAC flushes its ARP cache.)
  3. The reception of the ARP request on host1 updates the router MAC cache and calls InvalidateRoutes on the fastdp overlay. All extant flows on host1 are removed, but crucially the fastdp bridge MAC cache is left untouched.
  4. Container A responds to the ARP request, and its response crosses the weave bridge into vport 1.
  5. Because the flows have been removed, the packet is handled by the fastdp bridge code, which finds a handler in its MAC cache that sends the packet back out of vport 1. The bogus flow rule is reinstalled.

This is why if you tcpdump -n -e -i weave on host1 when you do the second ping, you see two copies of the ARP response. No ARP response makes it across to host2, because it is being reflected back onto host1's weave bridge instead of being routed over the overlay.

If you apply this patch

diff --git a/router/fastdp.go b/router/fastdp.go
index 450f3be..c64d6ac 100644
--- a/router/fastdp.go
+++ b/router/fastdp.go
@@ -302,6 +302,8 @@ func (fastdp fastDatapathOverlay) InvalidateRoutes() {
        log.Debug("InvalidateRoutes")
        fastdp.lock.Lock()
        defer fastdp.lock.Unlock()
+       fastdp.sendToMAC = make(map[MAC]bridgeSender)
+       fastdp.seenMACs = make(map[MAC]struct{})
        checkWarn(fastdp.deleteFlows())
 }

on top of @rade's #2437 branch, the second ping succeeds immediately.

@brb brb self-assigned this Jul 14, 2016
@brb brb modified the milestones: 1.6.1, 1.6.2 Aug 18, 2016
@awh awh modified the milestones: 1.6.2, 1.7.1 Sep 21, 2016
@awh awh modified the milestones: 1.7.1, 1.7.2 Oct 5, 2016
@awh awh modified the milestones: 1.7.3, 1.7.2 Oct 11, 2016
@awh awh modified the milestones: 1.7.3, 1.8.1 Nov 4, 2016
@bboreham bboreham modified the milestones: 1.8.2, 1.8.1 Nov 21, 2016
@brb brb modified the milestones: 1.8.2, 1.8.3 Dec 8, 2016
@bboreham bboreham modified the milestones: 1.8.3, overflow Feb 1, 2017

yannrouillard commented Apr 28, 2017

Hi,

We recently had a network connectivity bug in Kubernetes, possibly involving Weave.
During troubleshooting we stumbled upon a lot of log messages like the one listed in this ticket: "ERROR: Captured frame from MAC ...".

This message is printed regularly even when the network is working, so I am not sure it is related at all, but I would like to know whether it is a sign that we might be running into the problem described in this bug, or whether it is usually a benign error.

@bboreham
Contributor

@yannrouillard see #2877 - it's not always an error.

@brb brb removed their assignment Jun 12, 2018

Nossnevs commented Apr 5, 2019

We have this issue a lot. Things run OK for a while, but when pods move around there is a risk that this happens. We are running docker.io/weaveworks/weave-kube:2.4.1 on AWS with 5 nodes. Will this issue ever be prioritized? Our current workaround is to delete the weave DB and then delete the weave pods; sometimes this works on the first try.

kubectl get pods --namespace kube-system | grep weave-net | cut -d ' ' -f 1 | xargs -I{} bash -c 'kubectl exec -c weave --namespace kube-system {} rm /weavedb/weave-netdata.db ; kubectl delete pod --namespace kube-system {}'


bboreham commented Apr 5, 2019

@Nossnevs you are explicitly creating a container with the same MAC on a different node?
Can you explain the background to this requirement?

If this is not your situation, please open a new issue and supply the requested information.
Weave Net is an Open Source project; if you require a determined response please enquire about paid support contracts.
