
iptables: No chain/target/match by that name. #2617

Closed
awh opened this issue Nov 8, 2016 · 13 comments


awh commented Nov 8, 2016

From @errordeveloper on October 24, 2016 17:25

I've installed weave-kube on a Kubernetes cluster. After rebooting a node, weave-kube fails to start and logs this:

root@kube-node-0:~# docker logs cfecf98444b0
INFO: 2016/10/24 17:22:36.316253 Command line options: map[ipalloc-range:10.32.0.0/12 name:ca:dd:16:be:df:42 nickname:kube-node-0 no-dns:true docker-api: datapath:datapath http-addr:127.0.0.1:6784 ipalloc-init:consensus=2 port:6783]
INFO: 2016/10/24 17:22:36.339086 Communication between peers is unencrypted.
INFO: 2016/10/24 17:22:36.342394 Our name is ca:dd:16:be:df:42(kube-node-0)
INFO: 2016/10/24 17:22:36.342490 Launch detected - using supplied peer list: [163.172.63.141 62.210.116.206]
INFO: 2016/10/24 17:22:36.406053 [allocator ca:dd:16:be:df:42] Initialising with persisted data
INFO: 2016/10/24 17:22:36.406158 Sniffing traffic on datapath (via ODP)
INFO: 2016/10/24 17:22:36.407127 ->[62.210.116.206:6783] attempting connection
INFO: 2016/10/24 17:22:36.407365 ->[163.172.63.141:6783] attempting connection
INFO: 2016/10/24 17:22:36.407662 ->[163.172.63.141:57223] connection accepted
INFO: 2016/10/24 17:22:36.411021 ->[62.210.116.206:6783|3e:62:bc:39:9d:42(kube-node-1)]: connection ready; using protocol version 2
INFO: 2016/10/24 17:22:36.411151 overlay_switch ->[3e:62:bc:39:9d:42(kube-node-1)] using fastdp
INFO: 2016/10/24 17:22:36.411262 ->[62.210.116.206:6783|3e:62:bc:39:9d:42(kube-node-1)]: connection added (new peer)
INFO: 2016/10/24 17:22:36.413027 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2016/10/24 17:22:36.414151 ->[163.172.63.141:6783|ca:dd:16:be:df:42(kube-node-0)]: connection shutting down due to error: cannot connect to ourself
INFO: 2016/10/24 17:22:36.414268 ->[163.172.63.141:57223|ca:dd:16:be:df:42(kube-node-0)]: connection shutting down due to error: cannot connect to ourself
INFO: 2016/10/24 17:22:36.912987 ->[62.210.116.206:6783|3e:62:bc:39:9d:42(kube-node-1)]: connection fully established
INFO: 2016/10/24 17:22:36.913654 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2016/10/24 17:22:36.914948 sleeve ->[62.210.116.206:6783|3e:62:bc:39:9d:42(kube-node-1)]: Effective MTU verified at 1438
iptables: No chain/target/match by that name.

Copied from original issue: weaveworks-experiments/weave-kube#42


awh commented Nov 8, 2016

From @errordeveloper on October 24, 2016 17:26

The weave-npc container runs fine; it's just the weave container that is failing.


awh commented Nov 8, 2016

From @errordeveloper on October 24, 2016 17:26

I've tried to re-create the addon, but nothing changed.

awh added the kind/bug label Nov 8, 2016

awh commented Nov 8, 2016

From @errordeveloper on October 24, 2016 17:28

I've been able to get around this by deleting the addon, running weave reset on the node, and re-creating the addon.


awh commented Nov 8, 2016

From @bboreham on October 25, 2016 10:41

The symptoms all match this message coming from weave expose as it sets up the NAT masquerading rules, and since the nat table existed the second time you ran it, this points toward the WEAVE chain being missing. Since we don't have the error message from that run, running weave reset (or deleting the bridge some other way) and re-trying is the appropriate action.
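
(For reference, the chain's presence can be checked directly on an affected node; a sketch using standard iptables commands, where WEAVE is the chain weave creates in the nat table:)

# Does the WEAVE chain exist in the nat table? Failing here with
# "No chain/target/match by that name" matches the symptom above.
iptables -t nat -nL WEAVE

# If it exists, list the rules weave expose installed
# (typically MASQUERADE rules for the weave network range):
iptables -t nat -S WEAVE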

We could certainly make the log messages easier to tie to what was happening.

Since Kubernetes effectively buries the container log messages by deleting the container and re-trying, it's difficult to diagnose fully, and also difficult for the end-user to cure a part-successful launch. Can we do better?
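
(One partial mitigation for the buried logs: kubectl can retrieve the previous container instance's output as long as the pod itself still exists. The pod name below is a placeholder; the container name assumes the weave-kube daemonset layout.)

kubectl logs --namespace=kube-system <weave-pod-name> --container=weave --previous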


awh commented Nov 8, 2016

@errordeveloper could we get information on k8s/distro versions, cloud provider etc.? The evidence points to something external to weave-kube having removed the WEAVE chain from the nat table partway through launch - it'd be good to know what else was running at the time (e.g. firewalld).
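
(For anyone else hitting this, a sketch of commands that collect the requested details; the firewalld check assumes a systemd-based distro:)

kubectl version                        # Kubernetes client/server versions
cat /etc/os-release                    # distro name and version
uname -a                               # kernel version
systemctl is-active firewalld          # is firewalld managing iptables?
iptables-save -t nat | grep -i weave   # which weave NAT rules survived?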


awh commented Nov 8, 2016

From @raghu67 on November 4, 2016 17:23

I am seeing a similar issue. Here are the details:
[root@m0062421 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:16:57Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:10:32Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
[root@m0062421 ~]# kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-alpha.0.1534+cf7301f16c0363-dirty", GitCommit:"cf7301f16c036363c4fdcb5d4d0c867720214598", GitTreeState:"dirty", BuildDate:"2016-09-27T18:10:39Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
[root@m0062421 ~]#

CentOS 7.2. Kernel Version:
[root@m0062421 ~]# uname -a
Linux m0062421.lab.ppops.net 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

These are based on an old version of OpenStack and KVM. If that is relevant, I can find the details.

@bboreham

We have seen more failures with this symptom, and also an earlier failure.

I conjecture that there is a race between the CNI plugin and weave expose, both trying to assign an IP address to the bridge.

This code in the weave script is inherently racy if someone else is doing the same thing at the same time:

# check-then-add is not atomic: another process can add the address in between
if ! ip addr show dev $BRIDGE | grep -qF $CIDR ; then
    ip addr add dev $BRIDGE $CIDR
fi

and it gives the right error message:

# ip addr add dev weave 10.32.0.2/12
RTNETLINK answers: File exists
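
(A race-free variant of that check-then-add, as a sketch rather than the actual fix that landed: attempt the add unconditionally and fall back to re-checking on failure, since "File exists" can simply mean the other party won the race.)

# Try the add first; on failure, the address may already be present
# because someone else (e.g. the CNI plugin) added it concurrently.
if ! ip addr add dev $BRIDGE $CIDR 2>/dev/null ; then
    ip addr show dev $BRIDGE | grep -qF $CIDR || exit 1
fi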

We are still running weave expose because of #2471 (comment)

bboreham self-assigned this Nov 14, 2016
awh added a commit that referenced this issue Nov 14, 2016
Remove 'weave expose' race in CNI plugin
awh closed this as completed in aaa073a Nov 14, 2016
awh added this to the 1.8.1 milestone Nov 14, 2016

pstadler commented Feb 26, 2017

I had the exact same problem and it took me nearly a day to figure out what was going on.

If anybody encounters this problem in the future:

The cause of this issue is most probably that your kernel is missing the xt_set module. This can be verified by running zgrep CONFIG_NETFILTER_XT_SET /proc/config.gz, which should return CONFIG_NETFILTER_XT_SET=y (or =m); otherwise you have to either compile the module or load it into the kernel manually.
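
(Spelled out as commands; the persistence step assumes a systemd-based distro:)

# Is the set match built in (=y) or available as a module (=m)?
zgrep CONFIG_NETFILTER_XT_SET /proc/config.gz

# If it is =m, load it now and make it persist across reboots:
modprobe xt_set
echo xt_set > /etc/modules-load.d/xt_set.conf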

Related: scaleway/kernel-tools#299

@pstadler

It would be great to find a way to probe for the absence of this module and show some meaningful error to the user, as this is very hard to debug.
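
(One conceivable probe, purely a sketch and not something weave ships: once ip_tables is loaded, the kernel lists the iptables matches it supports in /proc/net/ip_tables_matches, so a startup check could fail fast with a clear message.)

# Hypothetical startup check for the 'set' match provided by xt_set
modprobe xt_set 2>/dev/null || true
if ! grep -qx set /proc/net/ip_tables_matches 2>/dev/null ; then
    echo "kernel is missing the xt_set match required by weave-npc" >&2
    exit 1
fi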

@bboreham

@pstadler I must confess to being slightly confused by the history of this issue, as the original error message is iptables: No chain/target/match by that name., but later on I say "it gives the right error message: ... RTNETLINK answers: File exists".

So forgive me, but when you say "I had the exact same problem", what exactly were the symptoms in your case?

It does not seem likely that a missing kernel module would be solvable "by deleting the addon, running weave reset on the node, and re-creating".

Right now I think there are two or three different issues in the history of this one.


pstadler commented Feb 27, 2017

Oh, you're absolutely right; "exact same problem" is definitely wrong. Let me clarify.

I got the same error message, but when starting weave-npc. I compared the host's iptables rules with those from a working setup and found the missing rules, which brought me to the solution I described above. You're definitely right that this is not related - sorry about that. I thought I'd just dump my findings into this issue, in case somebody with the same error message reads this and is hopefully able to get on the right track.

Do you want me to file a new issue?

@bboreham

Note I added a line to modprobe xt_set in #2819

@pstadler

Great. Sorry again for the confusion.

everflux added a commit to everflux/PKGBUILDs that referenced this issue May 5, 2019
Kubernetes container networking depends on the xt_set capability, see weaveworks/weave#2617 (comment)