nodes with multiple network interfaces can fail to talk to services #102
Comments
@damaspi I have opened this issue and provided a fix. Waiting on feedback! Also, moving to |
Sorry, I commented in the wrong issue. |
I copied it here now and deleted it in the other... I temporarily worked around it by configuring proxy-mode to userspace, but any advice is welcome... (inspired by this issue)
|
Again, @damaspi
|
I had the same issue. My fix was to modify the kubeadm DaemonSet for kube-proxy and explicitly add the --cluster-cidr= option. |
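For readers following along, a minimal sketch of that fix, assuming a kubeadm setup where kube-proxy takes its flags directly as container args (newer kubeadm versions keep this in a kube-proxy ConfigMap instead) and assuming 10.244.0.0/16 as the pod network CIDR:

```bash
# Edit the kube-proxy DaemonSet that kubeadm created
kubectl -n kube-system edit daemonset kube-proxy

# In the container's command/args, add the pod network CIDR, e.g.:
#   - --cluster-cidr=10.244.0.0/16

# Delete the existing kube-proxy pods so the DaemonSet recreates them with the new flag
kubectl -n kube-system delete pods -l k8s-app=kube-proxy
```
|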
/cc @luxas |
@spxtr you are closing a bunch of issues in this repo |
@mikedanese PRs being merged and there was a PR merged that fixed the lack of |
Ah I've seen it before indeed. |
I have seen this on 1.5.2. I am manually building a cluster (to learn). I am unclear what the fix is, as there is mention of controller-manager and a DaemonSet. That implies to me that people are launching kube-proxy via a DaemonSet. Just to clarify, the actual fix is to add the flag (--cluster-cidr) to kube-proxy, correct? Just trying to make sure I am not missing something. Also, to refresh my memory, didn't kube-proxy use to get this from the kube-apiserver? Was it always needed? I can't remember. If it doesn't, can someone clarify the difference between --service-cluster-ip-range=10.0.0.0/16 (api) and --cluster-cidr (proxy)? Thanks. (Sorry to add here, not sure where else to ask about this issue.) |
Where did the API server expose the cluster pod CIDR? This was a misconception on my side as well. |
Hi @pires, I thought --service-cluster-ip-range=10.0.0.0/16 on the api-server set it all up, as the proxies would talk to the k8s server to get that information. Maybe --cluster-cidr was meant to be a subset of --service-cluster-ip-range, else it seems redundant, or there is a use case I am unclear about (or I just don't know what I am talking about, which could be true!) |
Service CIDR is the subnet used for virtual IPs (used by kube-proxy). The problem is that kube-proxy doesn't know about the pod network CIDR, which is different from the service CIDR. |
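To make the distinction concrete, here is a sketch of where each CIDR gets configured; the addresses are illustrative defaults, not required values:

```bash
# Service CIDR: virtual IPs handed out to Services (what kube-proxy programs rules for)
kube-apiserver --service-cluster-ip-range=10.96.0.0/12 ...

# Pod network CIDR: addresses assigned to Pods, known to the controller-manager
kube-controller-manager --cluster-cidr=10.244.0.0/16 --allocate-node-cidrs=true ...

# kube-proxy has to be told the pod CIDR separately, so it can distinguish
# pod-originated (internal) traffic from external traffic when deciding what to masquerade
kube-proxy --cluster-cidr=10.244.0.0/16 ...
```
|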
Ah, so would that be the overlay? |
Would this issue break communication between a pod and the api-server? For example, if I run the curl command from a pod to the apiserver, "curl https://10.96.0.1:443/api", the result is: curl: (7) Failed to connect to 10.96.0.1 port 443: Connection timed out... |
I just had a look at the clusterCIDR logic in kube-proxy, and I agree that is a weird corner case. I agree the static route is appropriate for the 2nd interface, but it's unfortunate. It feels like the kernel should be smarter than that. |
I'm running v1.6.1 and thought the error "clusterCIDR not specified, unable to distinguish between internal and external traffic" would be addressed. 2017-06-06T17:49:17.113224501Z I0606 17:49:17.112870 1 server.go:225] Using iptables Proxier. |
How do you define internal and external traffic? |
This error specifically refers to anything outside the cluster's Pod IPs.
|
I've seen this problem too. Adding a route to the pod network via the second NIC resolved the issue for me. Feels a little fragile though... |
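For anyone wanting to try the same thing, a sketch of such a route, assuming eth1 is the second NIC, 10.244.0.0/16 is the pod network CIDR and 10.96.0.0/12 is the service CIDR; substitute your own values:

```bash
# Send cluster-internal destinations out of the second interface instead of the default route
sudo ip route add 10.244.0.0/16 dev eth1   # pod network
sudo ip route add 10.96.0.0/12 dev eth1    # service VIPs

# Check which interface a service VIP would actually be reached through
ip route get 10.96.0.1

# Note: routes added this way do not survive a reboot unless you also add them
# to your distribution's persistent network configuration.
```
|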
…erstand what's internal and external traffic. Fixes kubernetes/kubeadm#102
Hi, I'm running Kubernetes v1.6.6 & v1.7.0 and getting the same error from kube-proxy:
Kubernetes version:
I tried the workaround from @damaspi but it failed in v1.6.6 and v1.7.0; it used to work in v1.5.4.
Need guidance on resolving this in v1.6.6 & v1.7.0. Thanks. |
I don't think kubeadm should be spitting out OS or distro-specific configuration instructions for host networking. I think it's the responsibility of the operator to configure their host appropriately because otherwise it becomes a rabbit hole. We can certainly make it a requirement, though. What should kubeadm expect for things to work? That if the user wants to use a non-default NIC, they need to add a static route in Linux? Is this a general enough use-case for us to add it as a system requirement? |
@bboreham Any ideas on how we can improve our documentation here? Otherwise I'm in favour of closing this because:
|
[Aside: it bugs me that I have to read up and down and through other issues to page the context back in. The problem people wanted resolved has absolutely nothing to do with the title of this issue.] In the setup docs you could say: "if you have more than one network adapter, and your Kubernetes components are not reachable on the default route, we recommend you add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter". |
You are not the only one! 😅
Cool, I'll try to submit a docs PR for this tomorrow and close this out. |
This is now documented in kubernetes/website#6265, so I'm going to close. This issue seems to track a few different problems at once, so if you're still running into a potential bug, please open a new issue so we can better target the root cause. |
FWIW, if you use kubeadm to start the cluster and you specify "pod-network-cidr", that value gets passed to kube-proxy as "cluster-cidr" when it starts. For example, Weave defaults to using "10.32.0.0/12"... so I used |
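For example, a sketch of passing the pod network CIDR at init time (10.32.0.0/12 is Weave Net's default; use whatever your pod network add-on expects):

```bash
# kubeadm hands --pod-network-cidr to kube-proxy as its --cluster-cidr
kubeadm init --pod-network-cidr=10.32.0.0/12

# Depending on the kubeadm version, the value ends up in the kube-proxy DaemonSet args
# or in the kube-proxy ConfigMap; either of these should show it:
kubectl -n kube-system describe daemonset kube-proxy | grep -i cidr
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -i cidr
```
|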
@bboreham I'm new to this...Would there be an example on how to implement your suggestion "add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter"? |
@bamb00 scroll up; there is an example at #102 (comment). Caution: if you make a wrong step it may result in your machine being inaccessible. Generally this will come back after a reboot, unless you configured the bad route to be there on startup. I do not know an easy way to learn Linux network configuration. |
@mindscratch do note this issue has nothing to do with "cluster-cidr"; that was a red herring eliminated around seven months ago. Please open a new issue if you are having new problems. |
Semi-serious suggestion for fixing this specific case without requiring the
(or possibly some variation with an explicit
However, I have no idea if that covers all of the expected behaviors of those source-specific kube-proxy MASQ rules...
EDIT: this also has all kinds of side-effects for connections to unconfigured service VIPs... they will end up connecting to any matching host network namespace services.
EDIT2: However, even that is probably better than the current behavior of leaking connections to unconfigured 10.96.X.Y service VIPs out via the default route... which is vaguely unsettling |
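For anyone who wants to look at the rules being discussed, one way to inspect what kube-proxy has programmed (a sketch; the exact chains and CIDRs depend on your cluster and kube-proxy version):

```bash
# Dump the NAT table kube-proxy writes to. With --cluster-cidr set you should see
# rules matching "! -s <pod CIDR>" that mark non-pod traffic for masquerading.
sudo iptables-save -t nat | grep -E 'KUBE-(SERVICES|MARK-MASQ)'
```
|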
UPDATE as of Feb 7th, 2018, by request of @bboreham: I've edited the title so as not to mislead people looking for an unrelated issue.
As reported by @damaspi: