DNS failed when using more than one node #751
Comments
I got the same issue last month. I'm using k3s v0.7.0 on the master, with 3 nodes, but coredns is deployed on only one node, even though the node selector is "beta.kubernetes.io/os = linux".

EDIT: In my case I've tested running 2 k3s VMs on VirtualBox (using a shared network), with the same OS as I run on my server, Ubuntu 18.04, and the default DNS of k3s worked fine. So I think my problem is related to my router (MikroTik), even though I disabled all the firewall rules; my servers are behind NAT too. I will keep trying.

EDIT 2: I've installed a new cluster with the same setup using kubeadm with the Weave CNI: one VM on DigitalOcean (all-in-one k8s) and one bare-metal machine behind the NAT (MikroTik), with all ports forwarded (dst-nat) to my local node. Pod communication worked fine, but stopped working when Weave Net encryption was enabled (which is the default when using the Rancher command to deploy the cluster).
Same LAN here.
I've patched 0.9.1 to use host-gw instead of vxlan and all problems disappeared. Since 0.10 brings options for flannel, are you interested in a patch to enable host-gw? The diffs against 0.9.1 are very small.
I think it would be okay to have an option for host-gw. It would be good to get to the bottom of the issue, though.
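For context, a minimal sketch of what enabling the host-gw backend could look like, assuming the option lands as a `--flannel-backend` server flag (as it did in later k3s releases); passing it through the install script's `INSTALL_K3S_EXEC` variable is one common way:

```sh
# Install (or reinstall) the k3s server with flannel's host-gw backend
# instead of the default vxlan backend (flag name assumed; check
# `k3s server --help` on your version first).
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --flannel-backend=host-gw" sh -

# Equivalent when starting the binary directly:
# k3s server --flannel-backend=host-gw
```

Note that host-gw only works when all nodes share the same L2 network, which matches the "same LAN" reports in this thread.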
I have a similar issue, not sure if it's the same. I noticed this problem when deploying k3s to more than one node. In my case it seems the master node cannot resolve DNS while the other nodes can, so any workload ending up on the master fails to connect to things. For example, when deploying external-dns or cert-manager, if they end up on the master they fail.
I have a k3s 1.0 cluster with 3 masters and 2 agents; same problem here. To add some details:
Another workaround/fix is to use NodeLocal DNSCache: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/.
Hope it helps.
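For reference, a rough sketch of the NodeLocal DNSCache setup from the linked docs; the service name (`kube-dns`), cluster domain (`cluster.local`), and link-local listen address (`169.254.20.10`) are assumptions that may need adjusting for a particular k3s install:

```sh
# Fetch the upstream manifest and substitute its template variables.
curl -LO https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

kubedns=$(kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}')
domain=cluster.local        # assumed cluster domain
localdns=169.254.20.10      # assumed link-local address for the node-local cache

sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; \
        s/__PILLAR__DNS__DOMAIN__/$domain/g; \
        s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml

kubectl apply -f nodelocaldns.yaml
```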
I'm thinking this issue happens when your DNS server is one of the hosts itself.
I feel like I have a similar problem. With only the master node, DNS is working well.
Indeed! My DNS server is deployed on the same host.
Can confirm this is not universal. I believe I am running into this issue, and my upstream DNS server is external to both the router and any k3s node. I had thought that perhaps it might be an issue with a mixed-architecture cluster: my master is running on a Raspberry Pi 4 with Raspbian Buster, and I have one worker node on AMD64/Ubuntu 18.04. I haven't been able to test the multi-arch theory due to lack of nodes (I only have the one RPi right now). Another commonality I see mentioned in this thread is that I have a MikroTik router; I will go down that rabbit hole momentarily. I think it is a fair possibility that this, or something on the host-OS side, is the issue and that it's related to VXLAN, because I can only ping pod IPs on the local node in the cluster.
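A quick way to check that cross-node symptom (can pods on other nodes be reached at all?); the pod IP below is a placeholder:

```sh
# List pod IPs and the nodes they run on.
kubectl get pods -A -o wide

# From one node, ping a pod that is scheduled on a *different* node
# (10.42.1.5 is a placeholder pod IP taken from the output above).
ping -c 3 10.42.1.5
```

If pings only succeed for pods on the local node, cross-node VXLAN traffic is a likely suspect.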
And I've resolved my issue. Check your firewalls: make sure that your nodes can communicate with each other on UDP port 8472 (assuming you're using the default VXLAN backend for Flannel). @akenakh, this could explain why the host-gw backend was working and VXLAN was not in your case.
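On Ubuntu hosts with ufw, opening the flannel VXLAN port between nodes might look like the sketch below; the node subnet is a placeholder, and a raw iptables rule is shown as an alternative:

```sh
# Allow flannel VXLAN traffic (UDP 8472) from the other nodes.
# 192.168.1.0/24 is a placeholder for your node network.
sudo ufw allow from 192.168.1.0/24 to any port 8472 proto udp

# Raw iptables alternative:
# sudo iptables -A INPUT -p udp --dport 8472 -j ACCEPT
```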
This seems to be the case for me. Everything was working fine when I had my DHCP and DNS handled by my router, with DNS requests forwarded to a DNS server inside my cluster (PiHole). When I tried changing the DHCP and DNS to use PiHole directly (no changes to the pods, only router settings), the pods using …

Playing around inside one of the host-network pods, I found that queries to …

EDIT: Sorry, I had some config here to try to get all names to resolve, but it seemed to only go to the first nameserver. Specifying both didn't have the desired effect.
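One way to do that kind of check is to inspect which nameservers a host-network pod actually inherited and query them directly; the pod name and the PiHole address below are placeholders:

```sh
# Show the resolver config the host-network pod inherited from the node.
kubectl -n kube-system exec <pod-name> -- cat /etc/resolv.conf

# Query a specific nameserver directly to compare behaviour
# (192.168.1.53 is a placeholder for the PiHole address).
nslookup example.com 192.168.1.53
```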
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
I know this issue is long closed/stale, but I just wanted to comment that this still works for me. The default flannel mode for k3s installations is vxlan, and opening port 8472 on Ubuntu hosts worked. These ports need to be open when you run multi-node clusters; I did not notice it when using a single node. Still works as of version:
I've been chasing this bug for months: pods can't talk to coredns on agent nodes.
This is on a fresh 0.8.1 arm64 deployment with 3 nodes (one master, two agents), but the same issue existed with previous k3s versions, on kernel 4.4 and 5.3; the host OS is Arch.
iptables v1.8.3 (legacy)
Using the default install script.
Expected:
The 10.43.0.10 DNS service (and I assume the whole network) should be correctly set up on each node.
It's easy to test, since the host can't reach the DNS when the problem appears (see the example below).
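A quick check against the cluster DNS service (10.43.0.10 is the service IP mentioned above; `kubernetes.default` is the standard in-cluster name to test):

```sh
# Query the cluster DNS service directly from the host.
# On an affected node this times out when the problem appears.
nslookup kubernetes.default.svc.cluster.local 10.43.0.10

# Or from inside a throwaway test pod (busybox:1.28 has a well-behaved nslookup):
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default
```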
I found that scaling up the coredns deployment shifts the working node to an agent, leaving the master node unable to reach the DNS.
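A sketch of that scale-up, using the deployment name and label k3s ships for coredns (the replica count is arbitrary):

```sh
# Add coredns replicas so they land on different nodes, then check where
# they were scheduled and which nodes can still resolve.
kubectl -n kube-system scale deployment coredns --replicas=3
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
```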
For some reason, sometimes it just works from all 3 nodes, but most of the time it doesn't.
I've tried starting the agent manually after the boot sequence completes, with no luck, and I've compared the iptables output; everything looks fine.
I've also tried pointing coredns at 8.8.8.8 directly, with no result.