
DNS resolution stops working after k3s installation #3624

Closed
umairyounus opened this issue Jul 13, 2021 · 18 comments

Comments

@umairyounus

Environmental Info:
K3s Version:
k3s version v1.21.2+k3s1 (5a67e8d)
go version go1.16.4

Node(s) CPU architecture, OS, and Version:
Linux dell1 5.11.0-22-generic #23-Ubuntu SMP Thu Jun 17 00:34:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 server, 3 agents

Describe the bug:
After doing a fresh installation of Ubuntu 21.04 Server, DNS resolution on the machine stops working once I install k3s. DNS resolution fails on the host machine as well as inside the pods. I've tried it on a few different machines with the same behaviour.

DNS works fine before the k3s agent installation; it looks like there is some kind of conflict.

Steps To Reproduce:

  • Fresh Ubuntu 21.04 installation

  • Install k3s as an agent (see the sketch after this list)
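
A minimal sketch of the agent install step, using the documented installer environment variables (the server URL and token are placeholders for the values from the server node):

curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<node-token> sh -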

Expected behavior:
DNS resolution should work

Actual behavior:
name resolution failed

Additional context / logs:
/ # ping google.com
ping: bad address 'google.com'
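
A quick way to narrow this down (a sketch, not part of the original report; the pod name and busybox image are arbitrary) is to test resolution on the host and from inside a throwaway pod separately:

# on the host: check what systemd-resolved is using and whether it resolves
resolvectl status
resolvectl query google.com

# inside the cluster: test cluster-internal and external resolution
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup google.com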

@brandond
Member

I run K3s extensively on Ubuntu, and QA validates on it as well. I can't reproduce this issue. Can you share any more information about your environment or workload?

@umairyounus
Author

umairyounus commented Jul 14, 2021

Hi Brandond,

Thanks for the prompt reply. I've just installed a fresh copy of Ubuntu on the Raspberry Pi and the Dell machine. The Raspberry Pi is the master node and the Dell is a worker node. Now DNS works on the host machine and in pods on the master node, but it doesn't work inside pods on the Dell machine. It's a fresh installation. Here is a copy of resolv.conf from the pod:

search default.svc.cluster.local svc.cluster.local cluster.local localdomain
nameserver 10.43.0.10
options ndots:5
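
That resolv.conf matches the k3s defaults (10.43.0.10 is the CoreDNS ClusterIP), so the next thing to check is whether CoreDNS itself is reachable; a sketch assuming the default k3s names and labels:

# confirm the cluster DNS service and CoreDNS pods are up
kubectl -n kube-system get svc kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# compare resolv.conf as seen from a pod on the worker node (pod name is a placeholder)
kubectl exec <pod-on-worker> -- cat /etc/resolv.conf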

@brandond
Member

That sounds like cluster networking (flannel CNI) isn't working. Can you confirm that you've disabled firewalld/ufw on both nodes, and that there's not anything else between the two filtering traffic?
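
For reference, a quick way to verify that on each node (firewalld may simply not be installed on Ubuntu):

sudo ufw status                   # should report "Status: inactive"
systemctl status firewalld        # should be inactive or not found
sudo iptables -S | head           # look for unexpected DROP/REJECT rules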

@umairyounus
Author

Both firewalld and ufw are disabled, and there is no firewall in between. The master node is deploying pods successfully; only pods on the worker nodes can't resolve.

@umairyounus
Author

The issue is related to Ubuntu 21.04. I've tried Ubuntu 21.04 on a few different machines with the same result. After downgrading to Ubuntu 20.04, everything works fine.

@brandond
Member

Interesting. I believe we only technically support LTS releases of Ubuntu, but this is worth looking into regardless.

@brandond brandond added this to the v1.22.0+k3s1 milestone Jul 19, 2021
@trallnag

trallnag commented Jul 22, 2021

@umairyounus, I'm using Ubuntu Server 21.04 and for me K3s and DNS are working fine. I installed K3s with k3sup.


I should not have commented; bad luck. A few minutes ago I ran apt upgrade and suddenly K3s broke down. Errors like this all over the place:

panic: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.43.0.1:443: connect: connection refused

goroutine 1 [running]:
main.main()
        /go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b

Reinstalled K3s and now it works again (all my persistent storage is using hostPath so I can dump K3s without worries).
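
For anyone else hitting the metrics-server panic above, the reinstall roughly amounts to the standard scripts the installer drops in place (server shown; agents use k3s-agent-uninstall.sh instead):

# remove the broken install
/usr/local/bin/k3s-uninstall.sh

# reinstall from the upstream script
curl -sfL https://get.k3s.io | sh -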

@xtenduke

xtenduke commented Jul 25, 2021

I resolved this with K3s v1.19.13 on Ubuntu 21.04 on a Raspberry Pi 4B by using the host-gw flannel config:
--flannel-backend=host-gw

I was experiencing DNS resolution issues in pods (works fine on host)

I have 4 nodes running Ubuntu 21.04, installed with k3sup.
They have static IPs, 192.168.1.20 through 192.168.1.23, with 8.8.8.8 & 8.8.4.4 as DNS servers.
IPv6 is disabled.
Other than that, they are stock installs of Ubuntu's 21.04 Raspberry Pi distribution.

Maybe 21.04 has issues with vxlan?
I am happy to help with debugging
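
For reference, a minimal sketch of applying that workaround at install time (INSTALL_K3S_EXEC is the installer's documented way to pass server flags; agents join as usual):

# server node: use host-gw instead of the default vxlan backend
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-backend=host-gw" sh -

# note: host-gw assumes all nodes sit on the same layer-2 segment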

@WhiteBahamut

Same for me; I also switched to --flannel-backend=host-gw. I did some googling (can't find the links anymore), but it seems there is a bug in flannel's vxlan backend.

@sonicbells

I experienced the same issue on Ubuntu 20.04, and the following helped me (maybe not ideal, but it worked for me):
I added DNS=1.1.1.1 to /etc/systemd/resolved.conf on the host and then ran systemctl restart systemd-resolved; pods started resolving external names correctly.
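
In concrete terms, that workaround is roughly the following (1.1.1.1 is just the upstream resolver chosen above; a drop-in file is equivalent to editing resolved.conf directly):

# add an explicit upstream resolver for systemd-resolved
sudo mkdir -p /etc/systemd/resolved.conf.d
printf '[Resolve]\nDNS=1.1.1.1\n' | sudo tee /etc/systemd/resolved.conf.d/dns.conf

# restart the stub resolver so the change takes effect
sudo systemctl restart systemd-resolved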

@WhiteBahamut

And you kept vxlan mode?

@fapatel1 fapatel1 modified the milestones: v1.22.0+k3s1, v1.22.2+k3s1 Aug 23, 2021
@manelpb

manelpb commented Aug 25, 2021

Also having the same issues with Ubuntu 21.

@markmcgookin

markmcgookin commented Sep 6, 2021

Hey, I am experiencing this too.... I have a k3s cluster that I just rebuilt from scratch (because I thought this issue would go away with a fresh build... nope)

Happy to try anything and debug anything as this cluster is now useless until I get this sorted.

Setup

master node - Pi4B 8GB
nodes a,b,c - Pi4B 4GB

Installed Ubuntu Server 64-bit via the Raspberry Pi Imager tool.

sudo apt-get update 
sudo apt-get upgrade

change hostnames to kmaster, knodea, knodeb, knodec

then install k3s


k3s version v1.21.4+k3s1 (3e250fdb)
go version go1.16.6

Enabled cgroups in cmdline.txt (cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory), as sketched below.
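
On Ubuntu's Raspberry Pi images the kernel command line usually lives in /boot/firmware/cmdline.txt (the path may differ on other images); the flags are appended to the existing single line, e.g.:

# append the cgroup flags to the one-line kernel command line, then reboot
sudo sed -i '$ s/$/ cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
sudo reboot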

Added a few secrets, some PVCs, etc. to support the deployments.

Pod images pull fine (DNS still works on the hosts when you SSH onto them).

Pods that run code that calls out to the outside world (in this case the Bitbucket API) fail with errors like:

Unhandled exception. Flurl.Http.FlurlHttpException: Call failed. Resource temporarily unavailable (bitbucket.org:443) POST https://bitbucket.org/site/oauth2/access_token

I tried @sonicbells' 'fix' and rebooted that node as well, then deleted and redeployed the pod: no change, same failure.

Generally running a fresh stock install. Getting desperate here.

I have gone through all the k3s 'Debugging DNS issues' steps and simply confirmed that my DNS doesn't work, but I don't know how to solve it. This cluster was previously running 19.04 (I think) and it worked fine with no extra steps. Then one node failed and I had to rebuild it; the RPi Imager no longer offered that release, so I used 21.04, and since the pod needs to be on that node, nothing was working. I assumed a full rebuild would help, so I built all 4 nodes with 21.04, and now it doesn't work at all.
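
For completeness, those debugging steps largely boil down to checking CoreDNS itself; a sketch using the default k3s labels and cluster DNS address:

# CoreDNS health and recent logs
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50

# query the cluster DNS service directly from a throwaway pod
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup bitbucket.org 10.43.0.10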

UPDATE:

I found the article on the flannel host-gw backend that @xtenduke mentioned above. I followed the tutorial to install it, deleted my pod, and recreated it, but the only thing that has changed is that the calls now time out instead of failing to resolve.

@trallnag

trallnag commented Sep 6, 2021

Really weird. I wonder why it started working again for me on Ubuntu 21.04 after creating a K3s cluster from scratch.

@markmcgookin
Copy link

markmcgookin commented Sep 6, 2021 via email

@anthonyoteri

Had the same issue with DNS not resolving using Ubuntu 21.10 on VMs.

$ kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k3s-master   Ready    control-plane,master   77s   v1.22.2+k3s2
k3s-node-1   Ready    <none>                 40s   v1.22.2+k3s2
k3s-node-2   Ready    <none>                 44s   v1.22.2+k3s2

Installed with k3s-ansible using the options --disable traefik --disable servicelb

DNS was unable to resolve anything from within the pods.

After changing my server arguments to --flannel-backend=host-gw --disable traefik --disable servicelb DNS started resolving.
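
One way to make those server arguments persistent is the k3s config file; a sketch assuming the default /etc/rancher/k3s/config.yaml location on the server node:

# /etc/rancher/k3s/config.yaml
flannel-backend: host-gw
disable:
  - traefik
  - servicelb

# restart the server so the new flags are picked up
sudo systemctl restart k3s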

@brandond
Member

This is a duplicate of #4188

@matthew-ellis

I was hitting the same DNS issue on a bootstrapped kubeadm cluster running on RHEL 8.4 servers with the Flannel network installed. Changing the Backend Type in net-conf.json to host-gw in the YAML before applying it to the cluster seems to have fixed it!

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "host-gw"
    }
  }
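
Applying that change is just a matter of editing the flannel manifest before (re)applying it; the file and DaemonSet names below are common upstream defaults and may differ per flannel version:

# apply the edited manifest
kubectl apply -f kube-flannel.yml

# restart the flannel pods so they pick up the new backend
kubectl -n kube-system rollout restart daemonset kube-flannel-ds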
