DNS resolution seems broken with nftables and forwardKubeDNSToHost in development version #9196

Closed
alkersan opened this issue Aug 19, 2024 · 3 comments · Fixed by #9200
@alkersan

Bug Report

Description

It seems that making nftables the default in the current 1.8 development branch (tried v1.8.0-alpha.1-81-ge193e7db9) breaks forwardKubeDNSToHost. Here's what I did:

  • bootstrapped a single control plane node on AWS, from the latest published image, with pretty much all default parameters (all components already point to v1.31 of k8s)
  • started a simple alpine:3.20 pod with hostNetwork: true solely to run nslookup:
apiVersion: v1
kind: Pod
metadata:
 name: test
 namespace: default
spec:
 restartPolicy: Never
 automountServiceAccountToken: false
 hostNetwork: true
 containers:
   - name: test
     image: alpine:3.20
     command: [ "/bin/sh" ]
     args:
       - "-c"
       - "trap : TERM INT; sleep infinity & wait"
 tolerations:
   - key: node-role.kubernetes.io/control-plane
     effect: NoSchedule
  • attached a shell and tried to look up an EC2 endpoint within the same region:
> nslookup ec2.us-east-2.amazonaws.com
nslookup: write to '10.96.0.9': Operation not permitted
;; connection timed out; no servers could be reached

As can be seen, the address of the host DNS service is correct (the 9th IP address), but no resolution happens. Network connectivity itself seems to be fine, though, because simply pointing at a different nameserver works. For example, here is an answer from the Cloudflare nameserver:

> nslookup ec2.us-east-2.amazonaws.com 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1:53

Non-authoritative answer:
Name:	ec2.us-east-2.amazonaws.com
Address: 99.78.178.238
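
The two nslookup runs above fail in different ways: the first is rejected at send time ("Operation not permitted"), while the second gets an answer. A minimal Python sketch of the same check is below. It is not part of Talos or of the original report; the function names build_query and probe are made up for illustration, and the reading of EPERM as "packet blocked locally, typically by a firewall rule on the output path" is an assumption, not something the logs confirm.

```python
import errno
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query: one question, recursion desired."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header fields: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record by default) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query to server:53 and classify what happened."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
        except OSError as exc:
            if exc.errno == errno.EPERM:
                # Same error nslookup printed for 10.96.0.9
                return "send blocked locally (EPERM)"
            return f"send failed: {exc}"
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "sent, but no reply before timeout"
        except OSError as exc:
            return f"receive failed: {exc}"
        if len(reply) < 12:
            return "reply too short to be a DNS message"
        # The low 4 bits of the fourth header byte carry the response code
        return f"reply received, rcode={reply[3] & 0x0F}"
```

Calling probe("10.96.0.9", "ec2.us-east-2.amazonaws.com") and probe("1.1.1.1", "ec2.us-east-2.amazonaws.com") from the host-network pod would separate a send-time rejection from a plain timeout, which nslookup reports together in one message.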

Interestingly, disabling forwardKubeDNSToHost brings DNS resolution back to a working state, but that doesn't seem like a solution.

So I started experimenting, and accidentally tried running kube-proxy with the iptables backend instead, as it was before. I reprovisioned the control plane node with a single change in its config file:

cluster:
  proxy:
    extraArgs:
      proxy-mode: iptables

After this change - DNS resolution worked:

> nslookup ec2.us-east-2.amazonaws.com
Server:		10.96.0.9
Address:	10.96.0.9:53

Non-authoritative answer:
Name:	ec2.us-east-2.amazonaws.com
Address: 99.78.176.220

Thus, I suspect that something isn't fully configured with the latest nftables rules or maybe it's kube-proxy mangling something.

Logs

Logs don't contain any suspicious errors or warnings. The cluster bootstraps just fine, within 30 seconds, and the node is marked ready.

Environment

  • Talos version:
Client:
	Tag:         v1.8.0-alpha.1-81-ge193e7db9
	SHA:         e193e7db
	Built:       
	Go version:  go1.22.6
	OS/Arch:     linux/amd64
Server:
	NODE:        talos.flab.dev
	Tag:         v1.8.0-alpha.1-81-ge193e7db9
	SHA:         e193e7db
	Built:       
	Go version:  go1.22.6
	OS/Arch:     linux/arm64
	Enabled:     RBAC
  • Kubernetes version:
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.0
  • Platform: AWS, ARM64, AMI_ID: ami-0e29f054809ce5025 (talos-v1.8.0-alpha.1-81-ge193e7db9-us-east-2-arm64)
@smira
Member

smira commented Aug 19, 2024

Please attach the talosctl support bundle and the output of talosctl logs dns-resolve-cache.

10.96.0.9 is the address of kube-dns (i.e. the CoreDNS pod) running in Kubernetes. You seem to have a problem reaching it; this is not host DNS yet. kube-dns should in turn route to host DNS, but it's not clear what happened there.

@smira
Member

smira commented Aug 19, 2024

Oops, sorry, misread your post. You run with host networking, and 10.96.0.9 is indeed host DNS.
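
For readers following the addresses: the report calls 10.96.0.9 "the 9th ip addr". A small Python sketch of that arithmetic is below, assuming the service CIDR is 10.96.0.0/12 (a common default that this thread does not state) and using a made-up helper name, nth_service_ip. Under the same assumption, the conventional kube-dns address 10.96.0.10 is the 10th address, which may explain the initial mix-up.

```python
import ipaddress


def nth_service_ip(service_cidr: str, n: int) -> str:
    """Return the n-th address of the service network, counting from the network address."""
    network = ipaddress.ip_network(service_cidr)
    return str(network[n])


# Assumption: default service CIDR; adjust to the cluster's actual serviceSubnets.
SERVICE_CIDR = "10.96.0.0/12"

print(nth_service_ip(SERVICE_CIDR, 9))   # host DNS service address from this report
print(nth_service_ip(SERVICE_CIDR, 10))  # conventional kube-dns address
```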

@smira
Member

smira commented Aug 19, 2024

I was able to reproduce this issue

@alkersan alkersan changed the title from "DNS resolution seems broken with nftables and forwardKubeDNSToHost in development verision" to "DNS resolution seems broken with nftables and forwardKubeDNSToHost in development version" Aug 19, 2024
smira added a commit to smira/talos that referenced this issue Aug 19, 2024
This is an attempt to fix many issues related with trying to use Service
IP for host DNS.

Fixes siderolabs#9196

Signed-off-by: Andrey Smirnov <[email protected]>
@smira smira self-assigned this Aug 19, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2024