Apparent DNS failure in Docker image alpine:3.8, nslookup: can't resolve '(null)' #476

Closed
chrisinmtown opened this issue Jan 23, 2019 · 20 comments

@chrisinmtown

chrisinmtown commented Jan 23, 2019

Seeing odd DNS behavior in Docker image alpine:3.8. I'm baffled that nslookup complains yet finds the IP address. In comparison, ping works perfectly. See below. This little test runs a docker container to resolve the name of the host VM. If I can get that working the next test will be to have the docker container resolve the name of other running containers.

I traced this back from behavior in an OpenJDK image, in which Java cannot resolve host names. I'd really prefer to use an Alpine version of a Java/JRE image, since it's half the size of a non-Alpine (Debian) Java/JRE image, but this network glitch is kind of a killer.

So far I've run this test under a plain Ubuntu VM running docker 17.05.0-ce and under Kubernetes running docker version 18.09.1. Same behavior in both. I know there are many external variables that might affect this, so it might not be an Alpine issue at all, although issue #255 sure seems to be related.

Would someone possibly take a minute to explain please? Thanks in advance.

me@host-dev1-vm01-core:~$ docker run alpine:3.8 nslookup host-dev1-vm01-core

nslookup: can't resolve '(null)': Name does not resolve
Name:      host-dev1-vm01-core
Address 1: 10.1.0.6
me@host-dev1-vm01-core:~$ docker run alpine:3.8 ping host-dev1-vm01-core
PING host-dev1-vm01-core (10.1.0.6): 56 data bytes
64 bytes from 10.1.0.6: seq=0 ttl=64 time=0.061 ms
64 bytes from 10.1.0.6: seq=1 ttl=64 time=0.133 ms
^C
--- host-dev1-vm01-core ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.061/0.097/0.133 ms
@chrisinmtown chrisinmtown changed the title from "Apparent DNS failure in Docker image alpine:3.8 can't resolve '(null)'" to "Apparent DNS failure in Docker image alpine:3.8, nslookup: can't resolve '(null)'" on Jan 23, 2019
@bboreham

The BusyBox nslookup, which Alpine uses, does two lookups, one for the DNS server and one for the name you asked for. This can be seen here.

In your example nslookup did resolve the name host-dev1-vm01-core to the address 10.1.0.6.

The line can't resolve '(null)' says that, at that point, it didn't know what its DNS server was.
Looking at the code that might initialize it, we see why.
Sorry, I ran out of time chasing down the reason for that. Hope this is of some help.

@chrisinmtown
Author

Thanks @bboreham for the note, could this failure to resolve the DNS server cause a host-name resolution failure in a Java program? I might be asking the wrong questions, but I guess I'm grasping at straws, trying to explain why in my environment & tests the Java image openjdk:8-jre-alpine (derived from alpine:3.8 image) fails but the Java image openjdk:8-jre-slim (derived from debian) works just fine.

@bboreham

It’s not really a failure at all; it’s just a program printing something out that isn’t helpful or interesting.

What Alpine nslookup does has no bearing on what a Java program does.

@chrisinmtown
Author

I ran nslookup to check whether the running container is able to resolve a name to an IP address. I hear you saying, nslookup is not a reliable indicator. Please suggest a better way.

@bboreham

I'm not saying it is not a reliable indicator, I'm saying the line where it prints can't resolve '(null)' is not related to what you want to know.

Check the return code from nslookup; ignore that line.
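
The advice above can be sketched as a small shell helper. This is only an illustration: the function names `resolves` and `report` are made up here, and it assumes, as described above, that the exit status of nslookup reflects whether the requested name resolved.

```shell
# Hypothetical helper: succeed only when nslookup exits with status 0.
# All output, including the "can't resolve '(null)'" line, is discarded.
resolves() {
    nslookup "$1" >/dev/null 2>&1
}

# Print a one-line verdict for a host name, based on the exit status alone.
report() {
    if resolves "$1"; then
        echo "$1: resolved"
    else
        echo "$1: NOT resolved"
    fi
}
```

Inside the container, `report host-dev1-vm01-core` would then print a verdict that does not depend on the noisy first line.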

@chrisinmtown
Author

chrisinmtown commented Jan 25, 2019

I have not yet figured out the problem, pls see below for shortest possible Java debugging material, hope this will help other people.

file ResolveHostName.java

import java.net.InetAddress;
public class ResolveHostName {
	public static void main(String[] args) throws Exception {
		if (args.length != 1)
			throw new IllegalArgumentException("Usage: program host-name-to-resolve");
		System.out.println("Resolving " + args[0]);
		System.out.println(InetAddress.getByName(args[0]).toString());
	}
}

file Dockerfile-alpine

FROM openjdk:8-jre-alpine
COPY res-host-name.jar /
ENTRYPOINT ["java", "-jar", "res-host-name.jar"]

file build.sh

#!/bin/bash
set -e -x
javac ResolveHostName.java
jar cvfe res-host-name.jar ResolveHostName ResolveHostName.class
docker build -f Dockerfile-alpine  .

@jcperezamin

Please include this fix in all Alpine versions: kubernetes/kubernetes#56903 (comment)

@EmpireJones

EmpireJones commented Mar 5, 2019

@jcperezamin - I think that's a different issue. Also, I think that'd just be hacking in a workaround for the underlying issue, which is a linux kernel bug to be fixed in v5.0 - torvalds/linux@4e35c1c . Also, that change would break some IPv6 support.

@inter169

inter169 commented Mar 6, 2019

  1. Yes, you can ignore the "... can't resolve '(null)'" message that the Alpine (BusyBox) nslookup prints; it is unnecessary noise. If you pass a name server as the second parameter, the message goes away:
# nslookup wx.qlogo.cn 100.100.2.136
Server:    100.100.2.136
Address 1: 100.100.2.136

Name:      wx.qlogo.cn
Address 1: 203.205.142.155
Address 2: 203.205.142.154

The relevant BusyBox code (Alpine 3.8 uses http://busybox.net/downloads/busybox-1.28.4.tar.bz2; see networking/nslookup.c, function nslookup_main) and strace debugging logs show that the program first runs a DNS query for the name server itself. Whether that query is forward or reverse depends on the second parameter: if you give a name server IP, it does a reverse (PTR) query. You get the message because the global name server value is still empty at initialization time.

  2. My change (workaround for "DNS intermittent delays of 5s", kubernetes/kubernetes#56903 (comment)) removes IPv6 from the default behavior at the Docker image level to reduce those DNS timeouts. Please weigh the pros and cons: it has worked well for my Node.js/Java/Go/C++/Python applications, which do not need IPv6. The proper fix for the DNS timeouts lies elsewhere (kernel conntrack race fixes, a node-local DNS cache).

I have been running most of my applications (Java/Go/Node.js/C/C++/Python) on my customized Alpine Docker image for over a year.
I highly recommend debugging with strace and tcpdump inside the container; you may need to run the container in privileged mode to get the required capabilities.
Can you please post some strace / tcpdump logs here?

thanks,
harper
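
Passing the name server explicitly, as in point 1 above, can be scripted. A minimal sketch, assuming the container's resolver configuration lives in /etc/resolv.conf; the helper names `first_nameserver` and `lookup_via_first_ns` are hypothetical and not part of BusyBox or Alpine:

```shell
# Print the first "nameserver" address from a resolv.conf-style file.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Hypothetical wrapper: look up a name and pass the name server explicitly
# as the second nslookup argument.
lookup_via_first_ns() {
    ns=$(first_nameserver "${2:-/etc/resolv.conf}")
    if [ -z "$ns" ]; then
        echo "no nameserver entry found" >&2
        return 1
    fi
    nslookup "$1" "$ns"
}
```

For example, `lookup_via_first_ns wx.qlogo.cn` would run nslookup with whatever server is listed first in the container's resolver file.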

Iristyle added a commit to Iristyle/puppetdb that referenced this issue May 2, 2019
 - There appear to be many reported issues against Alpine DNS. This
   is an attempt to work around the ones we're experiencing.

 - In local testing (specifically under LCOW), DNS resolution under
   Alpine seems to be very problematic.

   `nslookup` may repeatedly fail to perform a DNS resolution against
   another container name like `puppet.local`.

   Lookup failures will resemble something like:

     / # nslookup puppet.local
     nslookup: can't resolve '(null)': Name does not resolve

     nslookup: can't resolve 'puppet.local': Name does not resolve

   Even successes have problems with the DNS server

     / # nslookup puppet.local
     nslookup: can't resolve '(null)': Name does not resolve

     Name:      puppet.local
     Address 1: 172.17.212.25

 - Supposedly the "can't resolve '(null)'" part is innocuous, but it's
   unclear if that is the case.  More info at:

   nicolaka/netshoot#6
   gliderlabs/docker-alpine#476

 - It seems that just having the `bind-tools` package installed will
   increase the reliability, but after running dig once against the
   given host, intermittent DNS resolution problems seem to go away

     / # nslookup puppet.local
     Server:         172.17.208.1
     Address:        172.17.208.1#53

     Non-authoritative answer:
     Name:   puppet.local
     Address: 172.17.212.25

   So the script is changed to query for the postgres hostname

 - We don't use curl here because we're mostly interested in making
   sure a host with a given name *should* exist.

   There are scenarios where host / dig will succeed, but later
   checks with curl may not - and we want to differentiate those
   failure modes as much as possible

   https://serverfault.com/questions/335359/how-is-it-possible-that-i-can-do-a-host-lookup-but-not-a-curl
@xbmono

xbmono commented Jun 20, 2019

I have a similar issue in our Kubernetes cluster when I try to get the IP of the Kubernetes DNS server. I'm using this docker image: nginx:1.16.0-alpine

/ # nslookup kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      kube-dns.kube-system.svc.cluster.local
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

@WeihanLi

WeihanLi commented Jun 20, 2019

Fixed with the config below in the k8s deployment:

spec:
  template:
    metadata:
      labels:
        app: activityreservation
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "1"

refer to https://github.com/WeihanLi/ActivityReservation/blob/d3e4de902af70ad1c85618db8a481f7fbfe1a964/k8s/reservation-deployment.yaml for details
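
For context on what this setting controls: ndots in resolv.conf is the number of dots a name must contain before the resolver tries it as-is first, ahead of any search domains, and Kubernetes pods default to ndots:5. The sketch below only counts dots to show which side of that threshold a name falls on. The helper names are hypothetical and no resolver is involved.

```shell
# Count the dots in a host name.
count_dots() {
    printf '%s' "$1" | tr -cd '.' | wc -c | tr -d ' '
}

# Succeed if a name has at least as many dots as the given ndots value,
# meaning the resolver would try it as-is first.
queried_as_is_first() {
    [ "$(count_dots "$1")" -ge "$2" ]
}
```

With these helpers, `queried_as_is_first kube-dns.kube-system.svc.cluster.local 5` fails because the name has four dots, while the same check with an ndots value of 1 succeeds.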

@bboreham

@xbmono if you are complaining about "can't resolve '(null)'" please read my explanation at #476 (comment). It is nothing to worry about. If something else, I suggest you open a new issue.

@Antiarchitect

Antiarchitect commented Jul 29, 2019

Having the same issue with Alpine 3.9 (not Kubernetes case). Is there any workaround for this?

@vedmant

vedmant commented Aug 22, 2019

I have the same problem with alpine:3.10

@sashok2k

> I have the same problem with alpine:3.10

Same for me..

@sscholl

sscholl commented Sep 4, 2019

same here node:alpine brings this problem

Iristyle added a commit to Iristyle/puppetdb that referenced this issue Oct 28, 2019
@WeihanLi

any workaround with docker here?

@ncopa
Collaborator

ncopa commented Jan 15, 2020

> The BusyBox nslookup, which Alpine uses, does two lookups, one for the DNS server and one for the name you asked for. This can be seen here.

This was very useful. There is apparently a different nslookup implementation available in BusyBox, which can be enabled with CONFIG_FEATURE_NSLOOKUP_BIG, and the comments there say that it is compatible with musl. I will enable that and see if I can backport it to alpine:3.11 at least.

BTW, the official docker image has moved to https://github.com/alpinelinux/docker-alpine. Since this was a config option in upstream Alpine, it would have been better to report it upstream at https://gitlab.alpinelinux.org/alpine/aports

algitbot pushed a commit to alpinelinux/aports that referenced this issue Jan 15, 2020
The small nslookup does not work with musl, so let's enable the musl
compatible variant.

ref: gliderlabs/docker-alpine#476
algitbot pushed a commit to alpinelinux/aports that referenced this issue Jan 15, 2020
The small nslookup does not work with musl, so let's enable the musl
compatible variant.

ref: gliderlabs/docker-alpine#476
(cherry picked from commit cfb652d)
@ncopa
Collaborator

ncopa commented Jan 15, 2020

This should be fixed in alpinelinux/aports@e5c984f and will be available in the next release (alpine:3.11.3). Meanwhile, you can get it with apk upgrade -U -a
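
To tell whether an image is already at or past the release named above, its version file can be compared against 3.11.3. A rough sketch, assuming /etc/alpine-release contains a dotted version string such as 3.11.3; `version_at_least` and `has_nslookup_fix` are hypothetical helper names, and branches older than 3.11 are simply reported as not fixed:

```shell
# Succeed if dotted version $1 is greater than or equal to dotted version $2.
# Fields are compared numerically, so 3.8 sorts before 3.11.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        for (i = 1; i <= 3; i++) {
            if ((h[i] + 0) > (w[i] + 0)) exit 0
            if ((h[i] + 0) < (w[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Hypothetical check: is the release in the given file (default
# /etc/alpine-release) at least 3.11.3?
has_nslookup_fix() {
    version_at_least "$(cat "${1:-/etc/alpine-release}")" 3.11.3
}
```

Running `has_nslookup_fix` inside a container gives a quick yes or no before deciding whether an `apk upgrade -U -a` step is worth adding.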

@ncopa ncopa closed this as completed Jan 15, 2020
Cruikshanks added a commit to Cruikshanks/simple-rails-docker that referenced this issue Feb 28, 2020
This is a bit of an arse, but I have spotted that since switching to alpine as the base image the healthchecks no longer work. If I `docker exec` onto the instance I can replicate the wget calls fine. But the healthchecks report the same thing in both

```json
{
  "Start": "2020-02-27T13:02:17.0507153Z",
  "End": "2020-02-27T13:02:17.2414162Z",
  "ExitCode": 1,
  "Output": "wget: bad address '|| exit 1'\n"
},
```

I believe the problem is a known issue with the DNS in Alpine images. What I'm struggling with right now is trying to find a clean example of what I need to do to resolve it

- https://medium.com/@xavier.priour/docker-alpine-dns-issue-bad-address-84594d128d9f
- https://forums.docker.com/t/resolved-service-name-resolution-broken-on-alpine-and-docker-1-11-1-cs1/19307
- gliderlabs/docker-alpine#476
- https://unix.stackexchange.com/questions/441664/alpine-linux-sometimes-dns-is-not-resolved
- docker/for-linux#755
- https://stackoverflow.com/questions/57202039/resolve-conf-cant-be-changed-docker-alpine

I have also tried playing with and removing the rails user (in case it was a permissions issue) and carrying out an `apk upgrade -U -a` as part of the build to ensure everything in the image is the latest and greatest but still no joy.

So as I never actually see these when I'm googling for examples, and I know the apps are currently working, I'm removing them for now. I would like to bring them back in later though once I've got a bit more time to look into the problem.
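
One detail in the healthcheck output above is worth a closer look: wget reports `|| exit 1` as a bad address, which means that string reached wget as a single ordinary argument instead of being interpreted by a shell. Why that happened is not stated in the commit. One guess is that the healthcheck was declared in a form that bypasses the shell. The sketch below only demonstrates the mechanism; `show_args` is a hypothetical stand-in for wget and `http://localhost/` is a placeholder URL.

```shell
# Print each argument on its own line, wrapped in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: "|| exit 1" is handed to the command as one ordinary argument.
quoted_case() {
    show_args http://localhost/ '|| exit 1'
}

# Unquoted: the shell consumes || as an operator; the command sees one argument.
unquoted_case() {
    show_args http://localhost/ || exit 1
}
```

In the quoted case the command receives two arguments, the second being the literal text `|| exit 1`, which matches the shape of the error in the healthcheck log.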
@jlegido

jlegido commented Feb 18, 2021

A temporary workaround which worked for me: try to create the docker container in host mode:

--network host

binford2k pushed a commit to voxpupuli/container-puppetdb that referenced this issue Nov 1, 2022