net: BSD: DNS go resolver loses first UDP outbound packet if onlink target not in neighbor cache #43398
Comments
While I can prove this happens with FreeBSD, I think I've seen it with Linux too. I just don't have things hitting DNS in the same way from Linux clients, as regularly, and I can't swear that I've seen it on Linux. I first saw this behavior with the natscli(1) tool for talking to NATS servers, so I wrote a client which did what I wanted in one process. But I still saw the behavior, and after a lot of head-scratching, I put in a pre-flight DNS resolution and managed to isolate the delays there. I wrote a simplified DNS resolution tool to just do the DNS lookups, with timing. And then late last night I got very lucky and managed to trigger the delay again, with just the simplified tool, while I was running a tcpdump looking for the delay. I was expecting to see retransmissions. Not to see that the response came back immediately and the client never asked that DNS question again, so it did process and accept the response, eventually. |
What is the contents of |
Should have said, sorry:
|
|
I kept periodically trying again with a forced resolver and finally got a 5 second delay: Go resolver.
|
I can now reproduce this more reliably: if the IP address of the first DNS resolver is not in the ARP cache, then I can not reproduce this "stalling after clearing ARP cache" with C programs doing DNS work, only with Go programs. |
That's weird. I don't understand why the ARP cache would be involved at all. 5 seconds is the default timeout for waiting for a response from a DNS server. So it seems that Go is not recognizing the DNS server's response for some reason. But also Go normally sends both an A and an AAAA query at the same time, unless |
Some rebuilding later: if the DNS resolvers in My assumption is that the ARP and NDP caches have to be populated before the outbound query can be passed onto the DNS server, so these should have been populated before the response comes back. The FreeBSD environments where these tests are running are FreeBSD Jails on a FreeNAS box, with vnet virtualized network stacks connected to the NAS's bridge, which is bridged onto the home network. The NAS and the DNS resolvers are all wired ethernet. IPv6 is ULA with no global connectivity. There's no BPF inside the jail, so I can't tcpdump there; I can try to change that to see if the tcpdump from the host vs from the jail see different packets. Without a reset ARP cache, I see the Go app send out the AAAA query just before the A query, in what looks like a Happy Eyeballs approach. So at this point, it looks like if there's no entry in the cache, the first packet is held up, or dropped, before the timeout triggers a retransmit, while the second DNS UDP packet goes through fine. This can't be a generic "UDP packet dropped when entry not in cache" issue though, since I don't see delays in this scenario with C language DNS tools. |
Unless the connected UDP socket is in non-blocking mode, so the Go code is getting a -1/EAGAIN response to the UDP send when it's an on-link address with no entry in the appropriate cache, and the Go code is not handling that EAGAIN? Rampant speculation, sorry, but it's the only thing which makes any kind of sense to me. |
The Go standard library always uses non-blocking network I/O, so it always gets |
With hindsight: the fact that when the AAAA packet is eventually observed in the tcpdump, it's going to the second DNS resolver is a stronger signal that "something was lost" with the original AAAA query. So it's not "stalls for timeout before trying AAAA" as I originally titled this, but "loses first transmitted DNS query and retries to second resolver after timeout". |
I can confirm that with |
The Go DNS resolver should normally send out queries for both the A and the AAAA records simultaneously, so it's weird that that is not happening. |
Found it. The parallel transmission is triggering BSD behavior (shared with macOSX, but Go forces cgo DNS resolver on macOSX for other reasons) of dropping all but the most recent UDP packet for a destination on the local network when the destination is not in the ARP cache.
So the difference between C programs and Go programs is that most C programs are not doing happy-eyeballs by default, so there's only one concurrent DNS request pending. Go's DNS resolver is too aggressive for the OS's UDP stack when the DNS resolver is not in the relevant neighbor cache. |
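To make the mechanism concrete, here is a toy model of the behavior described above. It is a sketch under stated assumptions, not kernel code: the holdQueue type and simulate function are invented for illustration, and the only thing modeled is "while the neighbor is unresolved, keep at most maxHold packets and discard the oldest".

```go
package main

import "fmt"

// holdQueue is a toy model of the per-neighbor queue a kernel keeps while
// ARP/NDP resolution is pending. Hypothetical type, not a real kernel API.
type holdQueue struct {
	maxHold int      // assumed analogue of the "maxhold" limit
	pending []string // packets waiting for the neighbor's MAC address
	dropped []string // packets discarded to make room
}

// enqueue adds a packet, discarding the oldest ones once the limit is exceeded.
func (q *holdQueue) enqueue(pkt string) {
	q.pending = append(q.pending, pkt)
	for len(q.pending) > q.maxHold {
		q.dropped = append(q.dropped, q.pending[0])
		q.pending = q.pending[1:]
	}
}

// simulate queues the packets in order and returns what would be sent once
// the neighbor resolves, plus what was lost along the way.
func simulate(maxHold int, pkts ...string) (sent, lost []string) {
	q := &holdQueue{maxHold: maxHold}
	for _, p := range pkts {
		q.enqueue(p)
	}
	return q.pending, q.dropped
}

func main() {
	// The AAAA query goes out just before the A query, as seen in the capture.
	for _, limit := range []int{1, 16} {
		sent, lost := simulate(limit, "AAAA query", "A query")
		fmt.Printf("maxhold=%d sent=%v lost=%v\n", limit, sent, lost)
	}
}
```

With a limit of 1 the model loses the AAAA query and keeps the A query, which matches the capture: the A response arrives immediately and the AAAA query is only seen again after the timeout. A program that has only one query in flight at a time never overflows a queue of 1.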
Thanks, nice analysis. So in practice this only matters if the DNS server is on the local network, rather than being reached through a router. Now we have to figure out what to do about it. |
On Linux,
On my laptop, using wifi, this value appears to have been bumped to 101, for default as well as the wifi interface. |
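A quick way to check that value from Go, as a sketch: this reads the default (not per-interface) setting from /proc/sys/net/ipv4/neigh/default/unres_qlen, the same path the draft patch later in this thread uses. The parseQlen helper is a made-up name for illustration, and the program only makes sense on Linux.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// qlenPath is the default neighbor setting, not an interface-specific one.
const qlenPath = "/proc/sys/net/ipv4/neigh/default/unres_qlen"

// parseQlen extracts the first integer field from the file contents.
// It returns false for empty, non-numeric, or non-positive values.
func parseQlen(contents string) (int, bool) {
	fields := strings.Fields(contents)
	if len(fields) == 0 {
		return 0, false
	}
	n, err := strconv.Atoi(fields[0])
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

func main() {
	data, err := os.ReadFile(qlenPath)
	if err != nil {
		fmt.Println("could not read", qlenPath+":", err)
		return
	}
	n, ok := parseQlen(string(data))
	if !ok {
		fmt.Println("unexpected contents in", qlenPath)
		return
	}
	fmt.Printf("unres_qlen default: %d packets held per unresolved neighbor\n", n)
}
```

Checking for an empty field list before indexing avoids a panic if the file is ever empty.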
I think that the same applies if the DNS resolver is reached through a router, if the router is not currently in the ARP cache. This seems like something the BSDs should be fixing in a world of Happy Eyeballs. But that doesn't help us now. In my case, I had a gitolite server using a post-commit hook to publish to NATS; the server runs in a VNET jail, so gitolite is normally the only thing populating the ARP cache, so I'd see 5 second stalls "annoyingly often" when doing a git push. I've worked around it for now, since these are all static hosts and known to my config management system, by publishing static ARP/NDP files with entries for the DNS servers, flagged as permanent entries, so I won't see timeouts here. I can remove this to help test any proposed solutions. Spitballing:
Excuse me, I have to go tune a sysctl ... |
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252278 filed asking the FreeBSD maintainers to raise the default, which is a longer-term mitigation. |
I'm not quite sure that this is a problem to be fixed on the Go side at all. The timeout could also be reduced in /etc/resolv.conf. Or /etc/resolv.conf could set |
We build a client tool for multiple OSes, single-binary so env hacks are not viable. We added cross-compilation for FreeBSD and I've been testing the tools on FreeBSD. The stalls, only happening with Go, are significant enough to cause concern. We can't tune system settings for people installing our software. We generally try to avoid having problems which warrant FAQ entries. So releasing client tools and client libraries, where we can't tune the system or update Ideally we'd call a target-os-build-tag provided PlatformSetup function which we'd write, where on FreeBSD we'd grab the integer MIB for maxhold from sysctl and if it's 1, force on SingleRequest. Failing that, I guess we just document the limitation and ask people to tune their systems. |
As far as I can see this problem is quite unlikely to happen in real situations. It is likely that the DNS server is reached through a router, or is on the router itself; in such a case that router will reliably be in the ARP cache. When the DNS server is on the local network but is not the router, then any name lookup will contact the DNS server, so again the DNS server is likely to be in the ARP cache. So as far as I can tell the problem only occurs in cases that are quite unlikely to occur in practice, for all that they can occur in test setups. I'm not strongly opposed to fixing this in the Go standard library, but for what seems to me to be an unlikely case I don't see a good argument for adding new API, and I wouldn't want to see a lot of OS-specific code to check the ARP cache. We could consider making |
I did not encounter this in a test setup. This was real. Any virtualization using a virtualized network stack for the container will encounter it more often. I would expect to see this behavior in most offices, where DNS is not served from the home router, when FreeBSD is in use. FreeNAS is a popular NAS storage OS, so this will affect people running Go software in jails on their NAS. Of my three spitballed suggestions, number 3 seems the sanest to me. There are already multiple sysctl int retrievals in the net package on BSDs, in |
OK, want to send a patch? |
Sure. I suspect my contributor status might have lapsed, and I'm not at Google, so my 2009-era status might need a refresh? (My golang.org email forwarding was removed when I left, for sure) I can set up gerrit accounts etc, but I'm also happy for you to just take this patch and brush it up. LMK which way you'd like me to go here.
Unfortunately the current tip doesn't install on FreeBSD, broken tests, and I'm not chasing that down right now, so this is a "rough draft" of what I think the solution looks like. I've chased down the manual-pages on the current BSDs to back the claims I make here.
From 916c9766b40249d13e9a3e4e9d02eb954d03b296 Mon Sep 17 00:00:00 2001
From: Phil Pennock <[email protected]>
Date: Sun, 3 Jan 2021 00:03:16 -0500
Subject: [PATCH] Throttle DNS to singleRequest where ARP limited
The concurrent DNS resolution of the Go resolver is not safe when the next-hop
IP address of the DNS resolver is not currently in the neighbor cache (ARP for
IPv4, NDP for IPv6), unless the kernel will retain more than one packet for a
given IP while awaiting ARP/NDP resolution.
On Linux, the default number of packets to hold while waiting is 101 and
there's no problem. Many OSes still use historical BSD sockets behavior of
limiting this to just 1. FreeBSD and Darwin use a sysctl to let the limit be
varied, but while Darwin raises the default, FreeBSD currently still defaults
to 1.
If we think the limit is 1 and any of the resolvers are not on a loopback IP,
then implicitly set singleRequest for the DNS resolver.
Resolves #43398
---
src/net/conf.go | 27 +++++++++++++++++++++++++++
src/net/limits.go | 18 ++++++++++++++++++
src/net/limits_bsdconst.go | 15 +++++++++++++++
src/net/limits_bsddyn.go | 26 ++++++++++++++++++++++++++
src/net/limits_linux.go | 32 ++++++++++++++++++++++++++++++++
src/net/limits_other.go | 16 ++++++++++++++++
6 files changed, 134 insertions(+)
create mode 100644 src/net/limits.go
create mode 100644 src/net/limits_bsdconst.go
create mode 100644 src/net/limits_bsddyn.go
create mode 100644 src/net/limits_linux.go
create mode 100644 src/net/limits_other.go
diff --git a/src/net/conf.go b/src/net/conf.go
index f1bbfedad0..66bdc3e7e2 100644
--- a/src/net/conf.go
+++ b/src/net/conf.go
@@ -107,6 +107,33 @@ func initConfVal() {
confVal.forceCgoLookupHost = true
}
+ // LookupHost stalls of DNS timeout are seen when the DNS resolver is
+ // reached via a next-hop IP not currently in the ARP/NDP cache if the
+ // kernel only keeps the very latest packet for that IP while waiting for
+ // ARP/NDP resolution. Our Go resolver concurrency results in performance
+ // loss. We can't sanely tell if it's safe "right now" without a lot more
+ // intricate ARP parsing and remaining time evaluation, so instead we just
+ // disable concurrent DNS on hosts where the limit is 1.
+ if safeMaxUDP, _ := MaxSafeConcurrentUDP(); safeMaxUDP == 1 {
+ // A common pattern is to have a localhost DNS resolver; it's common
+ // enough to be worth a hail mary check before throttling.
+ allLocal := true
+ for _, ipPortStr := range confVal.resolv.servers {
+ ipStr, _, err := SplitHostPort(ipPortStr)
+ if err != nil {
+ allLocal = false
+ break
+ }
+ if !ParseIP(ipStr).IsLoopback() {
+ allLocal = false
+ break
+ }
+ }
+ if !allLocal {
+ confVal.resolv.singleRequest = true
+ }
+ }
+
if _, err := os.Stat("/etc/mdns.allow"); err == nil {
confVal.hasMDNSAllow = true
}
diff --git a/src/net/limits.go b/src/net/limits.go
new file mode 100644
index 0000000000..63eb1ec8fd
--- /dev/null
+++ b/src/net/limits.go
@@ -0,0 +1,18 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package net
+
+// MaxSafeConcurrentUDP is the safe limit on concurrent UDP packets to a
+// destination when the local next-hop IP is not in the L2 cache (assuming
+// ethernet; ARP for IPv4, NDP for IPv6). Until the MAC address is known, most
+// OSes will hold "a small number" of packets for a given next-hop and then
+// start discarding earlier packets. Unfortunately this limit is still 1 on
+// some OSes.
+//
+// The maxSafe return value will be 1 in error scenarios, so should always be a
+// safe value to use as a safe limit.
+func MaxSafeConcurrentUDP() (maxSafe int, ok bool) {
+ return maxSafeConcurrentUDP()
+}
diff --git a/src/net/limits_bsdconst.go b/src/net/limits_bsdconst.go
new file mode 100644
index 0000000000..1f5ca38d74
--- /dev/null
+++ b/src/net/limits_bsdconst.go
@@ -0,0 +1,15 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build dragonfly netbsd openbsd
+
+package net
+
+// Unless and until we know of something like the FreeBSD sysctl, we rely upon
+// the manual-page arp(4) documented limit.
+// As of 2021-01, NetBSD, OpenBSD and DragonFlyBSD all document a static value
+// of 1.
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+ return 1, true
+}
diff --git a/src/net/limits_bsddyn.go b/src/net/limits_bsddyn.go
new file mode 100644
index 0000000000..d5294b406c
--- /dev/null
+++ b/src/net/limits_bsddyn.go
@@ -0,0 +1,26 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build freebsd darwin
+
+package net
+
+import (
+ "syscall"
+)
+
+// On FreeBSD, as of 2021-01 the default untuned value of the sysctl is 1; on a
+// test mac mini running Darwin 17.7.0, the value observed is 16.
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+ var (
+ n uint32
+ err error
+ )
+ n, err = syscall.SysctlUint32("net.link.ether.inet.maxhold")
+ if n == 0 || err != nil {
+ return 1, false
+ }
+ return int(n), true
+
+}
diff --git a/src/net/limits_linux.go b/src/net/limits_linux.go
new file mode 100644
index 0000000000..c602731e22
--- /dev/null
+++ b/src/net/limits_linux.go
@@ -0,0 +1,32 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build linux
+
+package net
+
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+ // For Linux, we return the default value, not the specific value for a
+ // given interface, as being "close enough". We don't want code sending
+ // packets to destinations to also need to care about which interface the
+ // packets are going out of: the complexity would spread too far up the
+ // APIs to support an edge scenario. Modern Linux has sane defaults anyway
+ // and is not the reason for this check.
+ fd, err := open("/proc/sys/net/ipv4/neigh/default/unres_qlen")
+ if err != nil {
+ return 1, false
+ }
+ defer fd.close()
+ l, ok := fd.readLine()
+ if !ok {
+ return 1, false
+ }
+ f := getFields(l)
+ n, _, ok := dtoi(f[0])
+ if n == 0 || !ok {
+ return 1, false
+ }
+
+ return n, true
+}
diff --git a/src/net/limits_other.go b/src/net/limits_other.go
new file mode 100644
index 0000000000..1b48e9c0b6
--- /dev/null
+++ b/src/net/limits_other.go
@@ -0,0 +1,16 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !linux
+// +build !darwin,!freebsd,!netbsd,!openbsd
+// +build !dragonfly,!netbsd,!openbsd
+
+package net
+
+func maxSafeConcurrentUDP() (maxSafe int, ok bool) {
+ // Our assumption is that the old BSD socket behavior is the default unless
+ // and until we learn otherwise. This appears to be true on Windows, at
+ // least.
+ return 1, false
+}
--
2.28.0 |
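The loopback check in the conf.go hunk can be tried outside the net package. This is a standalone sketch of the same idea using the exported net.SplitHostPort and net.ParseIP; the function name allLoopback is hypothetical, and the explicit nil check on the parsed IP is an addition of this sketch rather than something in the patch.

```go
package main

import (
	"fmt"
	"net"
)

// allLoopback reports whether every "host:port" entry names a loopback IP.
// Like the draft patch, an empty list counts as "all loopback".
func allLoopback(servers []string) bool {
	for _, hostPort := range servers {
		host, _, err := net.SplitHostPort(hostPort)
		if err != nil {
			return false
		}
		ip := net.ParseIP(host)
		if ip == nil || !ip.IsLoopback() {
			return false
		}
	}
	return true
}

func main() {
	cases := [][]string{
		{"127.0.0.1:53", "[::1]:53"},
		{"127.0.0.1:53", "192.168.1.1:53"},
		{"not-a-host-port"},
	}
	for _, servers := range cases {
		fmt.Printf("%v allLoopback=%v\n", servers, allLoopback(servers))
	}
}
```

Only when this returns false would the draft patch force singleRequest, so a host that talks solely to a local caching resolver keeps the concurrent A and AAAA queries.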
If you're willing to send the change through Gerrit or GitHub, please do that, as it will ensure that we have the right copyright agreement. Thanks. |
SG. Talking to CEO/COO to double-check CLA status, will get this in Mon/Tue (I expect). Looks like very old Linux kernels don't have the proc path I reference above, but will have |
@philpennock Is https://gist.github.com/dmgk/964b472a73050633c6b25d7636b57e7f the test failure you're seeing? |
@dmgk I didn't see that much detail but it being net/http rings a bell. $currentEmployer now has an org CLA signed so I expect to get the patch submitted that way today. |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I have recurring problems with DNS resolution taking 5+ seconds "sometimes", and only for Go software. I finally got lucky and got a tcpdump showing it.
dns_resolve.go
All DNS resolvers are fully functional (Unbound 1.13.0), all responding. The .lan zone is served from the resolvers, has an SOA and NS records, has IPv4 and IPv6.
What did you expect to see?
A fraction of a second for net.Resolver.LookupHost() to return A and AAAA records.
What did you see instead?
As you can see in this TCP dump output, the A response came in immediately, but it was 5 seconds before the LookupHost() call issued the AAAA query. There was no retry, the result from the first query was accepted. Go's DNS resolution just wedged for a duration which happens to be equal to the timeout period, before accepting the response.
tcpdump output