This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Gossip-based DNS embedded in the router. #1065

Merged
merged 3 commits into from
Jul 14, 2015

Conversation

tomwilkie
Contributor

  • Nameserver holds a sorted list of (hostname, peer, container id, ip) tuples.
  • Entries are gossiped using topology gossip / broadcast - only new entries are
    propagated, as expected.
  • Entries for a given peer are removed when the connection to the peer is lost.
  • Entries are tombstoned when containers die. Tombstones are kept for 10 mins,
    as they do not need to survive partition, only gossip propagation delay.
  • Hostname lookups are O(log n) in memory.
  • All other operations (reverse DNS, tombstoning, deletion, etc.) are O(n) in memory.
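The sorted-tuple design above can be sketched as follows. This is a minimal, illustrative model, not the actual nameserver types: the struct fields and function names are assumptions, but the technique - binary search over a hostname-sorted slice, with tombstoned entries filtered out of results - matches the description.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a simplified model of the (hostname, peer, container id, ip)
// tuples described above. Field names are illustrative only.
type Entry struct {
	Hostname    string
	Peer        string
	ContainerID string
	IP          string
	Tombstone   int64 // unix timestamp when tombstoned; 0 if live
}

// lookup finds all live entries for a hostname: O(log n) binary search
// to locate the start of the matching run, then a linear scan over it.
func lookup(entries []Entry, hostname string) []Entry {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Hostname >= hostname
	})
	var result []Entry
	for ; i < len(entries) && entries[i].Hostname == hostname; i++ {
		if entries[i].Tombstone == 0 {
			result = append(result, entries[i])
		}
	}
	return result
}

func main() {
	// Slice kept sorted by hostname; the tombstoned entry is skipped.
	entries := []Entry{
		{Hostname: "db.weave.local", Peer: "peer2", IP: "10.0.1.2"},
		{Hostname: "web.weave.local", Peer: "peer1", IP: "10.0.1.1"},
		{Hostname: "web.weave.local", Peer: "peer3", IP: "10.0.1.3", Tombstone: 1},
	}
	fmt.Println(lookup(entries, "web.weave.local"))
}
```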

Fixes #826, Fixes #987, Fixes #733, Fixes #944, Fixes #833, Fixes #645, Fixes #583, Fixes #338

Left to do:

  • Make weave launch-dns and weave stop-dns print a warning and exit cleanly
  • Add useful status output
  • Audit logging
  • Update proxy to send dns commands to the router.
  • Make sure restart still works & add tests for that
  • Make TTL configurable; audit args and arg naming
  • Add tests for large responses
  • Make when_weave_running change
  • Docs
  • Deal with OnGC-gossip race
  • Proactively broadcast tombstones
  • Add EDNS and truncation
  • Manually test mixed weave 1.0 / weave 1.1

After the bulk of the review, we need to:

  • squash & rebase
  • remove old weavedns code.
  • rename package to nameserver
  • update for mflags

Not doing in this PR (left to follow-on issues, to be filed)

  • Reduce the complexity of non-hostname-lookups (all of which are O(n))
  • Rate limit new entry broadcasts
  • Remove dns entries from weave status and add a separate command for them

@tomwilkie
Contributor Author

@rade made a PR as I don't want your comments on changesets to get lost as I push -f. Not ready to merge yet, but it has just passed all the tests - https://circleci.com/gh/weaveworks/weave/1438

@tomwilkie
Contributor Author

From @rade: many weave ... commands actually work w/o weave running as long as ipam isn't needed. So I wouldn't just drop the when_dns_running. Replace it with when_weave_running instead.

@rade
Member

rade commented Jul 2, 2015

Not ready to merge yet

-> assigning to you in that case

@tomwilkie tomwilkie force-pushed the gossipdns branch 2 times, most recently from 39b0eee to e660a0b Compare July 2, 2015 17:23
@rade
Member

rade commented Jul 2, 2015

btw, I'd prefer if you didn't rebase unless necessary, and especially didn't squash commits - until the very end. Otherwise I'll get bored very quickly following this, since I'd have to re-read the entire diff over and over.

@tomwilkie
Contributor Author

Fair enough, I'll stop doing it until the end.

@rade
Member

rade commented Jul 3, 2015

Hostname lookups are O(log n), reverse DNS lookups are O(n), all in memory.

What is the complexity of:

  • removing an individual record (e.g. as a result of weave detach)
  • removing all records for a container (e.g. as a result of container death)
  • removing all records for a peer (e.g. as a result of the peer disappearing)
    ?

@tomwilkie
Contributor Author

Everything other than lookup-by-hostname is O(n), in memory.

There is a potential optimisation when receiving a gossip message, to make the merge ~O(log n). This assumes most gossip messages contain only a few entries. For large ones, O(n) is as good as you can get.
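The merge optimisation described here can be sketched as below. This is an assumption-laden illustration, not the nameserver's actual merge code: for a small incoming batch of m entries, binary-searching each insertion point into the large sorted list costs O(m log n) comparisons, which beats a full linear merge when m is small.

```go
package main

import (
	"fmt"
	"sort"
)

// merge inserts a small batch of received entries into a large sorted
// slice. Each entry's position is found by binary search; duplicates
// (entries we already know about) are skipped, so only genuinely new
// entries would need to be re-broadcast. Names are illustrative.
func merge(entries []string, batch []string) []string {
	for _, e := range batch {
		i := sort.SearchStrings(entries, e)
		if i < len(entries) && entries[i] == e {
			continue // already known, nothing new to propagate
		}
		// Grow by one and shift the tail right to open a slot at i.
		entries = append(entries, "")
		copy(entries[i+1:], entries[i:])
		entries[i] = e
	}
	return entries
}

func main() {
	entries := []string{"a.weave.local", "c.weave.local"}
	fmt.Println(merge(entries, []string{"b.weave.local"}))
}
```

Note the copies still make each insertion O(n) in the worst case; the win is in comparisons, which is why this only helps for small batches.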

@rade
Member

rade commented Jul 3, 2015

We should file a follow-on issue to reduce the complexity.

@tomwilkie
Contributor Author

This PR also doesn't include rate limiting entry addition, which was part of the original design thoughts. Happy to leave that to another PR?

@rade
Member

rade commented Jul 3, 2015

rate limiting entry addition

File an issue for it.

@inercia
Contributor

inercia commented Jul 5, 2015

I think this code does not reuse any of our current WeaveDNS code. What is the plan? Are we going to throw away all the code we have now?

@rade
Member

rade commented Jul 6, 2015

I think there is a race here: I believe it is possible to receive some DNS gossip pertaining to a peer after that peer has been removed from Peers by GC. That would result in permanently stale records.

@tomwilkie
Contributor Author

@inercia that's right, it's a fresh implementation. Some of the code has been borrowed (mainly the dns.go bits), but it has been rewritten. The API should be the same.

There is no plan, and what to do with the existing code is up to you and @rade. I guess they could coexist for a while if there is a good reason, but in the end I don't see why we would need both.

@inercia
Contributor

inercia commented Jul 6, 2015

@tomwilkie I don't see the point in throwing away all the code we currently have for DNS, especially when most of the code in our current nameserver is transport-agnostic. The gossiping mechanism could be plugged into our current zone database; we could remove the mDNS code (and maybe the cache) and reuse the rest of the WeaveDNS code. It is well-tested code, and many features and solutions for corner cases are already there (for example, the code in this dns.go is far from implementing all the features we currently have in our nameserver).

@tomwilkie
Contributor Author

@inercia what features are missing from gossipdns/dns.go?

@inercia
Contributor

inercia commented Jul 6, 2015

@tomwilkie Some things...

  • fallback connections should use the same transport used by clients (eg, if the client uses TCP, we should use TCP to fallback servers)
  • you should also handle EDNS buffer sizes
  • you could also prune responses (there is no reason for returning 1000 answers, even when we have them)
  • this server is not tested, and maybe it is not easy to test (eg, things like etcResolvConf should be configurable)
  • you could also optionally cache responses...

@tomwilkie
Contributor Author

  • fallback connections should use the same transport used by clients (eg, if the client uses TCP, we should use TCP to fallback servers)

The intention is that they do. Have you spotted a bug?

  • you should also handle EDNS buffer sizes

What's EDNS?

  • you could also prune responses (there is no reason for returning 1000 answers, even when we have them)

I intentionally don't want to do that.

  • this server is not tested, and maybe it is not easy to test (eg, things like etcResolvConf should be configurable)

dns.go is just a shim onto nameserver.go, which is well tested. It is also exercised by the integration tests. I'm not convinced dns.go and http.go need tests.

  • you could also optionally cache responses...

All authoritative answers are served from memory by design, so there is no point in caching them. All recursive answers are intentionally uncached - we would be doing a network call anyway, so why cache them?

Thanks for taking a look at the code - this is really useful feedback.
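For context on the EDNS point raised above: a plain DNS client over UDP can only accept a 512-byte response, while a client attaching an EDNS OPT record advertises a larger buffer; oversized responses must be trimmed with the TC (truncated) bit set so the client retries over TCP. A minimal sketch of that budgeting logic, with purely illustrative types and sizes (the real server would compute wire-format sizes, not fixed per-answer estimates):

```go
package main

import "fmt"

// defaultUDPSize is the classic DNS-over-UDP payload limit.
const defaultUDPSize = 512

// truncate keeps as many fixed-size answers as fit the client's buffer.
// ednsUDPSize is the buffer size the client advertised via EDNS (0 if
// it sent no OPT record). Returns the kept answers and whether the TC
// bit should be set. Names and the fixed answerSize are illustrative.
func truncate(answers []string, answerSize, ednsUDPSize int) (kept []string, tc bool) {
	budget := defaultUDPSize
	if ednsUDPSize > budget {
		budget = ednsUDPSize
	}
	fit := budget / answerSize // how many answers fit the budget
	if len(answers) <= fit {
		return answers, false
	}
	return answers[:fit], true
}

func main() {
	answers := make([]string, 40)
	kept, tc := truncate(answers, 28, 0) // ~28 bytes per A record, no EDNS
	fmt.Println(len(kept), tc)
}
```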

@tomwilkie
Contributor Author

I think there is a race here: I believe it is possible to receive some DNS gossip pertaining to a peer after that peer has been removed from Peers by GC. That would result in permanently stale records.

@rade under what situation are you thinking? I'm not familiar enough with the gossip / broadcast / peers code. I could add a list of "live" peers to the nameserver, and filter incoming gossip by that. I would need to add a "peer added" callback to peers.go if one doesn't already exist. Then we'd need to make sure we get the peer-added callback before any gossip from the peer. Thoughts?

@rade
Member

rade commented Jul 6, 2015

under what situation are you thinking?

There is no connection between the peer topology gossip and nameserver gossip. Especially not when multiple peers are involved. e.g. Peer A could detect B's death and still receive nameserver gossip from B containing records pertaining to C.

So I'd think it can happen quite easily.

we'd need to make sure we get the peer-added callback before any gossips from the peer.

That's just the same problem, in reverse. But it's less of an issue as long as it only goes wrong rarely - periodic gossip will fill in the blanks.

Also, instead of a callback you could just ask peers whenever adding records.
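The filtering approach suggested here - consulting the router's peer table when merging received entries, rather than wiring up a callback - can be sketched as below. The types are stand-ins, not the actual Peers structure from the router: the map plays the role of the set of currently known peers, and any entry whose origin peer has been GC'd is dropped.

```go
package main

import "fmt"

// GossipEntry is an illustrative stand-in for a received nameserver
// record; Origin names the peer that owns the entry.
type GossipEntry struct {
	Hostname string
	Origin   string
}

// filterByLivePeers drops received entries whose origin peer is no
// longer in the peer table, preventing permanently stale records from
// a peer that has already been garbage-collected.
func filterByLivePeers(received []GossipEntry, livePeers map[string]bool) []GossipEntry {
	var kept []GossipEntry
	for _, e := range received {
		if livePeers[e.Origin] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	live := map[string]bool{"peer-a": true, "peer-c": true}
	received := []GossipEntry{
		{Hostname: "web.weave.local", Origin: "peer-a"},
		{Hostname: "db.weave.local", Origin: "peer-b"}, // peer-b was GC'd
	}
	fmt.Println(filterByLivePeers(received, live))
}
```

The trade-off discussed in the surrounding comments applies: the reverse race (entry arrives before the peer is known) can still drop records, but periodic gossip repairs that, whereas accepting records for a dead peer would never be repaired.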

@tomwilkie
Contributor Author

@rade I've added code to filter gossip messages based on the contents of peers. Had to be careful to avoid deadlocks, which meant I had to take the peers lock in a bit of a smelly way. Let me know what you think.

Other than EDNS (what is it?) I think this might be close to done. Who do you want to review this?

// write lock.
if n.peers != nil {
	n.peers.RLock()
	defer n.peers.RUnlock()


@tomwilkie tomwilkie changed the title WIP Gossip-based DNS embedded in the router. Gossip-based DNS embedded in the router. Jul 6, 2015
@tomwilkie tomwilkie removed their assignment Jul 6, 2015
	peers.onGC = append(peers.onGC, callback)
}

func (peers *Peers) triggerOnGC(removed []*Peer) {


@rade
Member

rade commented Jul 6, 2015

How sure are you that moving the onGC callback invocation outside the lock is safe? I am concerned about race conditions where another thread might re-add a peer before the callback has been invoked. I reckon this is OK for the existing onGC callback, the MacCache cleanup, since it's a cache, so worst case an incorrect removal results in a performance hit. But is it OK for the new one?

@@ -262,13 +290,15 @@ func determineQuorum(initPeerCountFlag int, peers []string) uint {
 	return quorum
 }

-func handleHTTP(router *weave.Router, httpAddr string, allocator *ipam.Allocator, defaultSubnet address.CIDR, docker *docker.Client) {
+func handleHTTP(router *weave.Router, httpAddr string, allocator *ipam.Allocator, defaultSubnet address.CIDR, docker *docker.Client, ns *nameserver.Nameserver, dnsserver *nameserver.DNSServer) {


@tomwilkie
Contributor Author

Merged coverage report from both the integration and unit tests: https://circle-artifacts.com/gh/weaveworks/weave/1724/artifacts/0/tmp/circle-artifacts.PPiWyIL/coverage.html

  • dns.go 87.6%
  • entry.go 92.6%
  • http.go 52.0%
  • nameserver.go 96.2%

http.go is dragged down as it's a short file with unexercised error handling, and it has the GET handler for /name/{hostname}, which is currently unused and should be removed (will do that now).

@tomwilkie tomwilkie force-pushed the gossipdns branch 2 times, most recently from da3bcf0 to 6748c8e Compare July 13, 2015 10:16
@tomwilkie
Contributor Author

@inercia how are we getting on with this review? Are we getting close to an LGTM?

@tomwilkie tomwilkie force-pushed the gossipdns branch 2 times, most recently from 36ec2c7 to d46301c Compare July 14, 2015 14:42
Tom Wilkie added 3 commits July 14, 2015 16:00
- Nameserver holds a sorted list of (hostname, peer, container id, ip) tuples.
- Entries are gossiped using topology gossip / broadcast - only new/updated entries
  are propagated as expected.
- Entries for a given peer are removed when the connection to the peer is lost.
- Entries are tombstoned when containers die.  Tombstones are kept for 10mins,
  as they do not need to survive partition, only gossip propagation delay.
inercia added a commit that referenced this pull request Jul 14, 2015
Gossip-based DNS embedded in the router.
@inercia inercia merged commit a1ac031 into master Jul 14, 2015
@rade rade deleted the gossipdns branch July 14, 2015 16:34
rade added a commit that referenced this pull request Jul 14, 2015
This wasn't spotted because #1065, which removed launch-dns, got
merged just before #1126, which introduced this test.
rade added a commit that referenced this pull request Jul 14, 2015
These tests were introduced in #1065 and #1126.
@rade rade modified the milestones: current, 1.1.0 Jul 15, 2015
rade added a commit that referenced this pull request Jul 19, 2015
since $CONTAINER gets stomped on by various functions, in particular
populate_dns.

Broken by #1065.
rade added a commit that referenced this pull request Jul 19, 2015
Thus unbreaking it post #1065.
@rade rade mentioned this pull request Jul 19, 2015