This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Gossip-based DNS embedded in the router. #1065

Merged
merged 3 commits into from
Jul 14, 2015

Conversation

tomwilkie
Contributor

  • Nameserver holds a sorted list of (hostname, peer, container id, ip) tuples.
  • Entries are gossiped using topology gossip / broadcast - only new entries are
    propagated, as expected.
  • Entries for a given peer are removed when the connection to the peer is lost.
  • Entries are tombstoned when containers die. Tombstones are kept for 10 mins,
    as they do not need to survive partition, only gossip propagation delay.
  • Hostname lookups are O(log n) in memory.
  • All other operations (reverse DNS, tombstoning, deletion, etc.) are O(n) in memory.
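The sorted-tuple design above can be sketched as follows. This is a minimal, illustrative model, not the actual nameserver types: the struct fields and function names are assumptions, but the technique - binary search over a hostname-sorted slice, with tombstoned entries filtered out of results - matches the description.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a simplified model of the (hostname, peer, container id, ip)
// tuples described above. Field names are illustrative only.
type Entry struct {
	Hostname    string
	Peer        string
	ContainerID string
	IP          string
	Tombstone   int64 // unix timestamp when tombstoned; 0 if live
}

// lookup finds all live entries for a hostname: O(log n) binary search
// to locate the start of the matching run, then a linear scan over it.
func lookup(entries []Entry, hostname string) []Entry {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Hostname >= hostname
	})
	var result []Entry
	for ; i < len(entries) && entries[i].Hostname == hostname; i++ {
		if entries[i].Tombstone == 0 {
			result = append(result, entries[i])
		}
	}
	return result
}

func main() {
	// Slice kept sorted by hostname; the tombstoned entry is skipped.
	entries := []Entry{
		{Hostname: "db.weave.local", Peer: "peer2", IP: "10.0.1.2"},
		{Hostname: "web.weave.local", Peer: "peer1", IP: "10.0.1.1"},
		{Hostname: "web.weave.local", Peer: "peer3", IP: "10.0.1.3", Tombstone: 1},
	}
	fmt.Println(lookup(entries, "web.weave.local"))
}
```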

Fixes #826, Fixes #987, Fixes #733, Fixes #944, Fixes #833, Fixes #645, Fixes #583, Fixes #338

Left to do:

  • Make weave launch-dns and weave stop-dns print a warning and exit cleanly
  • Add useful status output
  • Audit logging
  • Update proxy to send dns commands to the router.
  • Make sure restart still works & add tests for that
  • Make TTL configurable; audit args and arg naming
  • Add tests for large responses
  • Make when_weave_running change
  • Docs
  • Deal with OnGC-gossip race
  • Proactively broadcast tombstones
  • Add EDNS and truncation
  • Manually test mixed weave 1.0 / weave 1.1

After the bulk of the review, we need to:

  • squash & rebase
  • remove old weavedns code.
  • rename package to nameserver
  • update for mflags

Not doing in this PR (left to follow-on issues, to be filed)

  • Reduce the complexity of non-hostname-lookups (all of which are O(n))
  • Rate limit new entry broadcasts
  • Remove dns entries from weave status and add a separate command for them

@tomwilkie
Contributor Author

@rade made a PR as I don't want your comments on changesets to get lost as I push -f. Not ready to merge yet, but it has just passed all the tests - https://circleci.com/gh/weaveworks/weave/1438

@tomwilkie
Contributor Author

From @rade: many weave ... commands actually work w/o weave running as long as ipam isn't needed. So I wouldn't just drop the when_dns_running. Replace it with when_weave_running instead.

@rade
Member

rade commented Jul 2, 2015

Not ready to merge yet

-> assigning to you in that case

@tomwilkie tomwilkie force-pushed the gossipdns branch 2 times, most recently from 39b0eee to e660a0b Compare July 2, 2015 17:23
@rade
Member

rade commented Jul 2, 2015

btw, I'd prefer if you didn't rebase unless necessary, and especially didn't squash commits - until the very end. Otherwise I'll get bored very quickly following this, since I'd have to re-read the entire diff over and over.

@tomwilkie
Contributor Author

Fair enough, I'll stop doing it until the end.

@rade
Member

rade commented Jul 3, 2015

Hostname lookups are O(log n), reverse DNS lookups are O(n), all in memory.

What is the complexity of:

  • removing an individual record (e.g. as a result of weave detach)
  • removing all records for a container (e.g. as a result of container death)
  • removing all records for a peer (e.g. as a result of the peer disappearing)
    ?

@tomwilkie
Contributor Author

Everything other than lookup-by-hostname is O(n), in memory.

There is a potential optimisation when receiving a gossip message, to make the merge ~O(log n). This assumes most gossip messages contain only a few entries. For large ones, O(n) is as good as you can get.
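The merge optimisation described here can be sketched as below. This is an assumption-laden illustration, not the nameserver's actual merge code: for a small incoming batch of m entries, binary-searching each insertion point into the large sorted list costs O(m log n) comparisons, which beats a full linear merge when m is small.

```go
package main

import (
	"fmt"
	"sort"
)

// merge inserts a small batch of received entries into a large sorted
// slice. Each entry's position is found by binary search; duplicates
// (entries we already know about) are skipped, so only genuinely new
// entries would need to be re-broadcast. Names are illustrative.
func merge(entries []string, batch []string) []string {
	for _, e := range batch {
		i := sort.SearchStrings(entries, e)
		if i < len(entries) && entries[i] == e {
			continue // already known, nothing new to propagate
		}
		// Grow by one and shift the tail right to open a slot at i.
		entries = append(entries, "")
		copy(entries[i+1:], entries[i:])
		entries[i] = e
	}
	return entries
}

func main() {
	entries := []string{"a.weave.local", "c.weave.local"}
	fmt.Println(merge(entries, []string{"b.weave.local"}))
}
```

Note the copies still make each insertion O(n) in the worst case; the win is in comparisons, which is why this only helps for small batches.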

@rade
Member

rade commented Jul 3, 2015

We should file a follow-on issue to reduce the complexity.

@tomwilkie
Contributor Author

This PR also doesn't include rate limiting entry addition, which was part of the original design thoughts. Happy to leave that to another PR?

@rade
Member

rade commented Jul 3, 2015

rate limiting entry addition

File an issue for it.

@inercia
Contributor

inercia commented Jul 5, 2015

I think this code does not reuse any of our current WeaveDNS code. What is the plan? Are we going to throw away all the code we have now?

@rade
Member

rade commented Jul 6, 2015

I think there is a race here: I believe it is possible to receive some DNS gossip pertaining to a peer after that peer has been removed from Peers by GC. That would result in permanently stale records.

@tomwilkie
Contributor Author

@inercia that's right, it's a fresh implementation. Some of the code has been borrowed (mainly the dns.go bits), but it has been rewritten. The API should be the same.

There is no plan, and what to do with the existing code is up to you and @rade. I guess they could coexist for a while if there is a good reason, but in the end I don't see why we would need both.

@inercia
Contributor

inercia commented Jul 6, 2015

@tomwilkie I don't see the point in throwing away all the code we currently have for DNS, especially when most of the code in our current nameserver is transport-agnostic. The gossiping mechanism could be plugged into our current zone database; we could remove the mDNS code (and maybe the cache) and reuse the rest of the WeaveDNS code. It is well-tested code, and many features and solutions for corner cases are already there (for example, the code in this dns.go is far from implementing all the features we currently have in our nameserver).

@tomwilkie
Contributor Author

@inercia what features are missing from gossipdns/dns.go?

@inercia
Contributor

inercia commented Jul 6, 2015

@tomwilkie Some things...

  • fallback connections should use the same transport used by clients (eg, if the client uses TCP, we should use TCP to fallback servers)
  • you should also handle EDNS buffer sizes
  • you could also prune responses (there is no reason for returning 1000 answers, even when we have them)
  • this server is not tested, and maybe it is not easy to test (eg, things like etcResolvConf should be configurable)
  • you could also optionally cache responses...

@tomwilkie
Contributor Author

  • fallback connections should use the same transport used by clients (eg, if the client uses TCP, we should use TCP to fallback servers)

The intention is that they do. Have you spotted a bug?

  • you should also handle EDNS buffer sizes

What's EDNS?

  • you could also prune responses (there is no reason for returning 1000 answers, even when we have them)

I intentionally don't want to do that.

  • this server is not tested, and maybe it is not easy to test (eg, things like etcResolvConf should be configurable)

dns.go is just a shim onto nameserver.go, which is well tested. It is also exercised by the integration tests. I'm not convinced dns.go and http.go need tests.

  • you could also optionally cache responses...

All authoritative answers are served from memory by design, so there is no point in caching them. All recursive answers are intentionally uncached - we would be doing a network call anyway, so why cache them?

Thanks for taking a look at the code - this is really useful feedback.
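For context on the EDNS point raised above: a plain DNS client over UDP can only accept a 512-byte response, while a client attaching an EDNS OPT record advertises a larger buffer; oversized responses must be trimmed with the TC (truncated) bit set so the client retries over TCP. A minimal sketch of that budgeting logic, with purely illustrative types and sizes (the real server would compute wire-format sizes, not fixed per-answer estimates):

```go
package main

import "fmt"

// defaultUDPSize is the classic DNS-over-UDP payload limit.
const defaultUDPSize = 512

// truncate keeps as many fixed-size answers as fit the client's buffer.
// ednsUDPSize is the buffer size the client advertised via EDNS (0 if
// it sent no OPT record). Returns the kept answers and whether the TC
// bit should be set. Names and the fixed answerSize are illustrative.
func truncate(answers []string, answerSize, ednsUDPSize int) (kept []string, tc bool) {
	budget := defaultUDPSize
	if ednsUDPSize > budget {
		budget = ednsUDPSize
	}
	fit := budget / answerSize // how many answers fit the budget
	if len(answers) <= fit {
		return answers, false
	}
	return answers[:fit], true
}

func main() {
	answers := make([]string, 40)
	kept, tc := truncate(answers, 28, 0) // ~28 bytes per A record, no EDNS
	fmt.Println(len(kept), tc)
}
```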

@tomwilkie
Contributor Author

I think there is a race here: I believe it is possible to receive some DNS gossip pertaining to a peer after that peer has been removed from Peers by GC. That would result in permanently stale records.

@rade under what situation are you thinking? I'm not familiar enough with the gossip / broadcast / peers code. I could add a list of "live" peers to the nameserver, and filter incoming gossip by that. I would need to add a "peer added" callback to peers.go if one doesn't already exist. Then we'd need to make sure we get the peer-added callback before any gossip from the peer. Thoughts?

@rade
Member

rade commented Jul 6, 2015

under what situation are you thinking?

There is no connection between the peer topology gossip and nameserver gossip. Especially not when multiple peers are involved. e.g. Peer A could detect B's death and still receive nameserver gossip from B containing records pertaining to C.

So I'd think it can happen quite easily.

we'd need to make sure we get the peer-added callback before any gossips from the peer.

That's just the same problem, in reverse. But it's less of an issue as long as it only goes wrong rarely - periodic gossip will fill in the blanks.

Also, instead of a callback you could just ask peers whenever adding records.
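The filtering approach suggested here - consulting the router's peer table when merging received entries, rather than wiring up a callback - can be sketched as below. The types are stand-ins, not the actual Peers structure from the router: the map plays the role of the set of currently known peers, and any entry whose origin peer has been GC'd is dropped.

```go
package main

import "fmt"

// GossipEntry is an illustrative stand-in for a received nameserver
// record; Origin names the peer that owns the entry.
type GossipEntry struct {
	Hostname string
	Origin   string
}

// filterByLivePeers drops received entries whose origin peer is no
// longer in the peer table, preventing permanently stale records from
// a peer that has already been garbage-collected.
func filterByLivePeers(received []GossipEntry, livePeers map[string]bool) []GossipEntry {
	var kept []GossipEntry
	for _, e := range received {
		if livePeers[e.Origin] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	live := map[string]bool{"peer-a": true, "peer-c": true}
	received := []GossipEntry{
		{Hostname: "web.weave.local", Origin: "peer-a"},
		{Hostname: "db.weave.local", Origin: "peer-b"}, // peer-b was GC'd
	}
	fmt.Println(filterByLivePeers(received, live))
}
```

The trade-off discussed in the surrounding comments applies: the reverse race (entry arrives before the peer is known) can still drop records, but periodic gossip repairs that, whereas accepting records for a dead peer would never be repaired.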

@tomwilkie
Contributor Author

@rade I've added code to filter gossip messages based on the contents of peers. Had to be careful to avoid deadlocks, which meant I had to take the peers lock in a bit of a smelly way. Let me know what you think.

Other than EDNS (what is it?) I think this might be close to done. Who do you want to review this?

// write lock.
if n.peers != nil {
	n.peers.RLock()
	defer n.peers.RUnlock()


@tomwilkie tomwilkie changed the title WIP Gossip-based DNS embedded in the router. Gossip-based DNS embedded in the router. Jul 6, 2015
@tomwilkie tomwilkie removed their assignment Jul 6, 2015
	peers.onGC = append(peers.onGC, callback)
}

func (peers *Peers) triggerOnGC(removed []*Peer) {


@rade
Member

rade commented Jul 6, 2015

How sure are you that moving the onGC callback invocation outside the lock is safe? I am concerned about race conditions where another thread might re-add a peer before the callback has been invoked. I reckon this is OK for the existing onGC callback, the MacCache cleanup, since it's a cache, so worst case an incorrect removal results in a performance hit. But is it OK for the new one?

@@ -262,13 +290,15 @@ func determineQuorum(initPeerCountFlag int, peers []string) uint {
 	return quorum
 }

-func handleHTTP(router *weave.Router, httpAddr string, allocator *ipam.Allocator, defaultSubnet address.CIDR, docker *docker.Client) {
+func handleHTTP(router *weave.Router, httpAddr string, allocator *ipam.Allocator, defaultSubnet address.CIDR, docker *docker.Client, ns *nameserver.Nameserver, dnsserver *nameserver.DNSServer) {


@tomwilkie
Contributor Author

Merged coverage report from both the integration and unit tests: https://circle-artifacts.com/gh/weaveworks/weave/1724/artifacts/0/tmp/circle-artifacts.PPiWyIL/coverage.html

  • dns.go 87.6%
  • entry.go 92.6%
  • http.go 52.0%
  • nameserver.go 96.2%

http.go is dragged down as it's a short file with unexercised error handling, and it has the GET handler for /name/{hostname}, which is currently unused and should be removed (will do that now).

@tomwilkie tomwilkie force-pushed the gossipdns branch 2 times, most recently from da3bcf0 to 6748c8e Compare July 13, 2015 10:16
@tomwilkie
Contributor Author

@inercia how are we getting on with this review? Are we getting close to an LGTM?

@tomwilkie tomwilkie force-pushed the gossipdns branch 2 times, most recently from 36ec2c7 to d46301c Compare July 14, 2015 14:42
Tom Wilkie added 3 commits July 14, 2015 16:00
- Nameserver holds a sorted list of (hostname, peer, container id, ip) tuples.
- Entries are gossiped using topology gossip / broadcast - only new/updated entries
  are propagated as expected.
- Entries for a given peer are removed when the connection to the peer is lost.
- Entries are tombstoned when containers die.  Tombstones are kept for 10mins,
  as they do not need to survive partition, only gossip propagation delay.
inercia added a commit that referenced this pull request Jul 14, 2015
Gossip-based DNS embedded in the router.
@inercia inercia merged commit a1ac031 into master Jul 14, 2015
@rade rade deleted the gossipdns branch July 14, 2015 16:34
rade added a commit that referenced this pull request Jul 14, 2015
This wasn't spotted because #1065, which removed launch-dns, got
merged just before #1126, which introduced this test.
rade added a commit that referenced this pull request Jul 14, 2015
These tests were introduced in #1065 and #1126.
@rade rade modified the milestones: current, 1.1.0 Jul 15, 2015
rade added a commit that referenced this pull request Jul 19, 2015
since $CONTAINER gets stomped on by various functions, in particular
populate_dns.

Broken by #1065.
rade added a commit that referenced this pull request Jul 19, 2015
Thus unbreaking it post #1065.
@rade rade mentioned this pull request Jul 19, 2015