-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix connection tracking race #111
Conversation
Before, we could end up (e.g.): 1. Creating two connections (both sides connect at the same time). 2. Try to test with the first one. 3. The first connection dies. 4. Get a stream reset and think that the other side doesn't support the DHT protocol. We tried to fix this by checking for an EOF. Unfortunately, reset streams don't return EOFs. This commit also simplifies peer tracking (and saves a bit of memory). fixes #99
I'd rather just provide a way to wait on the identify protocol but we need this fixed... |
notif.go
Outdated
p := v.RemotePeer() | ||
protos, err := dht.peerstore.SupportsProtocols(p, dhtProtocols...) | ||
if err == nil && len(protos) != 0 { | ||
dht.plk.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly does this lock protect here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to protect Update -- but that's a public interface function.
It has no business being both public and requiring the lock to be held.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular lock is probably unnecessary because Connect
and Disconnect
notifications are synchronous (although I still want to leave it as it doesn't hurt and I like being consistent). However, we do need to take it below in the Disconnect
handler and in testConnection
. Otherwise, we could end up with the following interleaving:
testConnection | Disconnect |
---|---|
Observe that we are connected | |
Observe that we are disconnected (disconnect event happens) | |
Remove peer from routing table | |
Add peer to routing table |
This is an alternative to reference counting open connections (what we did before) that doesn't require keeping a bunch of additional state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fair enough. Can we add a comment some comments to that effect -- it looks totally out of place otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the effect of calling Update
without this lock? I am concerned about the public interface uses of it.
notif.go
Outdated
// Remember this choice (makes subsequent negotiations faster) | ||
dht.peerstore.AddProtocols(p, selected) | ||
|
||
dht.plk.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here again with the lock!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM; I am still a little concerned about that lock though, as it seems to protect the routingTable.
It doesn't "protect" anything really. It allows us to atomically update the routing table iff we're connected. It binds together the read and a write into an atomic operation. We could, alternatively, have a loop:
And a similar dance in |
Agreed, that's just unnecessarily complex. |
Before, we could end up (e.g.):
We tried to fix this by checking for an EOF. Unfortunately, reset streams don't return EOFs.
This commit also simplifies peer tracking (and saves a bit of memory).
fixes #99