fix: in the swarm move Connectedness emit after releasing conns #2373
go-libp2p-kad-dht now listens to both EvtPeerIdentificationCompleted and EvtPeerConnectednessChanged, and its EvtPeerIdentificationCompleted handler calls .ConnsToPeer in order to do some filtering. This can deadlock: if the swarm tries to emit an EvtPeerConnectednessChanged while the subscriber is still processing an EvtPeerIdentificationCompleted, the subscriber is stuck on s.conns.RLock(), and the swarm won't release s.conns until it has sent EvtPeerConnectednessChanged. Deadlock! I haven't confirmed that this fixes my bug, since it takes time to reproduce; I'll start a new experiment soon.
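To make the lock ordering concrete, here is a minimal sketch of the pattern this PR moves to. It is a toy model, not the real swarm code: the `swarm` struct, `connectednessChanged`, `addConn`, and the synchronous `emit` callback are all hypothetical stand-ins, and the real event bus delivers events differently. The point is only that the connectedness change is computed under `s.conns`, and the emit happens after the lock is released, so a handler that calls `ConnsToPeer` cannot block on it.

```go
package main

import (
	"fmt"
	"sync"
)

// connectednessChanged is a hypothetical stand-in for
// event.EvtPeerConnectednessChanged.
type connectednessChanged struct {
	Peer      string
	Connected bool
}

// swarm is a toy model, not the real go-libp2p Swarm.
type swarm struct {
	conns struct {
		sync.RWMutex
		m map[string][]int
	}
	// emit delivers an event synchronously to the subscriber (assumption).
	emit func(connectednessChanged)
}

func newSwarm() *swarm {
	s := &swarm{}
	s.conns.m = make(map[string][]int)
	return s
}

// ConnsToPeer takes the read lock, like the call the subscriber makes.
func (s *swarm) ConnsToPeer(p string) []int {
	s.conns.RLock()
	defer s.conns.RUnlock()
	return append([]int(nil), s.conns.m[p]...)
}

// addConn records a connection and emits only after releasing s.conns.
// Calling s.emit before Unlock would block forever as soon as the
// handler calls ConnsToPeer.
func (s *swarm) addConn(p string, id int) {
	s.conns.Lock()
	first := len(s.conns.m[p]) == 0
	s.conns.m[p] = append(s.conns.m[p], id)
	s.conns.Unlock()

	if first {
		s.emit(connectednessChanged{Peer: p, Connected: true})
	}
}

func main() {
	s := newSwarm()
	s.emit = func(e connectednessChanged) {
		// The subscriber re-enters the swarm while handling the event.
		fmt.Println(e.Peer, "connected, conns:", len(s.ConnsToPeer(e.Peer)))
	}
	s.addConn("peerA", 1)
	s.addConn("peerA", 2) // no event: the peer was already connected
}
```

Note that releasing the lock before emitting gives up any ordering guarantee the lock provided between concurrent connects and disconnects, so that has to be handled separately if it matters.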
See: #2373 (comment)
Yes, this has happened to me in the wild.
I agree this should be fixed, just want to get the scenario right.
Does this sound correct?
@sukunrt what you described is exactly what happens right now because the emit is done while holding conns.
Great work @Jorropo! I didn't consider how a consumer could try to grab a lock that the publisher holds and thus create a deadlock. Publishers should only grab locks to guarantee the ordering of connect/disconnect events. I don't think this needs the swarm.conns lock, though I'll double-check the code path. I can help here and add some tests.
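One way to read the point about ordering is to give the publisher its own lock that exists only to serialize events, separate from the lock that guards the connection table. The sketch below assumes that reading; `emitMu`, `connsMu`, `update`, and `runDemo` are hypothetical names and this is not the code that was merged. Subscribers may take `connsMu` (through `Connected`) while an event is being delivered, but they must not call back into `update`, since that would block on `emitMu`.

```go
package main

import (
	"fmt"
	"sync"
)

// swarm is a toy model; the field names are hypothetical.
type swarm struct {
	connsMu sync.RWMutex
	conns   map[string]int // open connection count per peer

	emitMu sync.Mutex // orders events only; subscribers never take it
	emit   func(peer string, connected bool)
}

// Connected is safe to call from an event handler: it only needs connsMu.
func (s *swarm) Connected(p string) bool {
	s.connsMu.RLock()
	defer s.connsMu.RUnlock()
	return s.conns[p] > 0
}

// update changes the connection count and emits on a transition.
// emitMu spans both steps so events leave in the order the state changed;
// connsMu is released before emit is called.
func (s *swarm) update(p string, delta int) {
	s.emitMu.Lock()
	defer s.emitMu.Unlock()

	s.connsMu.Lock()
	before := s.conns[p] > 0
	s.conns[p] += delta
	after := s.conns[p] > 0
	s.connsMu.Unlock()

	if before != after {
		s.emit(p, after)
	}
}

// runDemo opens and closes n connections concurrently and returns the
// sequence of emitted states.
func runDemo(n int) []bool {
	var events []bool
	s := &swarm{conns: make(map[string]int)}
	s.emit = func(p string, connected bool) {
		_ = s.Connected(p) // handler re-enters the swarm without blocking
		events = append(events, connected)
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.update("peerA", +1)
			s.update("peerA", -1)
		}()
	}
	wg.Wait()
	return events
}

func main() {
	// States strictly alternate, starting with true and ending with false.
	fmt.Println(runDemo(4))
}
```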
@MarcoPolo I don't have time to finish any of this, feel free to fix CI and add tests, thx
Test added, and fixed a related deadlock (on the disconnect side of things).
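For readers wondering how a deadlock like this can be checked for at all, a common approach is to run the suspect code in a goroutine and fail if it does not return within a timeout. The following is a generic sketch of that idea, not the test added in this PR; `finishesWithin` is a hypothetical helper, and the two closures model "emit after unlock" and "emit while holding the lock" with a bare `sync.RWMutex`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// finishesWithin reports whether fn returns before the timeout elapses.
// On timeout the goroutine running fn is leaked, which is usually
// acceptable in a test.
func finishesWithin(d time.Duration, fn func()) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		fn()
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	var mu sync.RWMutex

	// Emit after releasing the lock: the handler can take the read lock.
	ok := finishesWithin(time.Second, func() {
		mu.Lock()
		mu.Unlock()
		mu.RLock() // stands in for a subscriber calling ConnsToPeer
		mu.RUnlock()
	})
	fmt.Println("emit after unlock finishes:", ok) // true

	// Emit while holding the lock: the handler blocks on RLock forever.
	var held sync.RWMutex
	stuck := finishesWithin(100*time.Millisecond, func() {
		held.Lock()
		held.RLock() // never returns: the write lock is still held
	})
	fmt.Println("emit under lock finishes:", stuck) // false
}
```

A timeout-based check can only show that a deadlock did not occur within the window, so the timeout has to be generous enough to avoid flaky failures on slow CI machines.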
Once CI passes I'll merge a patch release
✔️ (can't approve) nice for the disconnect, I don't know how I overlooked this obvious codepath.
* fix: in the swarm move Connectedness emit after releasing conns

  go-libp2p-kad-dht now listens to both EvtPeerIdentificationCompleted and EvtPeerConnectednessChanged, and its EvtPeerIdentificationCompleted handler calls .ConnsToPeer in order to do some filtering. This can deadlock: if the swarm tries to emit an EvtPeerConnectednessChanged while the subscriber is still processing an EvtPeerIdentificationCompleted, the subscriber is stuck on s.conns.RLock(), and the swarm won't release s.conns until it has sent EvtPeerConnectednessChanged. Deadlock! I haven't confirmed that this fixes my bug, since it takes time to reproduce; I'll start a new experiment soon.
* Fix other deadlock and add a test
* Make test a little faster
* Bind on localhost

---------

Co-authored-by: Marco Munizaga <[email protected]>
* fix: in the swarm move Connectedness emit after releasing conns (#2373)
  * fix: in the swarm move Connectedness emit after releasing conns
  * Fix other deadlock and add a test
  * Make test a little faster
  * Bind on localhost

  Co-authored-by: Marco Munizaga <[email protected]>
* Release version v0.27.7
* identify: set stream deadlines for Identify and Identify Push streams (#2382)

---------

Co-authored-by: Jorropo <[email protected]>
Co-authored-by: Marten Seemann <[email protected]>