
protocols/dcutr: Don't retry when incoming direct connection succeeded #2607

Open
mxinden opened this issue Apr 6, 2022 · 7 comments
Labels
bug, difficulty:moderate, getting-started, help wanted

Comments

@mxinden
Member

mxinden commented Apr 6, 2022

As of today, we retry the DCUtR process when the outgoing direct connection upgrade fails.

In the case where the incoming direct connection upgrade succeeded, there is no need to retry the process.

We should check direct_connections to see whether an (incoming) direct connection succeeded in the meantime.

} => {
    let peer_id =
        peer_id.expect("Peer of `Prototype::DirectConnection` is always known.");
    if attempt < MAX_NUMBER_OF_UPGRADE_ATTEMPTS {
        self.queued_actions.push_back(ActionBuilder::Connect {
            peer_id,
            handler: NotifyHandler::One(relayed_connection_id),
            attempt: attempt + 1,
        });
    } else {
        self.queued_actions.extend([
            NetworkBehaviourAction::NotifyHandler {
                peer_id,
                handler: NotifyHandler::One(relayed_connection_id),
                event: Either::Left(
                    handler::relayed::Command::UpgradeFinishedDontKeepAlive,
                ),
            }
            .into(),
            NetworkBehaviourAction::GenerateEvent(
                Event::DirectConnectionUpgradeFailed {
                    remote_peer_id: peer_id,
                    error: UpgradeError::Dial,
                },
            )
            .into(),
        ]);
    }
}
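
A minimal sketch of the guard this issue asks for, placed at the top of the dial-failure handling shown above. The surrounding match arm and the early return are assumptions; direct_connections is the field quoted further down in this thread.

// Assumed sketch: if an (incoming) direct connection to the peer already
// exists, skip the retry and don't report an upgrade failure.
if self.direct_connections.contains_key(&peer_id) {
    return;
}

if attempt < MAX_NUMBER_OF_UPGRADE_ATTEMPTS {
    // ... retry as above ...
} else {
    // ... emit Event::DirectConnectionUpgradeFailed as above ...
}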

@mxinden added the bug, difficulty:moderate, help wanted, and getting-started labels on Apr 6, 2022
@esotericbyte
Contributor

esotericbyte commented Apr 7, 2022

I have skimmed: https://pdos.csail.mit.edu/papers/p2pnat.pdf
Is the current behavior inefficient, or does it lead to more connection failures because the peers pursue different connections?
Is timing of the connections and interplay between the network, queued events, and futures executor a relevant concern?

Implement direct_connections check:
If there is a direct connection, stop trying to connect and just drop the effort; don't inject any events, because the direct connection would have generated its own success event.
If there is not, continue with a retry of the process.
Finish reading the library code used by the dcutr behavior.
Test the code and evaluate the solution
Based on testing consider instrumenting with metrics or tracing.

Code reading questions:
The check needs to be made before the upgrade process from a relay connection to direct connection is restarted.
Precisely where should the check go?

Is this just a single code branch?
Keep in mind that Rust resolves some of the logic at compile time, even if the code is futures code.
Are the events created by GenerateEvent handled as futures tasks, or via a separate event queue?
If separate, when they are handled do they create tasks and thus get placed on a secondary queue?
What happens if both connections succeed? (I believe direct_connections is a map from IDs to connections, so it would contain more than one connection on each end for their respective IDs.)
If the incoming direct connection upgrade is in progress, how many success cases may still be missed even if direct_connections is checked? (Tasks are executed in parallel.)
What is the latency of the various queues? What applications use this code that might be analyzed to understand behavior under various loads? Under various loads and contexts, do these queues actually cause significant delays, or are the delays from internal queues negligible? If unclear, look for a diagram of network messages combined with event and event-queue activity, and if there is none, make one.

@mxinden
Member Author

mxinden commented Apr 16, 2022

Is the current behavior inefficient, or does it lead to more connection failures because the peers pursue different connections?

Just inefficient.

Is timing of the connections and interplay between the network, queued events, and futures executor a relevant concern?

Probably relevant, though we will be collecting more data soon to back up this point. See https://github.com/dennis-tra/punchr.

If there is a direct connection, stop trying to connect and just drop the effort

Yes, that sounds right.

Precisely where should the check go?

inject_dial_failure should work.

If separate, when they are handled do they create tasks and thus get placed on a secondary queue?

There is a single task running all NetworkBehaviour and Swarm logic plus one task per connection.

What happens if both connections succeed? (I believe direct_connections is a map from IDs to connections, so it would contain more than one connection on each end for their respective IDs.)

For now, one connection would eventually idle and thus be closed. In the long run we would need libp2p/specs#328.

What is the latency of the various queues? What applications use this code that might be analyzed to understand behavior under various loads? Under various loads and contexts, do these queues actually cause significant delays, or are the delays from internal queues negligible? If unclear, look for a diagram of network messages combined with event and event-queue activity, and if there is none, make one.

Unfortunately I don't have data on this. Based on intuition and past experience I would guess it is negligible given that network latencies are an order of magnitude higher.

@normanade

Actually, the direct_connections hashmap cannot be reached from outside the Behaviour struct, making the pub struct rather useless.

pub struct Behaviour {
    /// Queue of actions to return when polled.
    queued_actions: VecDeque<ActionBuilder>,

    /// All direct (non-relayed) connections.
    direct_connections: HashMap<PeerId, HashSet<ConnectionId>>,
}

@thomaseizinger
Contributor

Actually, the direct_connections hashmap cannot be reached from outside the Behaviour struct

No need to access it from the outside. The behaviour gets notified about new incoming connections!
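
For illustration, a rough sketch (not the library's exact code) of how the behaviour could keep direct_connections up to date from those notifications, using the inject_connection_established callback of that era's NetworkBehaviour trait; the exact signature and the is_relayed() check are assumptions based on the API at the time.

// Sketch: record every new non-relayed connection so that the dial-failure
// handling above can tell whether a direct connection already exists.
fn inject_connection_established(
    &mut self,
    peer_id: &PeerId,
    connection_id: &ConnectionId,
    connected_point: &ConnectedPoint,
    _failed_addresses: Option<&Vec<Multiaddr>>,
    _other_established: usize,
) {
    if !connected_point.is_relayed() {
        self.direct_connections
            .entry(*peer_id)
            .or_default()
            .insert(*connection_id);
    }
}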

@esotericbyte
Contributor

esotericbyte commented Nov 29, 2022

Looking back, I am puzzled about how to debug multi-node issues like this in libp2p, or in an app where network infrastructure such as routers is involved. I probably should have asked about it sooner. Are there docs on how to set up automated testing, or is it necessary to recruit machines?

@mxinden
Member Author

mxinden commented Dec 6, 2022

@esotericbyte unfortunately we don't have automated testing in place here today. You can follow progress in libp2p/test-plans#21.

@thomaseizinger
Contributor

Update: We now have automated tests for hole-punching!
