Proposal: AutoNAT v2 #503

Open
marten-seemann opened this issue Jan 12, 2023 · 9 comments · May be fixed by #538

Comments

@marten-seemann
Contributor

This proposal has been around for the longest time, but it’s living inside this go-libp2p issue: libp2p/go-libp2p#1480 (comment).
Writing this up here now, so we have all the protocol improvements in one place.

Protocol Description

Compared to AutoNAT v1, AutoNAT v2 should provide the following features:

  1. It should be a protocol to test the reachability of particular addresses (not a list of addresses). This allows nodes to use AutoNAT to test address candidates (derived from different sources, e.g. Identify, circuit addresses, etc.) for their reachability. The idea is to enable building an address pipeline, where address candidates can go from “suspected” to “confirmed” listen addresses (with occasional re-confirmation of already confirmed addresses).
  2. It makes it more difficult to lie to us. This only works in one direction: we can have a node prove that a connection attempt succeeded. The other direction isn’t provable: a node can always claim that it tried to dial us but that the dial failed. The proof can be achieved by asking the node to provide a certain identifier (a random string or number) and then waiting for an incoming connection on which this identifier has to be presented. The connection has to arrive on the right transport; we won’t be able to distinguish between different addresses on the same transport.

(2) is definitely a breaking protocol change, and in practice (1) is as well, because old go-libp2p versions would return an E_DIAL_ERROR instead of an E_DIAL_REFUSED error if a transport was unsupported.
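
To make (2) more concrete, here is a minimal client-side sketch in Go of the nonce check described above. It is only an illustration under stated assumptions: the `Client` type, `StartCheck`, `OnDialBack`, and the 64-bit nonce size are hypothetical names and choices, not part of any libp2p implementation, and the actual wire format is not defined in this proposal.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// pendingCheck is a hypothetical record of one outstanding reachability test.
type pendingCheck struct {
	addr      string // address candidate under test
	transport string // transport of that address, e.g. "tcp" or "quic"
}

// Client tracks outstanding checks by nonce. All names here are illustrative.
type Client struct {
	pending   map[uint64]pendingCheck
	confirmed map[string]bool
}

func NewClient() *Client {
	return &Client{pending: map[uint64]pendingCheck{}, confirmed: map[string]bool{}}
}

// StartCheck picks a random nonce for addr. In a real protocol the nonce
// would be sent to the server together with the dial request.
func (c *Client) StartCheck(addr, transport string) (uint64, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	nonce := binary.BigEndian.Uint64(buf[:])
	c.pending[nonce] = pendingCheck{addr: addr, transport: transport}
	return nonce, nil
}

// OnDialBack is called when an inbound connection presents a nonce.
// The address is confirmed only if the nonce is known and the connection
// arrived on the transport of the address under test.
func (c *Client) OnDialBack(nonce uint64, transport string) bool {
	p, ok := c.pending[nonce]
	if !ok || p.transport != transport {
		return false
	}
	delete(c.pending, nonce) // a nonce is accepted only once
	c.confirmed[p.addr] = true
	return true
}

func main() {
	c := NewClient()
	nonce, err := c.StartCheck("/ip4/203.0.113.7/tcp/4001", "tcp")
	if err != nil {
		panic(err)
	}
	fmt.Println(c.OnDialBack(nonce+1, "tcp")) // false: unknown nonce
	fmt.Println(c.OnDialBack(nonce, "quic"))  // false: wrong transport
	fmt.Println(c.OnDialBack(nonce, "tcp"))   // true: nonce and transport match
}
```

Note that the sketch can only prove success: a server that never dials back simply leaves the check pending, which matches the one-directional guarantee described in (2).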

Rollout Strategy

AutoNAT v2 is only useful if a sufficient number of nodes support the protocol. During a transition period, nodes will need to support both versions (at least on the client side). Once a large enough fraction of the network (20%? tbd) supports AutoNAT v2, we can start disabling support for v1.
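
As a rough illustration of the transition period, a client that supports both versions could prefer v2 whenever the remote peer advertises it and fall back to v1 otherwise. The sketch below assumes the peer's supported protocols are already known (e.g. from Identify). `pickProtocol` and the `v1Enabled` switch are hypothetical, and the v2 protocol ID is a placeholder, since this proposal does not fix one.

```go
package main

import "fmt"

const (
	protoV1 = "/libp2p/autonat/1.0.0" // AutoNAT v1 protocol ID
	protoV2 = "/example/autonat/2"    // placeholder: no v2 ID is defined in this proposal
)

// pickProtocol returns the AutoNAT protocol to use with a peer.
// v2 is preferred; v1 is used only while v1Enabled is still true.
func pickProtocol(peerProtocols []string, v1Enabled bool) (string, bool) {
	hasV1 := false
	for _, p := range peerProtocols {
		switch p {
		case protoV2:
			return protoV2, true
		case protoV1:
			hasV1 = true
		}
	}
	if hasV1 && v1Enabled {
		return protoV1, true
	}
	return "", false
}

func main() {
	fmt.Println(pickProtocol([]string{protoV1, protoV2}, true)) // v2 preferred
	fmt.Println(pickProtocol([]string{protoV1}, true))          // falls back to v1
	fmt.Println(pickProtocol([]string{protoV1}, false))         // v1 disabled: no protocol
}
```

Disabling v1 then amounts to flipping `v1Enabled` once enough of the network supports v2.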

@elenaf9
Contributor

elenaf9 commented Jan 12, 2023

Thanks for moving the discussion here @marten-seemann.
Concerning (2): I know that in go-libp2p AutoNAT servers use a background host for the dial back and thus the PeerId is unknown. Could you explain why it is not possible in go-libp2p to do the dial as a "normal" dial with the server's PeerId?
That's how it's done in rust-libp2p. It would avoid the need of the additional nonce, since the client could simply use the PeerId to check if it received an inbound connection.

@marten-seemann
Contributor Author

How does rust-libp2p handle dial-back requests for a TCP address when the request itself was received over a TCP connection to that same address? You won't be able to establish two TCP connections with the same 4-tuple when using SO_REUSEPORT. Do you special-case this?
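
The 4-tuple problem can be illustrated with a small model in Go. This is not real socket code: `fourTuple` and `canDial` are hypothetical helpers, and the addresses are documentation examples. The assumption is that both sides reuse their listen port (4001) for outgoing connections, so the dial-back would have exactly the same 4-tuple as the connection the request arrived on.

```go
package main

import "fmt"

// fourTuple identifies a TCP connection from the local node's point of view.
type fourTuple struct {
	localIP    string
	localPort  int
	remoteIP   string
	remotePort int
}

// canDial reports whether a new connection with tuple t could coexist
// with the existing connections: TCP allows only one connection per 4-tuple.
func canDial(existing []fourTuple, t fourTuple) bool {
	for _, e := range existing {
		if e == t {
			return false
		}
	}
	return true
}

func main() {
	// Server's view of the connection the client opened with port reuse.
	existing := []fourTuple{{"203.0.113.7", 4001, "198.51.100.2", 4001}}

	// Dial-back from the reused listen port: identical 4-tuple.
	reused := fourTuple{"203.0.113.7", 4001, "198.51.100.2", 4001}
	// Dial-back from an ephemeral port: a distinct 4-tuple.
	ephemeral := fourTuple{"203.0.113.7", 54321, "198.51.100.2", 4001}

	fmt.Println(canDial(existing, reused))    // false
	fmt.Println(canDial(existing, ephemeral)) // true
}
```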

@Menduist
Contributor

Can't speak for rust, but in Nim we only enable REUSEADDR for hole-punching. For regular connections, a random port is used

@mxinden
Member

mxinden commented Jan 13, 2023

but in Nim we only enable REUSEADDR for hole-punching. For regular connections, a random port is used

For what it is worth, cross-referencing #389 here.

@marten-seemann
Contributor Author

but in Nim we only enable REUSEADDR for hole-punching. For regular connections, a random port is used

For what it is worth, cross-referencing #389 here.

@mxinden Is that what rust-libp2p does?

If not, I'd expect high dial-back failure rates reported from rust-libp2p nodes (at least those that don't have QUIC enabled), namely for nodes that only have a single IP address.

@mxinden
Member

mxinden commented Jan 16, 2023

but in Nim we only enable REUSEADDR for hole-punching. For regular connections, a random port is used

For what it is worth, cross-referencing #389 here.

@mxinden Is that what rust-libp2p does?

rust-libp2p does not implement #389; in other words, rust-libp2p currently reuses the same socket either for all of its connections or for none of them.

If not, I'd expect high dial-back failure rates reported from rust-libp2p nodes (at least those that don't have QUIC enabled), namely for nodes that only have a single IP address.

Good point. For AutoNAT 2 I would favor adding support for non-reuse-dial (similar to #389) in rust-libp2p, instead of spawning a new host (aka. Swarm in rust-libp2p) with a new identity.

Concerning (2): I know that in go-libp2p AutoNAT servers use a background host for the dial back and thus the PeerId is unknown. Could you explain why it is not possible in go-libp2p to do the dial as a "normal" dial with the server's PeerId?
That's how it's done in rust-libp2p. It would avoid the need of the additional nonce, since the client could simply use the PeerId to check if it received an inbound connection.

Even if we require the identity of the remote AutoNAT server not to change, the nonce is still useful to make matching AutoNAT requests to incoming dials unambiguous.
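
A small Go sketch of why the nonce removes ambiguity: if a client has two requests outstanding with the same server, an inbound connection identified only by the server's PeerId matches both, while a nonce selects exactly one. The `request` type, the string peer ID, and both matching functions are hypothetical and only illustrate the argument.

```go
package main

import "fmt"

// request is a hypothetical outstanding dial-back request.
type request struct {
	server string // server's peer ID, shown as a plain string for illustration
	addr   string // address candidate under test
	nonce  uint64
}

// matchByPeer returns every pending request sent to the given peer.
func matchByPeer(pending []request, peer string) []request {
	var out []request
	for _, r := range pending {
		if r.server == peer {
			out = append(out, r)
		}
	}
	return out
}

// matchByNonce returns the single pending request for this peer and nonce.
func matchByNonce(pending []request, peer string, nonce uint64) (request, bool) {
	for _, r := range pending {
		if r.server == peer && r.nonce == nonce {
			return r, true
		}
	}
	return request{}, false
}

func main() {
	pending := []request{
		{server: "serverA", addr: "/ip4/198.51.100.2/tcp/4001", nonce: 11},
		{server: "serverA", addr: "/ip4/198.51.100.2/tcp/4002", nonce: 22},
	}
	fmt.Println(len(matchByPeer(pending, "serverA"))) // 2: ambiguous
	r, ok := matchByNonce(pending, "serverA", 22)
	fmt.Println(r.addr, ok) // /ip4/198.51.100.2/tcp/4002 true
}
```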

@marten-seemann
Contributor Author

A question that often comes up in discussions about this proposal is the following:

Why not use the AutoNAT protobuf and just send a single address?

Unfortunately, this doesn't work for two reasons:

  1. Before v0.20.0, go-libp2p nodes reported E_DIAL_ERROR instead of E_DIAL_REFUSED due to a bug in the code, as mentioned above.
  2. When performing the dial-back, nodes not only dial the addresses from the protobuf, but in addition also the observed address of the connection on which the dial-back request was received (see here).

By now, (1) is not a big problem any more, since the fraction of old nodes has dropped to 10-15%. In addition, we could filter out old nodes by their reported agent version.

I can't, however, think of any way to circumvent the false positives / negatives introduced by (2).
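
To show how (2) produces a misleading result, here is a deliberately simplified model in Go. It is not go-libp2p's actual code: `v1DialBack` just assumes the server dials the requested addresses plus the observed address and reports success if any dial succeeds, `singleAddrDialBack` models the behavior this proposal asks for, and the `reachable` map stands in for real dial attempts.

```go
package main

import "fmt"

// v1DialBack models a server that dials the requested addresses and,
// in addition, the observed address of the requesting connection.
// It reports success if any of these dials succeeds.
func v1DialBack(requested []string, observed string, reachable map[string]bool) bool {
	candidates := append(append([]string{}, requested...), observed)
	for _, a := range candidates {
		if reachable[a] {
			return true
		}
	}
	return false
}

// singleAddrDialBack models a server that dials only the one requested address.
func singleAddrDialBack(addr string, reachable map[string]bool) bool {
	return reachable[addr]
}

func main() {
	candidate := "/ip4/198.51.100.2/tcp/4001" // address we want to test
	observed := "/ip4/198.51.100.2/tcp/5555"  // address the server saw us connect from
	reachable := map[string]bool{observed: true}

	// The candidate is not reachable, yet the v1-style check succeeds.
	fmt.Println(v1DialBack([]string{candidate}, observed, reachable)) // true
	fmt.Println(singleAddrDialBack(candidate, reachable))             // false
}
```

In this model, sending a single address in the v1 protobuf still yields a success that cannot be attributed to that address.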

@achingbrain
Member

Instead of making this v2, it would be great if we could change the name of the protocol, since people read "AutoNAT" and think "NAT hole punching implementation", which comes up in support threads over and over again.

Since this is a breaking change, maybe it's a good opportunity?

Something like:

  • Public Address Verification
  • Routable Address Finder
  • Am I reachable
  • Holla Back
  • ...more suggestions wanted

@MarcoPolo
Contributor

I think that makes sense. Maybe we can come up with a fun acronym to follow the spirit of STUN, TURN, ICE, and SIP.
