Project Flare aka. decentralised hole punching #21
Fixed markdown formatting
Fixed bug in formatting
proposals/Hole Punching.md
Outdated
Also, based on anecdotal evidence in the wild and engineering war stories (see [Tailscale’s blog](https://tailscale.com/blog/how-nat-traversal-works/)), **Cone NATs are much more pervasive in Home ISPs than Symmetric NATs, which justifies the ~60-80% success rate for Hole Punching**. \ | ||
\ | ||
Given that libp2p is a library to build peer to peer applications, **implementing Hole Punching in libp2p will allow any application/network that builds on top of libp2p to also benefit from Hole Punching**. This will be a significant step forward in the ease of building well connected p2p networks and will be a major added motivation for developers to build on top of our stack. Based on our research, **no such library that comes with out of the box NAT traversal via Hole Punching exists out in the wild today and we have the opportunity to provide the first such library & infra to herald the age of better connectivity in Web3 apps**. |
✊ yes!
### Technical deliverables | ||
* **Phase 1** | ||
- ~90% hole punching success if both peers are behind a Cone NAT in the first dog-fooding phase using PL hosted Limited Relays. | ||
* **Phase 2** | ||
- Optimise based on Dogfooding results and metrics and ship the feature using statically configured PL hosted Limited Relays. | ||
* **Phase 3** | ||
- Once we conclude that PL hosted Limited Relays are stable, ship a release that turns on the Limited Relay protocol in public DHT servers. | ||
* **Phase 4** | ||
- Once ~30% of public DHT servers upgrade to support the Limited Relay protocol (measured using Hydra Boosters), ship automated discovery & use of Limited Relays to coordinate a hole-punch rather than using statically configured Limited Relay servers. | ||
* **Phase 5** | ||
- ~90% hole punching success if both peers are behind a Cone NAT in the second dog-fooding phase that uses AutoRelay to discover and connect to Limited Relays in the wild rather than using statically configured Limited Relays. |
Provided some private feedback on this wording, but just to summarise, this roadmap reads like a mixture of goals and tasks. It's not entirely clear what needs to be done for each milestone to be done. Consider wording these as task lists, or definition of done (deliverables), but not as a mixture of abstract goals and high-level tasks.
### Success criteria | ||
* Dog-fooding phases deliver ~90% success for labbers using Home ISPs with Cone NATs (TCP & QUIC). | ||
* No bugs related to Hole Punching failures if both peers involved in a Hole Punch are behind a Cone NAT (we have good PRs for and will ship code/tools for users to detect their NAT type). | ||
* Users do not file bug reports about their public DHT peers getting DDoSed/consuming too much bandwidth/resources because of acting as Limited Relays. | ||
* We receive great traction and feedback on the ease of use and robustness of Hole Punching on channels such as Twitter, user surveys and from our community of users/partners. |
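The ~90% criterion above is only checkable if dog-fooding attempts are recorded per transport and per NAT situation. A minimal sketch of how such a rate could be computed follows; the `Attempt` record and `ConeSuccessRate` helper are hypothetical names made up for illustration, not part of go-libp2p or of this proposal.

```go
package main

import "fmt"

// Attempt is a hypothetical record of one hole-punch attempt
// collected during a dog-fooding phase.
type Attempt struct {
	Transport string // "tcp" or "quic"
	BothCone  bool   // true if both peers reported a Cone NAT
	Succeeded bool   // true if a direct connection was established
}

// ConeSuccessRate returns the fraction of successful attempts among
// those where both peers are behind Cone NATs, for one transport.
// ok is false when there are no matching samples.
func ConeSuccessRate(attempts []Attempt, transport string) (rate float64, ok bool) {
	var total, success int
	for _, a := range attempts {
		if !a.BothCone || a.Transport != transport {
			continue // outside the scope of the success criterion
		}
		total++
		if a.Succeeded {
			success++
		}
	}
	if total == 0 {
		return 0, false
	}
	return float64(success) / float64(total), true
}

func main() {
	attempts := []Attempt{
		{Transport: "quic", BothCone: true, Succeeded: true},
		{Transport: "quic", BothCone: true, Succeeded: false},
		{Transport: "tcp", BothCone: true, Succeeded: true},
		{Transport: "tcp", BothCone: false, Succeeded: false}, // ignored: not Cone/Cone
	}
	for _, t := range []string{"tcp", "quic"} {
		if rate, ok := ConeSuccessRate(attempts, t); ok {
			fmt.Printf("%s: %.0f%% success (target ~90%%)\n", t, rate*100)
		}
	}
}
```

Reporting TCP and QUIC separately keeps one transport from masking a regression in the other.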
We should fold the success criteria into each milestone.
Also, based on anecdotal evidence in the wild and engineering war stories (see [Tailscale’s blog](https://tailscale.com/blog/how-nat-traversal-works/)), **Cone NATs are much more pervasive in Home ISPs than Symmetric NATs, which justifies the ~60-80% success rate for Hole Punching**. | ||
Given that libp2p is a library to build peer to peer applications, **implementing Hole Punching in libp2p will allow any application/network that builds on top of libp2p to also benefit from Hole Punching**. This will be a significant step forward in the ease of building well connected p2p networks and will be a major added motivation for developers to build on top of our stack. Based on our research, **no such library that comes with out of the box NAT traversal via Hole Punching exists out in the wild today and we have the opportunity to provide the first such library & infra to herald the age of better connectivity in Web3 apps**. | ||
@Gozala Does HyperSwarm use decentralised signalling? How does it signal/co-ordinate hole punching?
I'm afraid I do not know how it goes about it. Reading the description of it (quoted below) makes me think that bootstrap nodes facilitate this.
> If your IP and port is consistent across the bootstrap nodes holepunching usually works.

We could try to ask @mafintosh
If I recall correctly, it uses WebRTC ICE/STUN.
* Libp2p will be one of the first libraries in the web3 ecosystem that provides Hole Punching and hence better connectivity out of the box. This will increase the functionality of our stack, and will encourage more developers to build on top of it. | ||
* PL’s applications such as Filecoin & IPFS will get turbocharged with better connectivity as they build on top of libp2p. | ||
* Important projects such as Eth2, 0x etc that build on top of libp2p will ALSO get this huge benefit should they choose to use it. Eth2 is likely to need this feature in Phase 2, which introduces browser-based light clients. | ||
* New browser-centric use cases will be possible when this functionality is implemented in js-libp2p. |
Could this be elaborated a bit more? I am not sure I understand what those use cases are or how hole punching would make browser nodes dialable.
Given that JS peers can't connect to go peers directly right now, shipping the Limited Relay based decentralised signalling infra is the first step in implementing a WebRTC transport in go & js libp2p where signalling does NOT rely on centralised STAR servers.
"JS peers can't connect to go peers directly right now"
The root reason for this is that `go-ipfs` does not have the `websockets` transport by default. Also, these nodes are not usually reachable with an SSL+DNS multiaddr, which is essential for the browser.
Unless we plan to have `webRTC` used by default in `go-ipfs`, the browser limitations will not be solved.
I remember that Chrome had speed limitations for `webRTC`, which made websockets a faster option. @Gozala do you know if this is still the case? If this is the case, I think it would be better to focus on a solution for generating certificates instead of `webRTC` in go. (and of course enable websockets by default in go)
> I remember that Chrome had speed limitations for `webRTC`, which made websockets a faster option. @Gozala do you know if this is still the case? If this is the case, I think it would be better to focus on a solution for generating certificates instead of `webRTC` in go. (and of course enable websockets by default in go)

I do not know what speed limitations chrome or other browsers put on WebRTC, but there are problems with WebRTC beyond that:
- WebRTC is not implemented in worker threads.
  - This makes building responsive apps a lot more challenging given all the other work that is happening on the main thread.
  - It prevents sharing an IPFS node across contexts like tabs or iframes.
- I have anecdotal reports that in practice WebRTC is impractical without TURN servers, suggesting that data is often relayed anyway.
- Again, anecdotal evidence suggests that WebRTC causes significant CPU load.
All of the above combined often leads teams to pivot towards WebSocket-based solutions. I have also heard teams report reduced bills when operating a WebSocket-based relay as opposed to TURN servers.
Please take all this with a grain of salt, because I have not seen any comprehensive studies to support the anecdotal evidence.
And here is some data regarding ICE success rate https://medium.com/the-making-of-whereby/so-i-read-that-20-of-webrtc-calls-fail-67b185e49765
* Important projects such as Eth2, 0x etc that build on top of libp2p will ALSO get this huge benefit should they choose to use it. Eth2 is likely to need this feature in Phase 2, which introduces browser-based light clients. | ||
* New browser-centric use cases will be possible when this functionality is implemented in js-libp2p. | ||
**Summary: The idea of a peer to peer library that makes NAT traversal via Hole Punching easy and pervasive is a very important & exciting development in the world of peer to peer applications.** |
I would go as far as to claim that there is a general assumption that a p2p networking library comes with NAT traversal & Hole Punching built in.
True. We were surprised when it didn't come delivered "out of the box". In fact, initially we built a publicly reachable "mailbox" solution that leveraged polling. Obviously terrible at scale for both transit and storage.
_How much would nailing this project improve our knowledge and ability to execute future projects?_ | ||
🎯🎯 |
I think higher score is warranted here
🎯🎯 | |
🎯🎯🎯 |
* No bugs related to Hole Punching failures if both peers involved in a Hole Punch are behind a Cone NAT (we have good PRs for and will ship code/tools for users to detect their NAT type). | ||
* Users do not file bug reports about their public DHT peers getting DDoSed/consuming too much bandwidth/resources because of acting as Limited Relays. | ||
* We receive great traction and feedback on the ease of use and robustness of Hole Punching on channels such as Twitter, user surveys and from our community of users/partners. | ||
I would suggest adding a criterion such as "An IPFS node is able to tell whether it can be dialed (hole punched to), combined with a troubleshooting interface that guides the node operator on what to do to make the node reachable".
### Background | ||
Given the pervasiveness of IPV4 peers that are behind NATs on the internet, NAT traversal is an essential requirement for a peer to peer application. The inability to traverse around NATs means that such NATT’d peers are NOT reachable on the network and are thus unable to provide any meaningful service to the network, nor interact with network participants under protocol patterns that require inbound connections (e.g. dialbacks). |
Given the pervasiveness of IPV4 peers that are behind NATs on the internet, NAT traversal is an essential requirement for a peer to peer application. The inability to traverse around NATs means that such NATT’d peers are NOT reachable on the network and are thus unable to provide any meaningful service to the network, nor interact with network participants under protocol patterns that require inbound connections (e.g. dialbacks). | |
Given the pervasiveness of IPv4 peers that are behind NATs on the internet, NAT traversal is an essential requirement for a peer to peer application. The inability to traverse NATs means that such NATT’d peers are NOT reachable on the network and are thus unable to provide any meaningful service to the network, nor interact with network participants under protocol patterns that require inbound connections (e.g. dialbacks). |
There are still some unaddressed comments from @raulk and others before we can move towards assignment |
Adding to the above, does this project include writing specifications? If I am not mistaken specs are missing for circuit relay v2, the signaling protocol (though there is libp2p/specs#173) and AutoNAT. |
Tracking issue in libp2p libp2p/go-libp2p#1039 |
* Applications that build on top of our stack want peers to be _directly_ reachable from the network even though they are behind a NAT (~80% peers in the current DHT network). | ||
* PL is not willing to keep funding expensive bandwidth-unrestricted Relay servers as the network keeps growing to enable data transfer to/from NATT’d peers. | ||
* Users would love to use our p2p stack if doing so means the applications they build get NAT traversal via Hole Punching out of the box. |
Hope we are not too late to the party, but can definitely agree from the perspective of downstream consumers of libp2p-rust, that not only is hole-punchy NAT traversal the ultimate goal, but the point above about the cost of running full relays also means that, since there is no incentive to do so, such holistic systems are likely to suffer from centralisation concerns.
### Alternatives | ||
Implementing a WebRTC transport in go-libp2p that performs signalling/co-ordination via Limited Relay servers might help solve the connectivity problems that hole punching seeks to address, but that would tie us to WebRTC as a transport. By comparison, implementing hole punching as a first-class feature in libp2p makes the whole feature transport agnostic. |
Just to raise another point here: WebRTC is not available in all webviews, and even though there is promising work being done on the Rust front (e.g. https://webrtc.rs), projects that build atop Tauri are very likely to prefer a wss connection.
Similarly, there are privacy concerns, since not everyone will run their own STUN services and many will probably use something like a public Google service instead.
See: tauri-apps/wry#85
NAT traversal in libp2p via Hole Punching with Limited Relays