Project Flare aka. decentralised hole punching #21

Open
wants to merge 5 commits into
base: main
from

@aarshkshah1992 aarshkshah1992 commented Feb 17, 2021

NAT traversal in libp2p via Hole Punching with Limited Relays

@aarshkshah1992 aarshkshah1992 changed the title from "[WIP] Hole punching project pitch" to "libp2p Hole punching project pitch" Feb 17, 2021
@aarshkshah1992 aarshkshah1992 changed the title from "libp2p Hole punching project pitch" to "Hole punching project pitch" Feb 17, 2021
@aarshkshah1992 aarshkshah1992 changed the title from "Hole punching project pitch" to "NAT traversal in libp2p via Hole Punching with Limited Relays" Feb 17, 2021
@aarshkshah1992 aarshkshah1992 changed the title from "NAT traversal in libp2p via Hole Punching with Limited Relays" to "Project Flare Pitch" Feb 17, 2021

Also, based on anecdotal evidence in the wild and engineering war stories (see [Tailscale’s blog](https://tailscale.com/blog/how-nat-traversal-works/)), **Cone NATs are much more pervasive in Home ISPs than Symmetric NATs, which justifies the ~60-80% success for Hole Punching**.

Given that libp2p is a library to build peer to peer applications, **implementing Hole Punching in libp2p will allow any application/network that builds on top of libp2p to also benefit from Hole Punching**. This will be a significant step forward in the ease of building well connected p2p networks and will be a major added motivation for developers to build on top of our stack. Based on our research, **no such library that comes with out of the box NAT traversal via Hole Punching exists out in the wild today and we have the opportunity to provide the first such library & infra to herald the age of better connectivity in Web3 apps**.

✊ yes!

@raulk raulk changed the title from "Project Flare Pitch" to "Project Flare aka. decentralised hole punching" Feb 17, 2021
Comment on lines +95 to +106
### Technical deliverables

* **Phase 1**
  - ~90% hole punching success if both peers are behind a Cone NAT in the first dog-fooding phase using PL hosted Limited Relays.
* **Phase 2**
  - Optimise based on dog-fooding results and metrics and ship the feature using statically configured PL hosted Limited Relays.
* **Phase 3**
  - Once we conclude that PL hosted Limited Relays are stable, ship a release that turns on the Limited Relay protocol in public DHT servers.
* **Phase 4**
  - Once ~30% of public DHT servers upgrade to support the Limited Relay protocol (measure using Hydra Boosters), ship automated discovery & use of Limited Relays to coordinate a hole-punch rather than using statically configured Limited Relay servers.
* **Phase 5**
  - ~90% hole punching success if both peers are behind a Cone NAT in the second dog-fooding phase that uses AutoRelay to discover and connect to Limited Relays in the wild rather than using statically configured Limited Relays.

Provided some private feedback on this wording, but just to summarise: this roadmap reads like a mixture of goals and tasks. It's not entirely clear what needs to happen for each milestone to be considered done. Consider wording these as task lists, or as a definition of done (deliverables), but not as a mixture of abstract goals and high-level tasks.

Comment on lines +108 to +113
### Success criteria

* Dog-fooding phases deliver ~90% success for labbers using Home ISPs with Cone NATs (TCP & QUIC).
* No bugs related to Hole Punching failures if both peers involved in a Hole Punch are behind a Cone NAT (we have good PRs for and will ship code/tools for users to detect their NAT type).
* Users do not file bug reports about their public DHT peers getting DDoSed/consuming too much bandwidth/resources because of acting as Limited Relays.
* We receive great traction and feedback on the ease of use and robustness of Hole Punching on channels such as Twitter, user surveys and from our community of users/partners.

We should fold the success criteria into each milestone.

Also, based on anecdotal evidence in the wild and engineering war stories (see [Tailscale’s blog](https://tailscale.com/blog/how-nat-traversal-works/)), **Cone NATs are much more pervasive in Home ISPs than Symmetric NATs, which justifies the ~60-80% success for Hole Punching**.

Given that libp2p is a library to build peer to peer applications, **implementing Hole Punching in libp2p will allow any application/network that builds on top of libp2p to also benefit from Hole Punching**. This will be a significant step forward in the ease of building well connected p2p networks and will be a major added motivation for developers to build on top of our stack. Based on our research, **no such library that comes with out of the box NAT traversal via Hole Punching exists out in the wild today and we have the opportunity to provide the first such library & infra to herald the age of better connectivity in Web3 apps**.

@Gozala Does HyperSwarm use decentralised signalling? How does it signal/co-ordinate hole punching?

I'm afraid I do not know how it goes about it. Reading the description of it (quoted below) makes me think that bootstrap nodes facilitate this.

> If your IP and port is consistent across the bootstrap nodes holepunching usually works.

We could try to ask @mafintosh

If I recall correctly, it uses WebRTC ICE/STUN.
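
The condition quoted earlier in this thread (the same public IP and port being seen by every bootstrap node) can be illustrated with a small check. This is only a sketch in Python under stated assumptions, not HyperSwarm's or libp2p's actual logic: `classify_mapping` is a hypothetical helper, the addresses are documentation-range placeholders, and the input is assumed to be the list of public endpoints that several independent observers reported for one local socket.

```python
from typing import Iterable, Tuple

# (public IP, public port) as reported back by one observer. Hypothetical type alias.
Endpoint = Tuple[str, int]


def classify_mapping(observations: Iterable[Endpoint]) -> str:
    """Classify how a NAT maps one local socket, given what observers saw.

    Returns "consistent" when every observer saw the same IP and port
    (the Cone-NAT-like case where hole punching usually works),
    "inconsistent" when observers disagree (Symmetric-NAT-like), and
    "unknown" when fewer than two observations are available.
    """
    obs = list(observations)
    if len(obs) < 2:
        # A single report cannot tell us whether the mapping varies per destination.
        return "unknown"
    return "consistent" if len(set(obs)) == 1 else "inconsistent"


if __name__ == "__main__":
    same = [("203.0.113.7", 40001)] * 3
    varying = [("203.0.113.7", 40001), ("203.0.113.7", 40002)]
    print(classify_mapping(same))
    print(classify_mapping(varying))
```

A peer that gets "inconsistent" here would most likely need to fall back to a relayed connection, since the remote side cannot predict which port the NAT will assign for a new destination.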

* Libp2p will be one of the first libraries in the web3 ecosystem that provides Hole Punching and hence better connectivity out of the box. This will increase the functionality of our stack, and will encourage more developers to build on top of it.
* PL’s applications such as Filecoin & IPFS will get turbocharged with better connectivity as they build on top of libp2p.
* Important projects such as Eth2, 0x etc that build on top of libp2p will ALSO get this huge benefit should they choose to use it. Eth2 is likely to need this feature in Phase 2, which introduces browser-based light clients.
* New browser-centric use cases will be possible when this functionality is implemented in js-libp2p.

Could this be elaborated a bit more? I am not sure I understand what those use cases are or how hole punching would make browser nodes dialable.

@aarshkshah1992 aarshkshah1992 Feb 19, 2021

@Gozala

Given that JS peers can't connect to go peers directly right now, shipping the Limited Relay based decentralised signalling infra is the first step in implementing a WebRTC transport in go & js libp2p where signalling does NOT rely on centralised STAR servers.

"JS peers can't connect to go peers directly right now"

The root reason for this is that go-ipfs does not enable the WebSockets transport by default. Also, these nodes are not usually reachable via an SSL+DNS multiaddr, which is essential for the browser.

Unless we plan to have WebRTC used by default in go-ipfs, the browser limitations will not be solved.

I remember that Chrome had speed limitations for WebRTC, which made WebSockets a faster option. @Gozala do you know if this is still the case? If this is the case, I think it would be better to focus on a solution for generating certificates instead of WebRTC in go. (and of course enable WebSockets by default in go)

> I remember that Chrome had speed limitations for WebRTC, which made WebSockets a faster option. @Gozala do you know if this is still the case? If this is the case, I think it would be better to focus on a solution for generating certificates instead of WebRTC in go. (and of course enable WebSockets by default in go)

I do not know what speed limitations Chrome or other browsers put on WebRTC, but there are problems with WebRTC beyond that:

  1. WebRTC is not implemented in worker threads.
     - This makes building responsive apps a lot more challenging given all the other work that is happening on the main thread.
     - It prevents sharing an IPFS node across contexts like tabs or iframes.
  2. I have anecdotal reports that in practice WebRTC is impractical without TURN servers, suggesting that data is often relayed anyway.
  3. Again, anecdotal evidence suggests that WebRTC causes significant CPU load.

All the above combined often leads teams to pivot towards WebSocket-based solutions. I have also heard teams report reduced bills when operating a WebSocket-based relay as opposed to TURN servers.

Please take all this with a grain of salt, because I have not seen any comprehensive studies to support the anecdotal evidence.

* Important projects such as Eth2, 0x etc that build on top of libp2p will ALSO get this huge benefit should they choose to use it. Eth2 is likely to need this feature in Phase 2, which introduces browser-based light clients.
* New browser-centric use cases will be possible when this functionality is implemented in js-libp2p.

**Summary: The idea of a peer to peer library that makes NAT traversal via Hole Punching easy and pervasive is a very important & exciting development in the world of peer to peer applications.**

I would go as far as to claim that there is a general assumption that a p2p networking library would come with NAT traversal & Hole Punching built in.

True. We were surprised when it didn't come delivered "out of the box". In fact, initially we built a publicly reachable "mailbox" solution that leveraged polling. Obviously terrible at scale for both transit and storage.


_How much would nailing this project improve our knowledge and ability to execute future projects?_

🎯🎯

I think a higher score is warranted here.

Suggested change
🎯🎯
🎯🎯🎯

* No bugs related to Hole Punching failures if both peers involved in a Hole Punch are behind a Cone NAT (we have good PRs for and will ship code/tools for users to detect their NAT type).
* Users do not file bug reports about their public DHT peers getting DDoSed/consuming too much bandwidth/resources because of acting as Limited Relays.
* We receive great traction and feedback on the ease of use and robustness of Hole Punching on channels such as Twitter, user surveys and from our community of users/partners.

I would suggest adding a criterion such as "IPFS node is able to tell if it can be dialed (hole punched to), combined with some troubleshooting interface that can guide a node operator in terms of what to do to make the node reachable".


### Background

Given the pervasiveness of IPV4 peers that are behind NATs on the internet, NAT traversal is an essential requirement for a peer to peer application. The inability to traverse around NATs means that such NATT’d peers are NOT reachable on the network and are thus unable to provide any meaningful service to the network, nor interact with network participants under protocol patterns that require inbound connections (e.g. dialbacks).

Suggested change
Given the pervasiveness of IPV4 peers that are behind NATs on the internet, NAT traversal is an essential requirement for a peer to peer application. The inability to traverse around NATs means that such NATT’d peers are NOT reachable on the network and are thus unable to provide any meaningful service to the network, nor interact with network participants under protocol patterns that require inbound connections (e.g. dialbacks).
Given the pervasiveness of IPv4 peers that are behind NATs on the internet, NAT traversal is an essential requirement for a peer to peer application. The inability to traverse NATs means that such NATT’d peers are NOT reachable on the network and are thus unable to provide any meaningful service to the network, nor interact with network participants under protocol patterns that require inbound connections (e.g. dialbacks).

@mikeal mikeal added the labels confidence:low (Confidence rating is 5 or below), ease:low (Ease rating is 5 or below) and impact:high (Impact rating is 6 or above) Mar 24, 2021

mikeal commented Mar 24, 2021

There are still some unaddressed comments from @raulk and others before we can move towards assignment

@aschmahmann

@momack2 @raulk I noticed that this project was moved into the Completed tab. Is it done? i.e. what needs to happen for me to ship a release of go-ipfs where a peer behind a NAT can pull data from another peer behind a NAT?

mxinden commented Apr 6, 2021

> @momack2 @raulk I noticed that this project was moved into the Completed tab. Is it done? i.e. what needs to happen for me to ship a release of go-ipfs where a peer behind a NAT can pull data from another peer behind a NAT?

Adding to the above, does this project include writing specifications? If I am not mistaken, specs are missing for circuit relay v2, the signaling protocol (though there is libp2p/specs#173) and AutoNAT.

@jacobheun

Tracking issue in libp2p: libp2p/go-libp2p#1039


* Applications that build on top of our stack want peers to be _directly_ reachable from the network even though they are behind a NAT (~80% of peers in the current DHT network).
* PL is not willing to keep funding expensive bandwidth-unrestricted Relay servers as the network keeps growing to enable data transfer to/from NATT’d peers.
* Users would love to use our p2p stack if doing so means the applications they build get NAT traversal via Hole Punching out of the box.

Hope we are not too late to the party, but can definitely agree from the perspective of downstream consumers of libp2p-rust, that not only is hole-punchy NAT traversal the ultimate goal, but the point above about the cost of running full relays also means that, since there is no incentive to do so, such holistic systems are likely to suffer from centralisation concerns.

iotaledger/stronghold.rs#210


### Alternatives

Implementing a WebRTC transport in go-libp2p that performs signalling/co-ordination via Limited Relay servers might help solve the connectivity problems that hole punching seeks to address, but it would tie us to WebRTC as a transport. By comparison, implementing hole punching as a first class feature in libp2p makes the whole feature transport agnostic.

Just to raise another point here: WebRTC is not available in all webviews, and even though there is promising work being done on the Rust front with e.g. https://webrtc.rs, projects that build atop Tauri are very likely to prefer a wss connection.

Similarly, there are privacy concerns, since not everyone will run their own STUN services and many will probably use something like a public Google service.

See: tauri-apps/wry#85

@momack2 momack2 added the Steward Priority label (Stewards priority project due to enabling us to move faster and/or safer) Jun 28, 2021
@iduartgomez iduartgomez mentioned this pull request Sep 1, 2021
10 tasks