[🏆 Golden path scenario] Browser-authored content retrievable by another machine through the ipfs.io gateway directly #182

Open
6 tasks
BigLep opened this issue Jul 13, 2023 · 12 comments
Labels
status/blocked Unable to be worked further until needs are met

Comments

@BigLep
Contributor

BigLep commented Jul 13, 2023

Done Criteria

A user can author content in their browser via Helia and have it retrievable by another machine through the ipfs.io gateway without relying on pinning services or preload nodes.

Why Important

This is a common usecase that users hit. Failure here feeds the narrative that "IPFS doesn't just work".

Notes

  1. This builds on [🏆 Golden path scenario] Browser-authored content retrievable through the ipfs.io gateway via a pinning service  #256, which used pinning services. We’re assuming it was completed first before taking this on.
  2. Even though this usecase request has come in (e.g., HackFS 2023), it’s a lower priority because it has the fundamental flaw of relying on a browser tab to stay open/active. This may make sense for a demo, but has serious usability flaws. In practice, we expect most browser apps will want to be more resilient, and for that, they need a way to get data off the browser (e.g., use a pinning service which [🏆 Golden path scenario] Browser-authored content retrievable through the ipfs.io gateway via a pinning service  #256 satisfies).
  3. "retrievability from the ipfs.io gateway" is used as a popular "stand in" for other nodes on the network.
  4. We need to enable discoverability of the content created in the browser so that the ipfs.io gateway can discover it. This requires one or more of:
  5. DCUtR in js-libp2p is needed so the Kubo gateway can run the protocol with the browser node via the relay and instruct the browser to dial one of its public multiaddrs supported from the browser (e.g., WSS, WebTransport). This will have been handled in [🏆 Golden path scenario] Browser-authored content retrievable through the ipfs.io gateway via a pinning service  #256.
  6. The next step here is to allow a private Kubo node (e.g., Kubo running in one's Brave browser) to fetch the content authored in a browser on a separate host. We ultimately need Kubo to support WebRTC since WebRTC is required for browser/private-node connectivity per here. (Kubo tracking issue: Enable WebRTC Transport kubo#9724 ). This should have been completed though by [🏆 Golden path scenario] Browser-authored content retrievable through the ipfs.io gateway via a pinning service  #256.
  7. Per above, this isn't a pure Helia issue. Tracking the usecase needs to go somewhere though, so I'm putting it in Helia for now so we can link against it.

Tasks

@aschmahmann

This is more of a libp2p issue, and more of a go-libp2p issue

While support for WebRTC will certainly help in some scenarios (i.e. the browser does not support WebTransport and the node fetching the data doesn't have a WSS address with a CA cert), IIUC the main difficulty in making data from browser helia nodes discoverable by gateways, etc. is that the data is not being advertised to the DHT, IPNI, etc.

Am I mistaken and it turns out advertising small amounts of data to the DHT from a helia browser node is working well enough at the moment (at least for browsers that support WebTransport)?

@SgtPooki
Member

SgtPooki commented Jul 13, 2023

@aschmahmann even in a browser that supports WebTransport, i've been having difficulty getting any successful webtransport connections. I just pushed up a repo where I was playing around: https://github.com/SgtPooki/helia-playground -- it was essentially copied from https://codesandbox.io/p/sandbox/helia-script-tag-forked-3q8y35 to a local workspace so i could modify things more easily.

One thing I started seeing was that activeStreams.length never rises above 0 for me, no matter how many peers or how many connections I have. I suspect a bug in libp2p/webtransport but I haven't been able to fully track it down.

I want to create a simple test where a browser helia node can successfully talk to a backend helia node, but that will have to wait for a bit.


ninja-edit:

Also, there seems to be a non-stop spamming of webtransport dial attempts.. and i'm not sure how best to control that with libp2p-connection-manager.

@BigLep
Contributor Author

BigLep commented Jul 13, 2023

@aschmahmann : good callouts - thanks.


Let's assume:

  1. the ipfs.io Kubo node has the maximum connectivity possible today: WebTransport and a WSS address with a CA cert
  2. the browser authoring the data supports WebTransport and WebSockets
  3. the ipfs.io Kubo node discovered the multiaddr of the browser that authored the data

How does this ipfs.io Kubo node retrieve the data from the browser node? My understanding is that it still can't initiate a connection to the browser in this scenario and this scenario would only work if there was a preexisting connection between the browser node and the ipfs.io Kubo node.


Also, I expanded the "Notes" section in the top description to further expand on the underlying issues:

  1. Underlying issue 1: discoverability of the content created in the browser so that the ipfs.io gateway can discover it. This requires one or more of:
  2. Underlying issue 2: libp2p connectivity, especially go-libp2p connectivity: we ultimately need Kubo to support WebRTC because WebRTC is required for server nodes to dial browsers. (Kubo tracking issue: Enable WebRTC Transport kubo#9724).

Please go ahead and fix/correct any mistakes here.


Thank you!

@aschmahmann

aschmahmann commented Jul 14, 2023

> How does this ipfs.io Kubo node retrieve the data from the browser node? My understanding is that it still can't initiate a connection to the browser in this scenario and this scenario would only work if there was a preexisting connection between the browser node and the ipfs.io Kubo node.

Yeah, that's right, good callout. I had assumed there was some level of support for DCUtR in js-libp2p that came along with the relay-v2 support. With the simplest DCUtR support (dialbacks), the helia node would connect to a (limited) relay-v2 node that speaks some protocol the helia node can speak (e.g. WSS, WebTransport, etc.), and it would then have as its address /the/multiaddr/of/the/relay/p2p-circuit/p2p/helia-node-peerID. When a publicly reachable node (e.g. the ipfs.io kubo nodes) wanted to contact the helia node, it would ask the relay to have the helia node dial it back (using WSS, WebTransport, etc.).

This doesn't require any holepunching kinds of magic, just a simple relay + the dialback portion of the DCUtR protocol.
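A minimal TypeScript sketch of the relay + dialback flow described above. It only models the sequence of steps and does not call js-libp2p. Every name in it (`circuitAddress`, `planDialback`, `BROWSER_DIALABLE`, the example addresses) is hypothetical, and the list of browser-dialable transports is an assumption made for illustration. The `/p2p-circuit` segment is the multiaddr component used for relayed addresses.

```typescript
// Hypothetical model of the relay + dialback flow; not js-libp2p API.

// Build the relayed address a browser node would advertise.
function circuitAddress(relayAddr: string, browserPeerId: string): string {
  return `${relayAddr.replace(/\/+$/, '')}/p2p-circuit/p2p/${browserPeerId}`
}

// Multiaddr fragments a browser can dial out to (assumed list).
const BROWSER_DIALABLE: string[] = ['/wss', '/tls/ws', '/webtransport']

interface DialbackPlan {
  steps: string[]
  dialTarget: string | null // public address the browser is asked to dial
}

// Pick a public address the browser can dial and list the steps of the exchange.
function planDialback(relayedAddr: string, publicAddrs: string[]): DialbackPlan {
  let dialTarget: string | null = null
  for (const addr of publicAddrs) {
    if (BROWSER_DIALABLE.some(t => addr.indexOf(t) !== -1)) {
      dialTarget = addr
      break
    }
  }

  const steps: string[] = [
    `public node opens a limited relayed connection to ${relayedAddr}`
  ]
  if (dialTarget === null) {
    steps.push('no browser-dialable address available; dialback not possible')
  } else {
    steps.push('browser learns the public node\'s addresses over the relayed connection')
    steps.push(`browser dials ${dialTarget} directly`)
    steps.push('data flows over the direct connection instead of the relay')
  }
  return { steps, dialTarget }
}
```

The point of the sketch is that the public node never dials the browser directly: the only direct dial goes from the browser outward, which is why no hole punching is involved.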

Seems like it might be worth scoping this as a smaller and more important set of work in libp2p/js-libp2p#1460.

@SgtPooki
Member

SgtPooki commented Jul 27, 2023

Notes from Helia WG 2023-07-27

  • We need to make sure Gateways support webrtc
    • either go-libp2p + kubo/boxo need to support webrtc
    • OR js-libp2p on Node needs to support webrtc + we need to stand up a JS backed gateway?
      • webrtc in NodeJS is not supported currently.
  • We need DCUtR in js-libp2p so the gateway can run the protocol with the browser node via the relay and instruct the browser to dial it
    • the gateway and the relay need a common transport (e.g. tcp, quic, etc)
    • the browser needs to be able to dial the gateway (e.g. wss or webtransport)
    • the gateway does not need to be able to dial the browser (e.g. WebRTC)
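The transport conditions in these notes can be written down as a small check. This is a sketch under assumptions: the `Peer` shape, the `Transport` names, and `dialbackFeasible` are hypothetical and do not correspond to any libp2p API. The `browserReachesRelay` condition is not in the notes themselves; it is taken from the earlier comment about the browser connecting to a relay over WSS or WebTransport.

```typescript
// Hypothetical helper mirroring the transport conditions listed above.
type Transport = 'tcp' | 'quic' | 'ws' | 'wss' | 'webtransport' | 'webrtc'

interface Peer {
  listens: Transport[] // transports this peer accepts connections on
  dials: Transport[]   // transports this peer can dial out with
}

interface Feasibility {
  gatewayReachesRelay: boolean
  browserReachesRelay: boolean
  browserReachesGateway: boolean
  ok: boolean
}

// True when the dialer has at least one transport the listener accepts.
function canDial(dialer: Peer, listener: Peer): boolean {
  return dialer.dials.filter(t => listener.listens.indexOf(t) !== -1).length > 0
}

function dialbackFeasible(gateway: Peer, relay: Peer, browser: Peer): Feasibility {
  const gatewayReachesRelay = canDial(gateway, relay)
  const browserReachesRelay = canDial(browser, relay)
  const browserReachesGateway = canDial(browser, gateway)
  // Deliberately no check that the gateway can dial the browser:
  // with dialback it does not need to.
  return {
    gatewayReachesRelay,
    browserReachesRelay,
    browserReachesGateway,
    ok: gatewayReachesRelay && browserReachesRelay && browserReachesGateway
  }
}
```

For example, a gateway that listens only on `tcp` and `quic` fails the `browserReachesGateway` condition even if everything else is in place.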

@achingbrain
Member

DCUtR for js-libp2p is in progress here: libp2p/js-libp2p#1928

@SgtPooki
Member

SgtPooki commented Aug 7, 2023

Note that the libp2p hole-punching vision table also illustrates the problem here fairly well: https://github.com/libp2p/specs/blob/d2106f43e878ae4c3a1c6465a7c329835290fe22/connections/hole-punching.md#vision

@BigLep
Contributor Author

BigLep commented Aug 8, 2023

It's great that progress is happening here.

Folks have correctly pointed out that go-libp2p WebRTC isn't needed for the stated usecase of the Kubo ipfs.io gateway retrieving content from the browser. We only need js-libp2p DCUtR. That's great, and I agree that should be the first usecase.

That said, I don't want to let up there since the ultimate goal is "universal connectivity". The next step here is to allow a private Kubo node (e.g., Kubo running in one's Brave browser) to fetch the content authored in the browser. For this we ultimately need Kubo to support WebRTC since WebRTC is required for browser/private-node connectivity per here. This can come after, but I have updated the issue notes to be accurate and to discuss this followup step.

@SgtPooki SgtPooki changed the title [🏆 Golden path scenario] Browser-authored contend retrievable by another machine through the ipfs.io gateway directly [🏆 Golden path scenario] Browser-authored content retrievable by another machine through the ipfs.io gateway directly Aug 10, 2023
@achingbrain
Member

achingbrain commented Aug 10, 2023

I've been doing a bit of investigation, what I've found is:

  1. Browser connections are unstable
    • This causes remotes to drop connections, including relay connections
  2. This can cause relay addresses to change as new relays are found
  3. Publishing DHT provider records does not always succeed
    • This is because the ADD_PROVIDER query frequently traverses through nodes it can't dial
    • This will improve as more of the network supports webtransport and webrtc
  4. Even when ADD_PROVIDER succeeds, Kubo nodes (my local one at least) can't always resolve the record
  5. Kubo nodes (my local one at least) can't always look up browser nodes in the DHT
    • This could be because private Kubo DHT nodes can't dial browsers yet to ping them so they are evicting them from their routing tables?
    • Also if the relay addresses for the browser peer change, Kubo DHT nodes won't be able to re-dial them?

Browser CPU usage is very high, which may contribute to 1. Point 2 is quite concerning because if the relay address changes, the published provider records then have out-of-date multiaddrs.

Right now, I think that in the circuit relay code, if a relay connection is lost we assume the relay is bad and start to search for new relays. We may instead need to assume that we are bad and make some sort of attempt to reconnect, and only start searching for others if that fails.
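One way to picture the "reconnect first, then search" idea is as a small decision function. This is a sketch under assumptions, not the js-libp2p circuit relay code: `RelayState`, `onRelayConnectionLost`, `onRedialResult`, and the `maxRedials` parameter are all hypothetical names introduced for illustration.

```typescript
// Hypothetical policy: retry the same relay before searching for a new one.
type RelayAction =
  | { kind: 'redial'; relay: string; attempt: number }
  | { kind: 'search' }

interface RelayState {
  relay: string         // multiaddr of the relay we hold a reservation on
  failedRedials: number // consecutive failed attempts to reconnect to it
}

// Decide what to do when the connection to the current relay drops.
function onRelayConnectionLost(state: RelayState, maxRedials: number): RelayAction {
  if (state.failedRedials < maxRedials) {
    // Assume the problem is on our side and try the same relay again.
    return { kind: 'redial', relay: state.relay, attempt: state.failedRedials + 1 }
  }
  // Only after repeated failures treat the relay as bad and look elsewhere.
  return { kind: 'search' }
}

// Update the state after a redial attempt completes.
function onRedialResult(state: RelayState, succeeded: boolean): RelayState {
  return {
    relay: state.relay,
    failedRedials: succeeded ? 0 : state.failedRedials + 1
  }
}
```

Staying on the same relay keeps the browser's relayed multiaddr unchanged, so provider records that were already published keep pointing at a valid address.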

Until adoption of webtransport improves, we may need some sort of web service that can publish provider records on behalf of the browser? These would be records where the browser is the provider, not the web service, so this is slightly different from the delegated content routing strategy we used to use.

Also found a few other weird bits and pieces

@BigLep
Contributor Author

BigLep commented Aug 10, 2023

@achingbrain

> I've been doing a bit of investigation,

Thanks - good write up!

(For others to be aware) Per the 2023-08-10 Helia Working Group, I don't think it's worth the investment right now to focus on writing provider records directly to the public IPFS DHT from the browser. We'll instead rely on solving the write side of "Underlying issue 1: discoverability of the content created in the browser so that the ipfs.io gateway can discover it" through a to-be-created/updated delegated routing endpoint. Kubo/Boxo maintainers are aware of the priority of this work and are taking it on now as they finish up the read side of HTTP /routing/v1.
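For orientation, here is a rough TypeScript sketch of the read side of HTTP /routing/v1 only. The write-side endpoint is described above as still to be created, so nothing is assumed about it. The sketch assumes the `GET /routing/v1/providers/{cid}` route and a JSON body with a `Providers` array whose entries carry `ID` and `Addrs`; treat that response shape, the helper names, and the example host as assumptions to check against the current delegated routing spec.

```typescript
// Sketch of a delegated routing read; helper names and host are hypothetical.
interface ProviderRecord {
  ID: string      // peer ID of the provider
  Addrs: string[] // multiaddrs, possibly relayed (/p2p-circuit) for browsers
}

// Build the providers lookup URL for a CID.
function providersUrl(base: string, cid: string): string {
  return `${base.replace(/\/+$/, '')}/routing/v1/providers/${encodeURIComponent(cid)}`
}

// Parse a response body defensively, tolerating missing or null fields.
function parseProviders(body: string): ProviderRecord[] {
  const parsed = JSON.parse(body)
  const list = parsed && Array.isArray(parsed.Providers) ? parsed.Providers : []
  return list
    .filter((p: any) => p && typeof p.ID === 'string')
    .map((p: any) => ({ ID: p.ID, Addrs: Array.isArray(p.Addrs) ? p.Addrs : [] }))
}

// Example wiring (not executed here):
//   fetch(providersUrl('https://delegated-routing.example', cid))
//     .then(res => res.text())
//     .then(parseProviders)
```

If the browser node is to be listed as the provider, the returned `ID` would be the browser's peer ID and `Addrs` would contain its relayed addresses.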

Also, it sounds like you have a test setup (awesome). I assume we're going to need this throughout the golden path development. If there is anything to document or check in to help others in testing or verifying their work, please share.

I have updated the task list in the issue description with everything I'm aware of that needs to be done along the different tracks:

  1. Tasks for underlying issue 1: discoverability of the content created in the browser
  2. Tasks for underlying issue 2 / libp2p connectivity part 1: Kubo gateway can instruct the browser to dial one of its public multiaddrs supported from the browser
  3. Tasks for underlying issue 3 / libp2p connectivity part 2: Kubo support for browser/private-node connectivity

Thanks also for the fixes along the way - good stuff!

@BigLep
Contributor Author

BigLep commented Sep 7, 2023

I have morphed this golden path issue to be scoped to retrievability of browser-authored content without relying on pinning services (i.e., as long as one's browser tab is open).

For retrievability of browser-authored content, we're going to focus first on relying on pinning services: #256

That said, the top priority is reliable browser retrieval of any content. This is happening in #255 . This is the top "golden path scenario" focus.

@whizzzkid whizzzkid added the status/blocked Unable to be worked further until needs are met label Sep 7, 2023
@achingbrain
Member

achingbrain commented Sep 18, 2023

> Browser connections are unstable
> This causes remotes to drop connections, including relay connections

In recent releases this is much improved:

image

keiner5212 added a commit to keiner5212/heliajs-implementation-react that referenced this issue Jun 28, 2024