
IPFS to store videos #494

Open
alxlg opened this issue Apr 11, 2018 · 107 comments
Labels
Component: PeerTube Plugin 📦 Features that can be developed in a plugin, but require PeerTube plugin API development Type: Feature Request ✨

Comments

@alxlg

alxlg commented Apr 11, 2018

I think what limits PeerTube adoption is that instances are perfect for personal/organization use, but not for building a free service like YouTube where everyone can upload videos without limits. The issue is that storage has a cost, and videos make the required storage grow quickly.

IPFS (InterPlanetary File System) could be used to solve the storage issue, because every user can store files themselves; but IPFS doesn't offer a way to browse and interact with them. PeerTube, on the other hand, has an awesome UI that can be used by everyone.

Would it be possible to combine PeerTube and IPFS? Ideally the instance administrator would limit the classic upload for each user but let users add videos by specifying an IPFS address. I guess that when a second user browses a PeerTube instance and wants to watch a video hosted on IPFS, PeerTube provides it by reading from IPFS and not from its local storage. PeerTube instances would cache IPFS content like any IPFS user, and admins would monitor the impact of the IPFS cache on their storage. If a PeerTube user wants to be sure their video stays available, they just have to keep it on IPFS on their own machine. This could have another advantage: if the PeerTube instance they used is no longer available, they won't need to re-upload their videos to other PeerTube instances if the videos are on IPFS: they would just "upload" the IPFS addresses.

I would be grateful to anyone who can confirm or refute these assumptions.

@rigelk
Collaborator

rigelk commented Apr 11, 2018

Ideally the instance administrator would limit the classic upload for each user but let users add videos by specifying an IPFS address. I guess that when a user […] wants to watch a video hosted on IPFS, PeerTube provides it by reading from IPFS and not from its local storage.

@alxlg indeed, that's more or less how I envisioned the potential use of IPFS. But then there's the fact that an IPFS endpoint is neither a webseed[¹] nor holds versions of different quality. In other words, IPFS would only be a second-class citizen feature-wise.

¹: let me expand a bit on that issue. The fact is that we use WebTorrent (BitTorrent/WebRTC) on the client side to watch videos. It provides a handy pool of content seeders and direct browser connections. Watching a video via IPFS would mean entirely replacing that component with an IPFS client in the browser. So it's not just a matter of a different storage/upload mechanism.

If you have any ideas as to how to solve these problems, I'm all ears :)

P.S.: we also haven't heard much about IPFS performance-wise when it comes to streaming videos.

@alxlg
Author

alxlg commented Apr 11, 2018

@rigelk thanks for your reply!

I had not thought of different quality versions of videos. Since IPFS is really low in the stack, the only solution I can think of is storing a different IPFS file for each version. The user should be able to specify the IPFS address for each quality version he wants to maintain... This doesn't seem user-friendly, but with a desktop client that automatically manages versions on IPFS it could gain adoption... Ideally the desktop client could use some API to publish the video to PeerTube by specifying the various IPFS addresses. The desktop client's users would just pick a video from their HDD, and the client would generate the different versions, add them to IPFS, and send the addresses to a PeerTube instance.

It seems like a big amount of work, but promising to me, and the idea could attract many contributors.

@rigelk
Collaborator

rigelk commented Apr 11, 2018

@alxlg we're not even close to writing a desktop client. This is a non-option considering our resources.

I was considering leaving the video uploaded without transcoding, thus leaving a single quality available. It's always better than no video at all.


But now that I come to think of it, about ¹: do we even have to replace the WebTorrent client for IPFS videos? If we could manage to mark IPFS endpoints as WebSeeds, we could just use them under the hood by making the WebTorrent client aware of them.

@alxlg
Author

alxlg commented Apr 11, 2018

@rigelk in fact I did not intend to replace the video player. I thought the server could run both PeerTube and an IPFS node, and the PeerTube instance would see the files cached by IPFS like local files... I hope it makes sense now...

This feature doesn't depend on a desktop client; one would just help regular PeerTube users automatically store their videos locally.

I would be happy to store some videos with IPFS on my HDD without running a PeerTube instance, which needs much more maintenance.

I think the change would mostly be in the PeerTube UI: providing a way to add a video through an IPFS address instead of uploading the entire file to the PeerTube instance. Of course the server admin would have to configure the instance to run IPFS...

@Openmedianetwork

Wouldn't a WebRTC torrent app running on the user's PC do the same thing when they leave it seeding the video? This also allows users to create "seedboxes" using RSS auto-downloading torrent apps. An easy, simple/KISS way of "distributing" the video hosting.

https://github.com/Openmedianetwork/visionOntv/wiki/Seedbox

@alxlg
Author

alxlg commented Apr 18, 2018

@Openmedianetwork good point! I think your proposal is easier to implement, but using IPFS too could have some advantages. For example, I'm pretty sure that if an instance of PeerTube is no longer available, a user can re-upload his/her IPFS videos to another instance just by sharing the IPFS addresses, which is much better than re-uploading video files! Do you think this could be achieved with torrents/WebTorrent too? Would changes to PeerTube be needed? Maybe uploading a *.torrent file instead of a video file?

@rigelk
Collaborator

rigelk commented Apr 18, 2018

@alxlg using an IPFS address or a *.torrent file yields the same import capabilities. See #102. The only advantage of IPFS I see is that there are pinning brokers. (Are there for BitTorrent too? I didn't check.)

Repository owner deleted a comment from Serkan-devel Apr 19, 2018
@Openmedianetwork

@rigelk
Collaborator

rigelk commented May 25, 2018

@Openmedianetwork this has nothing to do with IPFS. Please find a related issue and detail your problem there, not in a blog post.

@Openmedianetwork

Sorry, this was an update for @alxlg: my suggestion of a WebRTC torrent app as an alternative for seeding does not appear to work after actual testing. Will start a new thread after further tests.

@Chocobozzz
Owner

Closing this issue since we have no real use cases for IPFS for now.

@NightA

NightA commented Jun 27, 2018

@rigelk
@Chocobozzz
IPFS can serve as a backup for local storage and dedicated seeding pools, as it effectively transfers each newly added file to an entire pre-existing network of 300+ peers from the get-go. In such a setup, PeerTube might not necessarily even have to double as a WebRTC-based IPFS node but could simply run alongside a regular one (which in itself can be optional): first to ensure the files initially propagate through the network, and second to provide an optional method for retrieving them through a local gateway. Simply linking to an IPFS address should suffice, though, if it were possible to configure a PeerTube instance to use external public gateways for retrieval.

In a case where a PeerTube instance goes down with no pre-existing seeding pools in place, as long as the videos are still present on IPFS, it should be possible to retrieve them by simply following each video's address (which presumably was shared beforehand with other federated instances). This way each video would remain accessible and could therefore later be conventionally re-seeded via a different instance.

As a by-product, if it were possible to authenticate each user's identity, perhaps this method could also be used for transferring channels between different PeerTube instances.

@poperigby

poperigby commented Sep 26, 2019

@Chocobozzz shouldn't this be reopened? It seems what @NightA mentioned was a pretty good idea.

@ghost

ghost commented Sep 26, 2019

I'm actually interested in implementing this, but I think a roadmap should be discussed.

What I propose:

Phase 1: Server uploads to IPFS and stores hashes of videos

This phase has the potential of requiring double the disk space, since the files will be stored on disk normally, then uploaded to IPFS and pinned. To prevent that, ipfs-fuse could be used. That would allow mounting IPFS to a directory and designating all videos to be stored there, keeping them in one place: IPFS.

I assume there's a JSON blob or a table in the DB for the videos, where a field or column for the hashes of the video files can be added.
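As a rough illustration of that schema change (a sketch only: the videoFile table and ipfsHash column names are assumptions, not PeerTube's actual schema), a Sequelize migration could look like this:

import * as Sequelize from 'sequelize'

// Hypothetical migration: store each video file's IPFS CID next to its DB record.
async function up (queryInterface: Sequelize.QueryInterface) {
  await queryInterface.addColumn('videoFile', 'ipfsHash', {
    type: Sequelize.DataTypes.STRING,
    allowNull: true, // existing rows have no CID yet
    defaultValue: null
  })
}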

Phase 2: Import from IPFS

In this phase, the user will have the option to provide a hash that the server can download and process. Maybe it's possible to check the filetype before downloading it in order to save bandwidth, I dunno.

If phase 1 is done intelligently, hashes of videos already in the DB can be rejected, since accepting them would create an unnecessary duplicate. But if another server uses the same hash with different properties (title, description) that might not be good. Up for discussion.

Phase 3: Syncing with other instances and downloading hashes

This is separated from Phase 2 only if the code is in a separate area. Importing from IPFS might be a different procedure from receiving a video file from another instance.
Since I don't know what data is sent over ActivityPub, I assume it's either the torrent or a link for a server-to-server API call, which includes information about the video, e.g. title, description, resolutions and (of course) hashes.

Phase 4: (wishful thinking) ipfs:// links for videos

Users running their own IPFS nodes with IPFS companion could then stream using IPFS.
I haven't actually done a lot of research into this, so I don't know if it's possible. Maybe a plugin for videojs would be necessary - I dunno.

End goal

Instances running IPFS nodes and using that to download and pin hashes, which would allow:

  • greater resilience to takedowns or simple storage failure
  • additional data sources since users could host the data too
  • possibly less bandwidth consumption if Phase 4 actually is possible and is done

@rigelk
Collaborator

rigelk commented Sep 27, 2019

Server uploads to IPFS and stores hashes of videos

This phase has the potential of requiring double the disk space, since the files will be stored on disk normally, then uploaded to IPFS and pinned. To prevent that, ipfs-fuse could be used. That would allow mounting IPFS to a directory and designating all videos to be stored there, keeping them in one place: IPFS.

Where is it uploaded exactly? A third party (a pinning service?)?

ipfs-fuse requires the Go IPFS runtime alongside, so this will complicate the deployment.

Syncing with other instances and downloading hashes

This is trivially done by adding another Link object in the ActivityPub Video.url field.
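For illustration, such an entry might look like the sketch below (the ipfs:// href reuses a CID quoted elsewhere in this thread as a placeholder; the exact shape of the entry would be up to PeerTube):

// Hypothetical extra Link entry in the ActivityPub Video.url array.
const ipfsLink = {
  type: 'Link',
  mediaType: 'video/mp4',
  href: 'ipfs://QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR', // placeholder CID
  height: 720
}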

Users running their own IPFS nodes with IPFS companion could then stream using IPFS.

What bothers me is that users are expected to have this extension and a running IPFS Go runtime.

I haven't actually done a lot of research into this, so I don't know if it's possible. Maybe a plugin for videojs would be necessary - I dunno.

I haven't found any videojs plugin for IPFS, or its companion extension.

Instances running IPFS nodes and using that to download and pin hashes, which would allow:

* greater resilience to takedowns or simple storage failure

* additional data sources since users could host the data too

* possibly less bandwidth consumption if Phase 4 actually is possible and is done
  • resilience can be achieved in IMHO simpler ways. Right now what makes a video's origin instance disappearance a problem is that the BitTorrent tracker is the instance. Sharing the video over DHT alongside would already solve the problem, without changing our infrastructure.
  • users can already host the data, with a webtorrent-compatible torrent client.
  • less bandwidth consumption is already achieved with WebTorrent: people watching share their bandwidth (and not just those with an extension)
  • less bandwidth consumption is already achieved with WebSeeds/replication, with less impact to the buffering speed.

@ghost

ghost commented Sep 27, 2019

Where is it uploaded exactly? A third party (a pinning service?)?

It's pinned locally and uploaded when somebody else requests it over IPFS. If someone does so over a third party, then they'll have it, but not pinned.

ipfs-fuse requires the Go IPFS runtime alongside, so this will complicate the deployment.

That depends. It's very possible to do that in stages too:

  • stage 1: tell admin that IPFS has to be installed
  • stage 2: provide a config interface for the admin to target an IPFS node of choice (IP:port of the IPFS HTTP API used by ipfs-fuse)
  • stage 3: provide option to install IPFS for the admin

What bothers me is that users are expected to have this extension and a running IPFS Go runtime.

This is not a proposal to force users to use IPFS, merely to give them the option. Right now they have the opportunity to use HTTP or WebTorrent; this would merely be another one.

I haven't found any videojs plugin for IPFS, or its companion extension.

Yes, that would have to be developed if necessary. The companion provides the useful feature of redirecting URLs to IPFS instances:

Requests for IPFS-like paths (/ipfs/{cid} or /ipns/{peerid_or_host-with-dnslink}) are detected on any website.
If a tested path is a valid IPFS address it gets redirected and loaded from a local gateway, e.g.:

https://ipfs.io/ipfs/QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR
http://127.0.0.1:8080/ipfs/QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR

I assume we use HLS for streaming, so serving a different .m3u8 playlist with /ipfs links would be the only work required.
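A minimal sketch of that idea, assuming PeerTube knows the CID of each HLS segment (the segment-to-CID map is hypothetical and would have to be recorded when segments are added to IPFS):

// Rewrite an HLS playlist so segment URIs point at an IPFS gateway.
function rewritePlaylist (m3u8: string, cidBySegment: Map<string, string>, gateway = 'http://127.0.0.1:8080'): string {
  return m3u8.split('\n').map(line => {
    if (line.startsWith('#') || line.trim() === '') return line // keep HLS tags and blank lines
    const cid = cidBySegment.get(line.trim())
    return cid ? `${gateway}/ipfs/${cid}` : line // unknown segments are left untouched
  }).join('\n')
}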


Your final points are valid, but I don't see the harm in providing an additional option. In case you missed it, I am willing to implement this, so your workload would be reduced to code-reviews and handling future bugs (since no code is perfect).

Of course, if you are firmly against having it in the original code base, I'll investigate if a plugin can be written and if not, simply fork it.

@rigelk
Collaborator

rigelk commented Sep 27, 2019

This is not a proposal to force users to use IPFS, merely to give them the option. Right now they have the opportunity to use HTTP or WebTorrent; this would merely be another one.

Your final points are valid, but I don't see the harm in providing an additional option.

👍

In case you missed it, I am willing to implement this, so your workload would be reduced to code-reviews and handling future bugs (since no code is perfect).

I am not sure we could handle the future bugs part, especially when dealing with technologies we don't use regularly. And since merging code means taking responsibility for its maintenance…

Of course, if you are firmly against having it in the original code base, I'll investigate if a plugin can be written and if not, simply fork it.

I would suggest waiting for @Chocobozzz to answer about that - regarding plugins, not everything can be changed via their API. Depending on how many changes to the codebase are required and where, the plugin API could be expanded to facilitate your changes.

Now that being said, implementing it directly in the codebase at first is not a bad idea. It is not time lost, as this will serve as a POC and help us understand the reach of the needed changes - and thus the potential changes to the plugin API.

@rigelk rigelk added Component: PeerTube Plugin 📦 Features that can be developed in a plugin, but require PeerTube plugin API development Type: Feature Request ✨ and removed Type: Discussion 💭 labels Sep 27, 2019
@rigelk rigelk reopened this Sep 27, 2019
@NightA

NightA commented Sep 28, 2019

Where is it uploaded exactly? A third party (a pinning service?)?

To add a bit more detail to what @LoveIsGrief mentioned: when a file is added to IPFS it is given a UnixFS structure, cryptographically signed, hashed, registered as a content identifier (CID) in an IPLD, broken into sets of Merkle-DAG blocks and then distributed peer-to-peer using a DHT.

* resilience can be achieved in IMHO simpler ways. Right now what makes a video's origin instance disappearance a problem is that the BitTorrent tracker is the instance. Sharing the video over DHT alongside would already solve the problem, without changing our infrastructure.

From what I understood, unlike the DHTs utilized with torrents, the IPFS network doesn't focus on seeding each file individually but rather on distributing the individual file-blocks themselves among many peers. In that case, a typical IPFS node doesn't really "seed" individual files; it only temporarily caches blocks of various files, and only stores complete sets of blocks for specific files when those are explicitly pinned. Otherwise, the cache gets garbage-collected and erased after a time period that's configured on each individual node.

So unlike a PeerTube instance that goes under with a specific torrent which happened to be seeded from one specific location (the instance itself), once a file gets cached throughout enough IPFS nodes it has a sort of grace period for being retrieved. During that period it has the chance of being saved/pinned on another IPFS node or imported into another hypothetical PeerTube instance that supports retrieval from IPFS.

The IPFS node in this regard also doesn't have to have anything to do with PeerTube instances to begin with, as it simply provides the files as long as there's someone requesting them.

TL;DR - torrent DHTs only replace a tracker, and in that regard only point to files that may or may not be seeded anymore. IPFS provides a P2P CDN of sorts that can cache those files independently of their initially seeding PeerTube instance, and thus preserves them in a pool that operates independently and is not restricted to any particular instance and/or file.

* users can already host the data, with a webtorrent-compatible torrent client.

In practice, users who just consume content tend not to do so, unlike IPFS nodes, which do so upon request from the get-go. E.g. while the initial "seed" has to come from an IPFS node running alongside a PeerTube instance, once the file has propagated through multiple requests it can be retrieved from other non-associated nodes within a given time-span.

That being said, yes, to ensure the file doesn't disappear from the network there has to be an IPFS node somewhere that pins some of the content from the aforementioned PeerTube instance, which is a concept that sounds similar to a basic seedbox.
However, considering the pre-existing network of peers that can automatically participate, this gives those files better chances in terms of availability.

* less bandwidth consumption is already achieved with WebTorrent: people watching share their bandwidth (and not just those with an extension)

With IPFS there are also dedicated nodes running on independent servers/VPSes that distribute the content, in addition to users who just seed some of it for the duration of its run and then move along.

So essentially, if a video can be streamed from any public IPFS gateway within a pre-defined list, all the PeerTube instance has to do to offload traffic is pick one of those IPFS "edge" gateways, point to it, and serve the page to the user as usual; a sketch of that selection step follows.
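A sketch of that gateway-selection step (the gateway hosts are examples, and a HEAD request checks availability without transferring the video):

// Probe public gateways and return the first URL that answers for a given CID.
const GATEWAYS = ['https://ipfs.io', 'https://dweb.link']

async function pickGatewayUrl (cid: string): Promise<string | undefined> {
  for (const host of GATEWAYS) {
    const url = `${host}/ipfs/${cid}`
    const res = await fetch(url, { method: 'HEAD' }).catch(() => null)
    if (res && res.ok) return url // first responsive gateway wins
  }
  return undefined // nothing reachable: fall back to local storage
}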

@ivan386

ivan386 commented Sep 29, 2019

This phase has the potential of requiring double the disk space, since the files will be stored on disk normally, then uploaded to IPFS and pinned. To prevent that, ipfs-fuse could be used.

You can just use the --nocopy option on add, and --raw-leaves if you want to store the data only in IPFS and get the same hash as with --nocopy.

ipfs add --nocopy file_name
ipfs add --nocopy -r directory_name

Additionally, if you use --chunker=rabin, different files will share the same parts.

ipfs add --nocopy --chunker=rabin file_name
ipfs add --nocopy -r --chunker=rabin directory_name

A local webseed (http://127.0.0.1:8080/ipfs/{cid(hash) of file or directory}) can be added to the torrent file. Torrent clients will use it if it is available.

WebTorrent can fetch public gateways and replace ":hash" with the {cid(hash) of file or directory} from the local webseed link, and use those gateways as alternative webseeds.

You can add a comment to the torrent file with the options that were used when the file was added to IPFS. A user can then re-add it to IPFS with those options and get the same root hash.
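For what it's worth, the create-torrent package used by WebTorrent already accepts webseed URLs through its urlList option, so a sketch of embedding a local gateway URL at torrent-creation time (the file path and CID are placeholders) could be:

import createTorrent from 'create-torrent'

// Embed a local IPFS gateway URL as a BEP-19 webseed while building the torrent.
createTorrent('./video.mp4', {
  urlList: ['http://127.0.0.1:8080/ipfs/QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR']
}, (err: Error | null, torrent: Buffer) => {
  if (err) throw err
  // torrent is a Buffer holding the .torrent metadata, ready to be saved or seeded
})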

@ghost

ghost commented Dec 18, 2019

Any updates on this issue?
IPFS is very important because it's an efficient tool against censorship.

@sundowndev
Contributor

sundowndev commented Feb 4, 2020

I don't think IPFS support can be added to PeerTube that easily. My suggestion would be to create a separate client intended to use an IPFS network-based backend. Adding IPFS to PeerTube as it is would add several compatibility issues and increase code complexity. Also, maybe a PoC of IPFS video streaming with benchmarking should be done first?

@ivan386

ivan386 commented Feb 4, 2020

@sundowndev
Simple player: https://github.com/ivan386/ipfs-online-player
Example of how it works: https://gateway.ipfs.io/ipfs/QmX8DUfyL7sVSa61Hwvx4qiTHPcpWrNxvb3XSm5iiqH8Ph/#/ipfs/QmdtE78NHJGByBpoPREMQA142oj9hFPmQRxMniDsbdhw5d

@ghost

ghost commented Jan 16, 2023

unless some servers configured IPFS as persistent

Not sure I understand your point. I'll try to respond anyway.

You can host data permanently if you want; in IPFS terminology this is "pinning". All this does is, as you suggest, fetch the data and prevent it from expiring out of a cache it would otherwise expire from.

IPFS relies on gossiping the state of this local cache to nodes that you have a direct (TCP/UDP) connection with. If one server happens to still hold onto a chunk because it was fetched recently, its neighbors will find out about this chunk. If it stays cached for long enough, unconnected servers could learn about it through the DHT.

This pinned-ness is a flag that has to be set per chunk. (Chunks are at most about a megabyte, usually half that.) It can be applied recursively to all the chunks necessary for a file or directory, via IPLD links, but the server still fundamentally is just gossiping about which specific <1 MB pieces of data are in the cache.

This is the only replication you get: either someone explicitly chooses to host the file by pinning it, or you get lucky and someone still has a piece cached. That latter strategy can make IPFS appear pretty fast when you're dealing with data that isn't too large, too uncommon, or too deeply nested. Every time you have to discover a new layer of blocks that you need but don't have, it takes considerable time to broadcast that request to neighbors, search for it in the DHT, make new connections, gossip block lists, and eventually find someone who has a copy of that data.
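To make the pinning step concrete, here is a sketch using the ipfs-http-client package against a local Kubo daemon (the API address is the default one; whether PeerTube would talk to a local or remote daemon is an open question):

import { create } from 'ipfs-http-client'

const ipfs = create({ url: 'http://127.0.0.1:5001/api/v0' })

// Recursively pin a DAG root: every chunk reachable via IPLD links is kept
// out of garbage collection, i.e. the "explicitly chooses to host" case above.
async function pinVideo (cid: string) {
  const pinned = await ipfs.pin.add(cid, { recursive: true })
  console.log(`pinned ${pinned.toString()}`)
}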

@manalejandro

Hello, to clarify the matter a bit: each instance of PeerTube would have its own IPFS repository, and the repositories share information with each other. But I could have a repository that does not accept sharing, serving only the content that I have locally; in that case it could be decided which instances to share my pinned blocks with. I was looking at the current implementation and it is not modular, so the PeerTube storage layer would have to be modularized first to integrate this P2P storage.

@ROBERT-MCDOWELL
Contributor

ROBERT-MCDOWELL commented Jan 17, 2023

@manalejandro
your suggestion seems to be interesting indeed. Why not build a plugin, and when it reaches maturity, integrate it into the PeerTube core... It would be interesting to study how archive.org uses IPFS...

@ShadowJonathan

ShadowJonathan commented Jan 25, 2023

PeerTube doesn't even need to host its own IPFS store; it can just start streaming directly from ipfs.io or dweb.link (both run by Protocol Labs) to the browser.

For extra authenticity/trustlessness, PeerTube can download the video in CAR format (like .tar but for IPFS content, and self-verifying) and reconstruct it locally to verify its authenticity, or do that to speed up downloads / archive the video via traditional storage means.
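A sketch of that archival step through a local node's HTTP API (ipfs-http-client's dag.export streams the DAG as CAR bytes; the port and paths are assumptions):

import { createWriteStream } from 'node:fs'
import { create } from 'ipfs-http-client'
import { CID } from 'multiformats/cid'

const ipfs = create({ url: 'http://127.0.0.1:5001/api/v0' })

// Stream a video's DAG into a .car file; the archive stays verifiable
// against the root CID when it is re-imported later.
async function backupAsCar (cidStr: string, outPath: string) {
  const out = createWriteStream(outPath)
  for await (const chunk of ipfs.dag.export(CID.parse(cidStr))) {
    out.write(chunk)
  }
  out.end()
}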

In the end, running an active IPFS node, or even caching content locally via IPFS, is only needed if the PeerTube instance is going to fetch remote content over the network rather than via gateways.

The only problem is that this would make the job of pinning the data nebulous, as the video is just a reference to a CID (IPFS file hash/reference, "Content IDentifier"), and the backing data for that CID can be either alive or not.

For this, pinning can be outsourced via the Pinning Services API, a REST spec that some commercial pinning services follow and that, more interestingly, ipfs-cluster (a self-hostable pinning 'orchestrator') also uses. A PeerTube instance can be paired up with an ipfs-cluster (running an IPFS node) and pin locally-saved videos to this cluster (or to those other pinning services) via this API.
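A sketch of delegating a pin through that REST spec (the endpoint and token are placeholders; ipfs-cluster and the commercial services mentioned above expose this same surface):

// Ask a remote pinning service to pin a CID (Pinning Services API: POST /pins).
async function requestRemotePin (cid: string, name: string) {
  const res = await fetch('https://pinning.example.com/pins', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer <access-token>', // placeholder credential
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ cid, name })
  })
  if (!res.ok) throw new Error(`pin request failed: ${res.status}`)
  return res.json() // a pin status object: requestid, status ('queued' | 'pinning' | 'pinned' | 'failed'), ...
}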

All in all, there are plenty of options to make this more speedy.

What I will not recommend is defaulting to adding an IPFS node to every PeerTube instance. This might work if it is put into dhtclient mode (in other words: it does not participate comprehensively in the DHT exchange and doesn't announce its presence to the network, as exchanging and refreshing DHT records is a major idle load), so that the PeerTube node can at least attempt to find a video on the network without relying on public gateways, but this requires investigation.

What I will also warn against is putting the IPFS client directly in the browser. IPFS runs best when it has been warming up for a while, as it then has a healthy cache of known closest-distance peers on the network and can resolve content relatively quickly. A cold-started IPFS client on a video page will not be able to serve content (quickly) for at least half a minute or so, even more if the content is behind NAT and finding addresses for that peer is tricky.

A service worker could work, maybe. If one is connected by default to the PeerTube instance's "home IPFS node", and/or to other "known peers that host the content" that federated PeerTubes could hint at, retrieval could start immediately. Though this would simply turn the IPFS node into a 'workhorse' acting as a fancy CDN, giving it non-distributed load scaling.

Finally, Kubo (aka go-ipfs) isn't the only implementation out there; recently I found out about iroh, a WIP Rust-powered IPFS client, which may illustrate that not everything is about the golang impl.

All-in-all, here's the summary of IPFS-in-peertube as I see it;

Pros:

  • Disconnect of data and addressing with CIDs
    • Videos can be simple links to CIDs, or can be the CIDs
    • These CIDs and their content will be immutable, and self-verifying
  • Hosting can be done elsewhere, or can even be done out of the goodwill of a community.
    • (as long as one node has the content and is reproviding it to the network, it has a high chance of being found and retrieved)
  • Less load on the peertube instance, especially in a distributed setting
    • Assuming that other nodes grab the data from public gateways or with their own IPFS nodes, they can also re-provide (as long as it is cached and reproviding is turned on for cached content)
    • Public gateways at least have a level of caching, lessening load and need to re-fetch content constantly
    • Users who pinned the content may also be able to re-provide
    • Load may be avoided from the peertube instance entirely if the client loads directly from a public gateway

Cons:

  • Unpredictable loading times
    • DHT search can be slow
    • Content can be missing, never completing a search
    • Providers may be offline, or unaccessible
    • Public gateways time out, as the content does not meet their need for relative spread and knowledge of that content (i.e. content that has been reprovided a bunch of times and is well-known in the DHT; this isn't the case for hours after the video is first "seen" by the network, especially if it's a big file).
  • car format cannot be downloaded in chunks
    • I found nowhere that it is possible to do this, and I read that CAR data is given opportunistically, "first seen in strict DAG order". The CARv1 specification also notes that there needs to be an additional spec for determinism, basically waving this problem off.
    • This means that a car file needs to be downloaded whole, in one HTTP session, or otherwise restarted
      • This makes it relatively unsuitable for client-side reconstruction of the IPFS file DAG
        • (directed acyclic graph, basically a self-verifying set of blocks in a strict order and relationship, chunks of data that are all verifiable by one hash, which is inside the CID)
        • This means that the client needs to download a direct video file from somewhere, and trust that they're sending the right data according to the CID
  • Some gateways explicitly forbid video streaming and downloading
    • Cloudflare's IPFS gateway is one of these, I don't know if they've reversed their policy or if there are any others, but this is worth noting
  • IPFS (at least in a server profile) has high overhead and high idle load
    • This has already been covered above, but yes, Kubo constantly connects to and gets connected by other peers exchanging DHT records, constantly reproviding them, needing to do a full cycle every 12h.
      • It is worth noting that with Kubo v0.18.0 this period increases to 22h, applying less pressure on the network
  • Managing pins can be a hassle.
    • Running and maintaining an ipfs-cluster service is extra system administration.
    • Buying a subscription at an IPFS pinning service, to essentially upload data and keep it uploaded. This might not be very attractive, as many of these services brand themselves around cryptocurrencies and the like.
  • Data can simply be gone one day.
    • This can be circumvented by having a .car backup of the data locally, and reviving the data somewhere on-demand (as some/most pinning services, and Kubo itself, allow uploading .car directly), though this would be a last-resort case.
      • Additionally, this still means having the data somewhere in the first place, which adds cost.
    • In lieu of having it archived, this means the video will essentially rot after a long while if nobody is interested in pinning it and keeping the records reprovided to the network, which can be the case, as running an IPFS node isn't cheap, and it doesn't get any cheaper the more data you want to store on it and the longer you store it.

Essentially, IPFS is the loud, hyperactive, social-butterfly cousin of torrents, where everyone can download anything they want with a magical key for that data, but where that data can be difficult to retrieve, 'inconvenient' to cache, or simply nonexistent, outside of the key-holders' control.

The biggest argument CIDs have (imo) is portability and self-verification: anyone and everyone can take the hash and 'just download' the content, wherever it exists. Beyond that, it doesn't have much going for it.


No, I am not going to talk about Filecoin.

@ghost

ghost commented Jan 25, 2023

PeerTube doesn't even need to host its own IPFS store; it can just start streaming directly from ipfs.io or dweb.link (both run by Protocol Labs) to the browser.

This is a terrible idea; we have a decentralized network now, even if it's mostly just HTTPS. And you'd suggest that folks move to funneling all traffic through Protocol Labs in order to switch to a more decentralization-flavored protocol?

For IPFS to be an improvement at all, it would need to retain our ability to self-host video. Relying on an external HTTP gateway is really a no-go, and the gateways are a major limitation in deploying IPFS for general browser use. As others have noted, there's more opportunity for practical use as a server-to-server synchronization mechanism than for delivering client video. But as you and I have both noted, IPFS is quite resource intensive on the server.

The IPFS + PeerTube experiments I did some time ago focused on the case where you'd bundle an IPFS server (WebRTC gateway) on the same machine as PeerTube, so browsers could get a quick and reliable connection to the content while still allowing IPFS to serve files from elsewhere once the connection warms up. I put that project on hold, though, because of the high cost in latency and RAM.

@ShadowJonathan

And you'd suggest that folks move to funneling all traffic through Protocol Labs in order to switch to a more decentralization-flavored protocol?

Please read the rest of my comment, that's just one of the options I suggested.

I forgot to enumerate them, but here are roughly the options as I see them:

  • Download directly from gateways, like I said above
    • Puts a lot of trust in the gateway to get the correct file.
  • Download from gateways, but via .car formats
    • This'd allow the gateways to not need to be trusted, buuuut...
    • ...the content can be slow to load, or time out in loading, in which case the HTTP request needs to be started over from the beginning.
    • And the client needs to reconstruct the proper file (stream) from this format, needing buffering in the browser itself.
  • Download from a sysadmin-run gateway
    • This would add implicit trust in that gateway to fetch it, but;
    • It's extra sysadmin overhead
    • Running a gateway like that isn't cheap, as it's essentially running an IPFS node
  • Download through a browser-based IPFS client
    • This needs the IPFS client run in service workers, else it doesn't have time to warm up.
    • It still needs to warm up, so content isn't available instantly
    • It might consume a bunch of resources on the client, just to find and fetch content
  • Download through a browser-based IPFS client, peering instantly with an admin-run IPFS node storing the content
    • Same problems as above: sysadmin overhead and resource usage

All but the last option still need the data pinned somewhere, either on users' computers, commercial pinning services, or a sysadmin-run pinning service, so that comes on top of those.

@ShadowJonathan

ShadowJonathan commented Jan 25, 2023

One trick that peertube could probably deploy is the following;

IPFS has high overhead and idle resource usage mostly because of two things: acting as a DHT server, and constantly reproviding.

The first one adds a lot of inbound connections and related handling, the second one adds a lot of outbound connections and related handling.

Both can be turned off, leaving the client in a relatively low idle state, but turning off reproviding makes it impossible to discover content properly on the network.

However, Kubo has an option to only reprovide the "roots" of pins, i.e. the head block, the first block, the one that points to all the others.

It's generally not recommended this way, as something like the following could happen;

  • Alice wants to find X, Bob has X
  • X points to Y and Z; these aren't announced by Bob, but X is.
  • Alice finds out Bob has X, connects, and asks for X; Bob provides it.
  • Alice now sees that she needs to find Y and Z, and starts "looking" for them, possibly taking the time to ask Bob, but;
  • Alice can possibly have disconnected from Bob before she can ask him if he has Y and Z. This is a problem because, since Alice has X cached, she won't look for it anymore, and instead keeps trying to find Y and Z, which nobody tells Alice where to find, because Bob hasn't told anyone he has Y and Z.

This makes this option less favourable, even if it has extremely low overhead (it's easier to announce 100 records than it is to announce 100,000), because of its fallibility. Normally IPFS recovers from this by finding the same node over and over again when DHT-querying the remaining content, and more-or-less staying connected to it while fetching new blocks.

What Peertube could do in this instance is something like this (a rough sketch in code follows the list);

  • It sees that a client wants to play X
  • It queries through its local IPFS node which nodes have X. (ipfs dht findprovs)
  • It sees that Bob has it; Peertube then instructs the local node to add a peering to Bob, i.e. "Stay connected to Bob, and reconnect if things go wrong"
  • (We could take the fact that Bob has provided this CID as a signal that it has the entire file)
  • Then, Peertube instructs the local node to download X, and it starts fetching the blocks from Bob, and caches them locally.
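A rough sketch of that flow with ipfs-http-client, assuming its DHT query event shape and a local Kubo daemon (the "is this a PeerTube-run node?" check is left as a comment, because it is exactly the missing piece discussed below):

import { create } from 'ipfs-http-client'
import { CID } from 'multiformats/cid'

const ipfs = create({ url: 'http://127.0.0.1:5001/api/v0' })

// Find a provider for X, stay connected to it, then fetch and pin the blocks.
async function fetchViaProvider (cidStr: string) {
  const cid = CID.parse(cidStr)
  for await (const event of ipfs.dht.findProvs(cid)) {
    if (event.name !== 'PROVIDER') continue
    for (const provider of event.providers) {
      // ...here we would verify that the peer really is a PeerTube-run node...
      await ipfs.swarm.connect(provider.multiaddrs[0]) // "add a peering to Bob"
      await ipfs.pin.add(cid)                          // download X and cache it locally
      return
    }
  }
}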

However, this still has a few pitfalls;

Kubo only lists up to 20 peers; any of these peers might not be a Peertube-run peer, and could have cached only the root node, not any other block.

(Other implementations, such as js-ipfs, give an iterable with all found-peers that have this content, possibly allowing someone to mostly-exhaustively map every node currently providing it.)

There is no real way of knowing if a peer is a peertube-run IPFS node or not, on the surface.

However, this could be solved by providing a custom protocol through the IPFS node. Kubo gives an experimental API for this with ipfs p2p, essentially injecting a new protocol into an existing node.

This could be used to simply say "hi" to another node, verifying if it is a peertube instance, and/or asking "hey, i saw you had X, do you have it all pinned as well?" for reliability, to know that "this is the peer to connect to". Through this protocol it could even be signalled that the peer would like to not be downloaded from, if possible.

Through this it might be possible to have an extremely-low-overhead IPFS peer running inside peertube, one that is mostly dedicated to storing and exchanging IPFS content for peertube primarily.

@manalejandro

@manalejandro your suggestion seems to be interesting indeed. Why not build a plugin, and when it reaches maturity, integrate it into the PeerTube core... It would be interesting to study how archive.org uses IPFS...

Hi, I'm trying to implement the plugin, but I don't think it's the best idea to start a js-ipfs instance on the client side or as a server addon. I would like to do an implementation like the one that uses OBJECT_STORAGE with S3 storage; I think it is the most economical solution in this case. I will use the latest release, and we will discuss when I have something. Regards.

@manalejandro

I don't have much time to dedicate, but I think this is a good start:

// Assumed context: IPFS is js-ipfs (ipfs-core); CONFIG, logger and lTags are PeerTube's own config and logging helpers.
import * as IPFS from 'ipfs-core'

let ipfs: IPFS.IPFS

// Create the shared node on first use and reuse it afterwards.
async function getClient () {
  if (ipfs) return ipfs

  const IPFS_STORAGE = CONFIG.IPFS_STORAGE

  ipfs = await IPFS.create({
    repo: IPFS_STORAGE.REPO_PATH || undefined,
    repoAutoMigrate: IPFS_STORAGE.REPO_AUTO_MIGRATE || false,
    peerStoreCacheSize: IPFS_STORAGE.PEER_STORE_CACHE_SIZE || 1024,
    config: {
      Profiles: IPFS_STORAGE.PROFILE || 'server'
    }
  })

  logger.info('Initialized IPFS repo path: %s.', IPFS_STORAGE.REPO_PATH, lTags())

  return ipfs
}

@hollownights

Hi, I'm trying to implement the plugin, but I don't think it's the best idea to start a js-ipfs instance on the client side or as a server addon. I would like to do an implementation like the one that uses OBJECT_STORAGE with S3 storage; I think it is the most economical solution in this case.

@manalejandro Yes, if I got your idea right, that's also how I see it: as a middle ground between totally local storage and totally remote storage (S3). Each instance would have its own local IPFS node (and a public gateway for it), and the locally stored files would then be imported (but not duplicated) into its local IPFS node.

As others have noted, there's more opportunity for practical use as a server-to-server synchronization mechanism than for delivering client video.

@scanlime Yes, that's why I talked about sharing the IPNS names of the folders stored by an instance: right now, there is no easy way to contribute to an instance by re-hosting its videos. If PeerTube weren't about to drop WebTorrent, this could easily be done by creating dumps of torrent files and making them available to users; but WebTorrent support is going to be dropped, so something else must be done so that real decentralization can be achieved.

@ghost

ghost commented Jan 28, 2023

right now, there is no easy way to contribute to an instance by re-hosting its videos. If PeerTube weren't about to drop WebTorrent, this could easily be done by creating dumps of torrent files and making them available to users; but WebTorrent support is going to be dropped, so something else must be done so that real decentralization can be achieved.

Of the systems we have implemented or discussed here, few actually let a random contributor provide bandwidth assistance to an instance or video they want to support.

We have the redundancy system, which comes the closest. Another peertube server can follow, download content, and provide regular HTTPS URLs that will actually help clients download the video faster. This works pretty well, but it does require that the server allows redundancy, and the features for robustness and abuse prevention are currently quite limited.

WebTorrent was never especially useful for providing bandwidth assistance like this, because so few torrent clients were actually compatible with WebTorrent. The bandwidth helper would have to use one of the few torrent clients that supports WebRTC transport, and each client would still take some time to discover the available helpers rather than getting a list right away like the redundancy system provides.

The IPFS scheme you're presenting, as I understand it, would let anyone provide long-term storage with no reliability claims attached. So, not a replacement for primary storage but an optional source of additional bandwidth. Problem is, that bandwidth is only directly useful to IPFS peers, not to regular web clients. The helpfulness of this scheme depends pretty much entirely on gateway bandwidth. You'd either need more widespread adoption of the WebRTC bridge so js-ipfs can connect to helpers directly, or you'd need people to also (separately?) volunteer to provide IPFS gateway services. This might work but it seems like a lot of extra RAM and bandwidth to spend for roughly the same benefit we had in the existing redundancy setup.

@hollownights

WebTorrent was never especially useful for providing bandwidth assistance like this, because so few torrent clients were actually compatible with WebTorrent. The bandwidth helper would have to use one of the few torrent clients that supports WebRTC transport

Yes, it's a shame that so few clients offer support for WebTorrent, but there is a WebTorrent client and it can be run on a server, so technically that could still be achieved...

The IPFS scheme you're presenting, as I understand it, would let anyone provide long-term storage with no reliability claims attached. So, not a replacement for primary storage but an optional source of additional bandwidth.

That's right.

Problem is, that bandwidth is only directly useful to IPFS peers, not to regular web clients. The helpfulness of this scheme depends pretty much entirely on gateway bandwidth. You'd either need more widespread adoption of the WebRTC bridge so js-ipfs can connect to helpers directly, or you'd need people to also (separately?) volunteer to provide IPFS gateway services. This might work but it seems like a lot of extra RAM and bandwidth to spend for roughly the same benefit we had in the existing redundancy setup.

I do understand the problems around this scenario, and so I pose this question: how could someone who is familiar with using seedboxes help a PeerTube instance? After all, someone who pays for a seedbox service isn't (necessarily) someone who knows how to set up a server, and as such the existing redundancy setup isn't of real use to such a user. How can this gap be bridged so that your P2P-heavy user can contribute in an easy way?

@ghost

ghost commented Jan 28, 2023

how could someone who is familiar with using seedboxes help a PeerTube instance?

Seems like we need social infrastructure for this, not technical, no? Anyone can install peertube and offer redundancy, if they have a computer that's reachable over https. If their computer is not reachable over https they aren't going to be able to provide data quickly to clients, so it's of limited help.

@hollownights

Seems like we need social infrastructure for this, not technical, no?

That's like saying someone should go to trade school to learn how to install solar panels in order to contribute to the grid.

Anyone can install peertube and offer redundancy, if they have a computer that's reachable over https.

I agree. Is there an .exe for that?

One line in the command line is one line too many.

@ghost

ghost commented Jan 28, 2023

I agree. Is there an .exe for that?

I get that a lot of folks would like there to be an approximately zero-step way to contribute bandwidth, but the reality is just more complicated than that and we don't have magic solutions...

If you want to help today, right now, using technology that exists, you need to be reachable over https OR you need to be able to very reliably do NAT hole punching. Neither of these is compatible with being a completely clueless user who wants to understand nothing about the network.

As I said, this isn't fundamentally technical, it's social. People need to interact socially and trust each other to some extent. If we tried to do video streaming over a fully trustless network, that's not going to be a performance improvement over the status quo.

@hollownights

I get that a lot of folks would like there to be an approximately zero-step way to contribute bandwidth, but the reality is just more complicated than that and we don't have magic solutions...

If you want to help today, right now, using technology that exists, you need to be reachable over https OR you need to be able to very reliably do NAT hole punching. Neither of these is compatible with being a completely clueless user who wants to understand nothing about the network.

I ask for an .exe more to create some discussion than to actually get an .exe. I get it, things just are not that simple. But if someone who knows their way around the command line and networks were to start a seedbox service for PeerTube instances today, would they be able to do it, or would they need to host an entire PeerTube instance? Could they run a bunch of "PeerTube sharing-is-caring instances", much like a seedbox runs a bunch of BitTorrent clients, and then let users select which instances they would like to help (much like a user does when they add a bunch of torrents to a seedbox)?

@ghost

ghost commented Jan 28, 2023

Seems like we need social infrastructure for this, not technical, no?

That's like saying someone should go to trade school to learn how to install solar panels in order to contribute to the grid.

Anyone can install peertube and offer redundancy, if they have a computer that's reachable over https.

I agree. Is there an .exe for that?

One line in the command line is one line too many.

I get that a lot of folks would like there to be an approximately zero-step way to contribute bandwidth, but the reality is just more complicated than that and we don't have magic solutions...
If you want to help today, right now, using technology that exists, you need to be reachable over https OR you need to be able to very reliably do NAT hole punching. Neither of these is compatible with being a completely clueless user who wants to understand nothing about the network.

I ask for an .exe more to create some discussion than to actually get an .exe. I get it, things just are not that simple. But if someone who knows their way around the command line and networks were to start a seedbox service for PeerTube instances today, would they be able to do it, or would they need to host an entire PeerTube instance? Could they run a bunch of "PeerTube sharing-is-caring instances", much like a seedbox runs a bunch of BitTorrent clients, and then let users select which instances they would like to help (much like a user does when they add a bunch of torrents to a seedbox)?

This is basically how the redundancy feature already operates. The server can automatically select popular videos to mirror or an admin can select them. It's done via the peertube UI, so you'd either leave that enabled or write something simpler that's special purpose. But the simplest thing is often whatever you've already got.

@alxlg
Author

alxlg commented Jan 28, 2023

@scanlime I have the impression that you have forgotten what my original proposal was.

@ghost

ghost commented Jan 28, 2023

@scanlime I have the impression that you have forgotten what my original proposal was.

No, the problem here is that when we try to examine the details of who is providing what over what channels, the magic-ness evaporates and we are stuck talking about servers and clients and recurring administration duties.

As rigelk responded nearly a hundred comments ago, the data needs we have on the client side and the import side are different, so offering uploads over IPFS doesn't help in most use cases. So, that's basically a non-starter unless we have the desire and infrastructure to repeat the transcodes in many places.

There are surely places we could use IPFS but I'm just trying to bring this conversation down to earth and work through the actual problems we are trying to solve and understand the limitations of the tech stacks available.

@manalejandro

Hi, I'm trying to implement the plugin, but I don't think it's the best idea to start a js-ipfs instance on the client side or as a server addon. I would like to do an implementation like the one that uses OBJECT_STORAGE with S3 storage; I think it is the most economical solution in this case.

@manalejandro Yes, if I got your idea right, that's also how I see it: as a middle ground between totally local storage and totally remote storage (S3). Each instance would have its own local IPFS node (and a public gateway for it), and the locally stored files would then be imported (but not duplicated) into its local IPFS node.

As others have noted, there's more opportunity for practical use as a server-to-server synchronization mechanism than for delivering client video.

@scanlime Yes, that's why I talked about sharing the IPNS names of the folders stored by an instance: right now, there is no easy way to contribute to an instance by re-hosting its videos. If PeerTube weren't about to drop WebTorrent, this could easily be done by creating dumps of torrent files and making them available to users; but WebTorrent support is going to be dropped, so something else must be done so that real decentralization can be achieved.

Hi, I'm trying to implement the complete backend with IPFS; otherwise it wouldn't make sense to duplicate the data. I think an API like the one used by S3 is the cheapest way to start it; I'm not saying it's the only one or the most efficient one. Besides, IPFS information cannot be duplicated, because everything has a unique hash. Regards.

https://github.com/Chocobozzz/PeerTube/blob/develop/shared/models/videos/video-storage.enum.ts

export const enum VideoStorage {
  FILE_SYSTEM,
  OBJECT_STORAGE,
  IPFS_STORAGE // proposed new storage backend
}

@ROBERT-MCDOWELL
Contributor

ROBERT-MCDOWELL commented Jan 28, 2023

The best would be that every server starting a PeerTube instance becomes a node of a PeerTube gateway, or whatever cluster gateway is specified in the config...

@Pantyhose-X

Pantyhose-X commented Apr 14, 2023

Storing videos in IPFS would prevent administrators from deleting my videos: PeerTube instance administrators could no longer maliciously delete videos, close registration, or shut down the server.

@ROBERT-MCDOWELL
Contributor

ROBERT-MCDOWELL commented Apr 14, 2023

@Pantyhose-X IPFS (Kubo) always needs the original file source (or an IPFS cluster source) to keep the chunks alive. If after some time (set by the gateways you are using) the source does not respond, the chunks are deleted...

@S7venLights

S7venLights commented May 2, 2023

Wouldn't a WebRTC torrent app running on the user's PC do the same thing when they leave it seeding the video? This also allows users to create "seedboxes" using RSS auto-downloading torrent apps. An easy, simple/KISS way of "distributing" the video hosting.

https://github.com/Openmedianetwork/visionOntv/wiki/Seedbox

I haven't read this whole thread but I had a similar thought to the one quoted above...

How to save PeerTube from constantly dying instances

Many instances are constantly disappearing because they can't afford to maintain the server costs (bandwidth). Torrents have existed for years and have kept most loved content up/available reliably. That's because every user is 'donating' storage and bandwidth.
PeerTube only does this with people playing the same video in a browser at the same time.
I propose a PeerTube app that works essentially like any torrent app.
It could cache a user-chosen amount of gigs of video and seed it constantly.

In order for it to work well, it would help to have a more centralised instance so more seeders exist, but I guess the seeding could just work across all instances.
Additionally, if magnet links are used, a content creator could always seed their own content, ensuring it's still available even if it has few seeds or an instance goes down, since the PeerTube app could save the magnet links when someone subscribes to a channel.

As a result:
Popular videos will always be bandwidth-backed.
And perhaps, to solve storage issues, a content creator could opt to solely self-host the video data on their machines/servers/seedbox but still have their channel hosted on the instance server. In that case, the instance acts as the torrent website directory for the creator, but the data and bandwidth are provided by creators and users.
It would also reduce overall bandwidth use, since if users re-watch a video it will still be in their seeding cache days later, so it could play locally and update the watch count on the instance.
As for creators wanting to delete or update videos, they could do so in the app, which would signal the hosting instance to update the details, and the instance could signal all other seeders to sync the changes.
Honestly this would be a win...

I can't take full credit for this idea; it was partly inspired by https://autotube.app/
Additionally, we could allow people to download, watch and seed videos with regular torrent clients, thus increasing seeders or seeding up-time.

Alternative idea:
Use a similar setup to https://storj.io where people volunteer to host data on a distributed P2P network with redundancy built in. (I suppose IPFS works similarly?)
This way, even if a content creator went offline forever, their videos would still be seeded/hosted by other nodes.

@Martinligabue

I see no one has mentioned that Brave has built-in IPFS support, so every user coming from Brave (in addition to those who have the extension or IPFS installed) would contribute to the reseeding of the file. (This is without counting people who have the same video on their IPFS node even if they are not connected to PeerTube, since the hash is unique across the entire IPFS network.)

@ghost

ghost commented May 25, 2023

I see no one has mentioned that Brave has built-in IPFS support, so every user coming from Brave (in addition to those who have the extension or IPFS installed) would contribute to the reseeding of the file. (This is without counting people who have the same video on their IPFS node even if they are not connected to PeerTube, since the hash is unique across the entire IPFS network.)

From the Brave docs, it looks like even if one does feel morally fine with a browser made by the Brave team, it won't technically help at all by default; and with non-default settings it's no better (at latency, DHT discoverability, longevity) than any other instance of go-ipfs running on a random laptop:

"By default, Brave will load the URI being requested via a public HTTP gateway; however, it will also show an infobar that asks you if you’d like to use a local node to resolve IPFS URIs. If you choose to use a local node, Brave will automatically download go-ipfs as a component and will route future traffic through this node."

@drzraf
Contributor

drzraf commented Nov 8, 2023

Torrents have existed for years and have kept most loved content up/available reliably.

  • More or less: computing the infohash from metadata (filename, ...) made torrents' content persistence inferior to what ed2k's epoch KAD network offered.

BitTorrent v2 (2017) attempted to somewhat overcome this (with per-file hash trees and hybrid SHA-256 computed metainfo files).

But IMHO a DHT (even more a media-related one) should be based exclusively on the actual content's chunks of bytes.
IPFS does this (although in a slower, and maybe safer, way than the KAD network).

Although aMule did most of this 15 years ago, crossing DHTs wasn't an easy task (hMule) back in 2011, nor was it for 2013's JavaScript implementations.

I propose a PeerTube app that works essentially like any torrent app.

Now, in 2023, js-libp2p provides many useful bits of websockets, IPFS, KAD & BitTorrent, and it sounds possible to create a multi-DHT browser-based client (tightly connected to an HTTP service for metadata retrieval/mapping) and blur the lines between local and remote storage (or seed both long-term local contents and media currently being played online).

.... but the Bittorrent experience clearly shows that if moving/renaming a file stops participation, then it's harder for the network to keep up. A successful storage (and network) distribution depends on:

  • a user willingly choosing what they save locally (storage/network contribution) and what not (🙃 GNUnet)
  • being able to move/rename their local copy to their wishes

And for this very reason, even if the protocols/languages/libraries lowered the DHT barrier, browsers' hardened security would never allow fancy local filesystem operations. Since it's impossible to be too fancy within browser context:

  • browsers are bound to store and seed short-term/runtime transitory contents
  • Long-term storage/network distribution will stay restricted to dedicated (non-browser) clients

@drzraf
Contributor

drzraf commented Nov 15, 2023

Digging further, it turns out that the solution was nicely envisioned 8+ years ago and is called... LBRY (FAQ).

It's based on two distributed data stores (a blockchain and a Kademlia DHT) and a peer-to-peer protocol for exchanging data, and brings the concept of a content reflector.

@OrcVole

OrcVole commented Jan 12, 2024

I think utilizing IPFS in PeerTube would be a good idea.
In case people are not aware of it, here is ipfs.video:

https://ipfs.video/

One thing that could help would be for PeerTube to calculate the IPFS CID for the files it makes available as VOD. If the relevant CID were published alongside the file, that could help. PeerTube wouldn't have to serve files over IPFS to do this; it would only need to calculate their CIDs and list them.
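A sketch of that calculation with the ipfs-only-hash package, which computes the CID a file would get from ipfs add without running a node (the chunking defaults must match whatever an importing node would use, or the CIDs won't line up):

import { readFile } from 'node:fs/promises'
import Hash from 'ipfs-only-hash'

// Compute the CID a VOD file would have on IPFS, for publishing alongside it.
async function cidForFile (path: string): Promise<string> {
  const data = await readFile(path)
  return Hash.of(data) // defaults mirror `ipfs add` (CIDv0, 256 KiB chunks)
}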
