NIP-81 Modeling Files / Filesharing #417

Closed
wants to merge 1 commit into from

Conversation

lovvtide

@lovvtide lovvtide commented Apr 7, 2023

Here's my proposal for censorship resistant file sharing on nostr. The three main goals of this proposal:

  • To define standard metadata to describe a file in the nostr ecosystem
  • To make it possible for relays to index file metadata, thereby enabling clients to query for files
  • To mitigate the problem of dead links and generally increase resilience/censorship-resistance by utilizing BitTorrent as a fallback when there are no servers hosting a file

I'm currently building a file manager app that implements this spec, so I should be able to show that soon as well. What do you guys think?

Here's the proposal https://github.com/lovvtide/nips/blob/NIP-81/81.md

@lovvtide lovvtide changed the title NIP-81 Modeling Files NIP-81 Modeling Files / Filesharing Apr 7, 2023
@v0l
Member

v0l commented Apr 7, 2023

Same as #337?

@lovvtide
Author

lovvtide commented Apr 8, 2023

Same as #337?

I investigated using NIP-94 for my filesharing app but realized there are some pieces missing:

  1. In NIP-94, neither the file hash nor the file type are single letter tags, and thus (according to NIP-12) those values are not indexed by relays. This is a big issue because it means that when using NIP-94 clients cannot effectively query relays for media.
  2. NIP-94 is missing metadata for the file name and the file size which are important for UX on the client, and so the client can know how big a file is before you start downloading it.
  3. NIP-94 has no mechanism for resilience/censorship-resistance or for dealing with dead links when a server stops hosting the file

This is an alternative proposal that fixes those problems and provides greater extensibility.
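
To make the indexing point concrete, here is a rough sketch of how a client might query relays once the hash and mime type live in single-letter tags. It assumes the NIP-12 generic tag query convention (filter keys of the form `#<letter>`) and the NIP-01 `REQ` message shape. The helper name `build_file_req` and the subscription id are made up for illustration, and no event kind is included because none is named in this thread.

```python
import json


def build_file_req(sub_id, sha256_hex=None, mime_type=None, limit=20):
    """Build a REQ message filtering on single-letter tags (hypothetical helper)."""
    flt = {"limit": limit}
    if sha256_hex is not None:
        # "#x" matches events carrying an ["x", <hash>] tag.
        flt["#x"] = [sha256_hex.lower()]
    if mime_type is not None:
        # "#m" matches events carrying an ["m", <mime type>] tag.
        flt["#m"] = [mime_type]
    return json.dumps(["REQ", sub_id, flt])


# Example: ask for PNG files only.
print(build_file_req("files-1", mime_type="image/png"))
```

With multi-letter tag names such as `hash` or `type`, relays following NIP-12 would have no equivalent filter key, which is the gap described in point 1.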

@v0l
Member

v0l commented Apr 8, 2023

I think all of the issues you raised are valid and should be included in NIP-94

@fiatjaf
Member

fiatjaf commented Apr 8, 2023

This seems to be a very well-thought proposal, we should merge it with NIP-94 for sure, or vice-versa.

@fiatjaf
Member

fiatjaf commented Apr 8, 2023

I think we can summarize both this and the other proposals as just a pointer to files, with optionally multiple ways of accessing these files and optional metadata. We just have to agree on the names of the tags for the metadata.

  • url -> http or magnet (multiple)
  • e -> NIP-95 event that contains the actual file, if anyone is that crazy (optional)
  • x -> sha256
  • m -> mimetype
  • t -> torrent infohash
  • size -> bytes (optional)
  • secret -> key and nonce for AES-GCM encryption (optional)
  • blurhash -> for cosmetic purposes (optional)

What do you think? @frbitten
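
As a rough illustration of the tag list above, the following sketch assembles the tags for one file. It is only an assumption about how a client might do it: the function name `build_file_tags` and all example values are hypothetical, the tag names are the ones proposed in this comment (not a final spec), and the `e` and `secret` tags are left out for brevity.

```python
def build_file_tags(urls, sha256_hex, mime_type,
                    infohash=None, size=None, blurhash=None):
    """Assemble file-metadata tags using the names proposed in this thread."""
    tags = [["url", u] for u in urls]        # one tag per way of fetching the file
    tags.append(["x", sha256_hex])           # sha256 of the file contents
    tags.append(["m", mime_type])            # mime type
    if infohash is not None:
        tags.append(["t", infohash])         # torrent infohash (name as proposed here)
    if size is not None:
        tags.append(["size", str(size)])     # tag values are strings
    if blurhash is not None:
        tags.append(["blurhash", blurhash])  # cosmetic preview only
    return tags


# Placeholder values for illustration only.
example = build_file_tags(
    urls=["https://example.com/cat.png"],
    sha256_hex="00" * 32,
    mime_type="image/png",
    size=12345,
)
print(example)
```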

@lovvtide
Author

lovvtide commented Apr 8, 2023

We just have to agree on the names of the tags for the metadata

Some thoughts/questions:

  • I think there needs to be an m tag for the file's mime type so that clients can query based on type of file (the type tag was also suggested by NIP-94)
  • I think the size tag should be required and not optional. If the client is able to obtain the hash of the file, they also presumably can read the size of the file. Is there any situation where the client would not have access to that value when creating the event? File size seems quite important for both the client and also potentially for hosting providers, e.g. for allocating the proper amount of storage
  • Having at least an optional tag name seems like a good idea for the sake of UX when listing/managing files on the client (I'm imagining a dropbox sort of interface), so the user doesn't see a list of files that are all named "Untitled"
  • If the url tag was allowed to contain http OR a magnet link, wouldn't that put an additional burden on clients to "detect" somehow what kind of link it was before loading that link? I think if the magnet link is going to be included it should therefore be its own optional magnet tag. Or perhaps a better alternative would be to not include the magnet link at all, since it can be easily constructed from the other metadata in the event according to the magnet link format like magnet:?xt=urn:btih:${infohash}&dn=${name}&ws=${url}
  • I wonder, what is the benefit of having the infohash be its own t tag and not just another x tag with a marker infohash as the third element of the array? I agree that sha256 should definitely be the default hash (which is why, as I wrote in this proposal, sha256 is required and any x tag without a marker is automatically assumed to be sha256). The main benefit I see for having only an x tag (instead of an x tag specifically for sha256 and a t tag for infohash) is that having a single indexed tag for all hashes would make the event type easily extensible for applications that for some reason needed to use a different hash and need to be able to query using that hash. This would not add any burden on clients that only care about the default since an x tag with sha256 is guaranteed to be present anyway and any clients can just safely ignore other x tags with markers they don't know about. Put another way — I think it would be worth it to avoid having to think about modifying this NIP every time some app wishes they could use a new indexed identifier for a file
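
The magnet construction mentioned in the list above could be sketched as follows. This is a minimal sketch that assumes only the three parameters named in the template (`xt`, `dn`, `ws`); the function name `build_magnet` and the example values are hypothetical, and percent-encoding the name and URL is an added assumption so that special characters survive inside the query string.

```python
from urllib.parse import quote


def build_magnet(infohash, name, url):
    """Build magnet:?xt=urn:btih:<infohash>&dn=<name>&ws=<url> from event metadata."""
    return (
        f"magnet:?xt=urn:btih:{infohash}"
        f"&dn={quote(name, safe='')}"  # display name, percent-encoded
        f"&ws={quote(url, safe='')}"   # web seed, i.e. the http source of the file
    )


# Placeholder values for illustration only.
print(build_magnet("ab" * 20, "my file.png", "https://example.com/d/abc"))
```

A link built this way carries no trackers and no `xs` parameter, so it contains only what the template names.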

@fiatjaf
Member

fiatjaf commented Apr 8, 2023

I think there needs to be an m tag for the file's mime type so that clients can query based on type of file (the type tag was also suggested by NIP-94)

Sorry, I forgot about that one. Modified above.

I think the size tag should be required and not optional.

What if you just have a file hosted somewhere and want to use a Nostr filesharing client to broadcast it -- and your file is 10GB? You'll have to download the file to check the size?

I agree with the magnet stuff.

Also it's not a big deal to edit the NIP every time. People will certainly add new non-specified tags to the events anyway if the usage catches on. It's better to differentiate tags according to usage, I think. We lose nothing from doing that.

@lovvtide
Author

lovvtide commented Apr 8, 2023

What if you just have a file hosted somewhere and want to use a Nostr filesharing client to broadcast it -- and your file is 10GB? You'll have to download the file to check the size?

I suppose in that case (you don't have the file locally to check the size) the same logic would apply to file hash and mime type. It seems likely that whatever source you are getting that metadata from would also provide the file size.

Also it's not a big deal to edit the NIP every time. People will certainly add new non-specified tags to the events anyway if the usage catches on. It's better to differentiate tags according to usage, I think. We lose nothing from doing that

Ok yeah. If it's no big deal to edit the NIP then I agree it's better to use tags to differentiate usage.

@fiatjaf
Member

fiatjaf commented Apr 8, 2023

I suppose in that case (you don't have the file locally to check the size) the same logic would apply to file hash and mime type. It seems likely that whatever source you are getting that metadata from would also provide the file size.

Good point.

I guess the file size and mimetype can be fetched from a HEAD request in most cases.
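
A minimal sketch of that HEAD request, using only the Python standard library. It assumes the server answers HEAD and reports `Content-Length` and `Content-Type`, which not every host does; the function names are hypothetical. Note that the file hash cannot be obtained this way.

```python
import urllib.request


def metadata_from_headers(headers):
    """Return (size_in_bytes, mime_type) from response headers; either may be None."""
    length = headers.get("Content-Length")
    ctype = headers.get("Content-Type")
    size = int(length) if length is not None and length.isdigit() else None
    # Drop parameters such as "; charset=utf-8" from the content type.
    mime = ctype.split(";", 1)[0].strip().lower() if ctype else None
    return size, mime


def head_metadata(url, timeout=10):
    """Send a HEAD request and read size and mime type without downloading the body."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return metadata_from_headers(resp.headers)
```

A client could call `head_metadata(url)` before publishing and leave the size out when the server does not report it.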

@v0l
Member

v0l commented Apr 8, 2023

url -> r ?
t is already used for hashtag?

blurhash could also be merged into x if you use the hash name in the tag ["x", "<some-blurhash>", "blurhash"]

x may contain a sha256, blurhash or btih (or all)
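
For illustration, a client reading such marker-style x tags might group them like this. It is a sketch under the assumptions discussed so far in the thread: the marker is the third element of the tag, and an x tag without a marker is treated as sha256. The function name and sample values are hypothetical.

```python
def hashes_by_marker(tags, default_marker="sha256"):
    """Group the values of x tags by their marker (third element of the tag)."""
    result = {}
    for tag in tags:
        if len(tag) < 2 or tag[0] != "x":
            continue  # not an x tag, or malformed
        marker = tag[2] if len(tag) > 2 and tag[2] else default_marker
        result.setdefault(marker, []).append(tag[1])
    return result


# Placeholder values for illustration only.
tags = [
    ["x", "00" * 32],
    ["x", "11" * 20, "btih"],
    ["m", "image/png"],
]
print(hashes_by_marker(tags))
```

A client that only cares about sha256 can read `result["sha256"]` and ignore markers it does not know.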

@v0l
Member

v0l commented Apr 8, 2023

@lovvtide If you don't use the full magnet link then you lose out on extra data like xs and ws tags which make it basically unusable on web

void.cat supports webtorrents with WebSeed & ExactSource, which are required to bootstrap a file from web, otherwise nobody will be able to load it

@lovvtide
Author

lovvtide commented Apr 9, 2023

url -> r ?

The reason I thought to use url instead of r is to save relays from having to index that tag. Is there a reason the url needs to be indexed? I am aware that r is standardized elsewhere as referring to a url, but as I recall the reason that convention was standardized as a single letter tag is for the sake of building a commenting system where clients can query for events based on the url of the page. Since the url in this proposal refers to just the source of a file (and not really a page per se) I wonder if it makes sense to retain that usage, or would it be better to go with non-indexed url to lessen the load on relays?

t is already used for hashtag?

Ah yes, should probably change that then. Maybe i for "infohash"?

blurhash could also be merged into x if you use the hash name in the tag ["x", "<some-blurhash>", "blurhash"]
x may contain a sha256, blurhash or btih (or all)

That was my original idea, but as @fiatjaf pointed out it might be better to have a separate tag for each type of hash, i.e. use tags to differentiate usage instead of markers

If you don't use the full magnet link then you lose out on extra data like xs and ws tags which make it basically unusable on web

Well ws (webseed) would just be the value of the url (or r ... whatever we are calling it) right? As for xs, I did not know about that — if there are clients relying on that it does make sense to include it somehow. I suppose magnet could be its own tag, but I do think that even if there is a magnet tag the infohash and file name should still have their own tags so you can query using infohash and so that the client does not have to parse the magnet link just to display basic metadata. Maybe it's simplest to just have an optional xs tag — what do you think?

@frbitten
Contributor

frbitten commented Apr 9, 2023

As I already said in NIP-94, I don't see a problem with changing the tags to single letters so that they are indexable. I deliberately did not make them indexable, so as not to overload relays with indexes that we are not even sure will be used. I don't think hash lookup is viable. The hash will just be a way to confirm that the download completed correctly. In most cases, I imagine you won't even have this information in the event.

The m tag really is a good candidate for indexing, to allow searches by type. For the file name, NIP-94 leaves it open to reuse tags that already exist in the protocol, such as "subject" (NIP-14), instead of creating another tag.

Anyone can suggest changes to NIP-94 and I will accept them. Any other tag that you deem useful can be added. I tried to be as minimalist as possible to avoid limiting its use. There are numerous image metadata tags that could be included, for example.

@frbitten
Contributor

frbitten commented Apr 9, 2023

I think we can summarize both this and the other proposals as just a pointer to files, with optionally multiple ways of accessing these files and optional metadata. We just have to agree on the names of the tags for the metadata.

  • url -> http or magnet (multiple)
  • e -> NIP-95 event that contains the actual file, if anyone is that crazy (optional)
  • x -> sha256
  • m -> mimetype
  • t -> torrent infohash
  • size -> bytes (optional)
  • secret -> key and nonce for AES-GCM encryption (optional)
  • blurhash -> for cosmetic purposes (optional)

What do you think? @frbitten

@fiatjaf
In NIP-95 it was suggested to have a specific header for it, to avoid mixing file sharing with data storage. That's why in NIP-95 I defined a specific header event for it: with the e tag alone, it may not be clear whether it points to a NIP-95 event or is a simple reference to another existing event.

For the NIP-94 to work, it should be limited to only having one event-id in the "e" tag and it must be a NIP-95 event.

I don't know what would be the best approach in this case. What do you think?

@v0l
Member

v0l commented Apr 9, 2023

The reason I thought to use url instead of r is to save relays from having to index that tag.

Yeah, maybe it's not necessary; NIP-94 uses the r tag

That was my original idea, but as @fiatjaf pointed out it might be better to have a separate tag for each type of hash, i.e. use tags to differentiate usage instead of markers

I think it makes more sense to use just x, we have limited single char tags (which are indexable) and we will need all the different hash values indexed. Also they are hashes so there is almost no chance of any collision with some other hash. So you're basically guaranteed to get the correct result you're looking for when you query the relay by hash (maybe except blurhash, not sure how much data they store there)

Well ws (webseed) would just be the value of the url

Maybe, trying to imagine a scenario where you would want some other URL that would be different than the web seed...

Maybe it's simplest to just have an optional xs tag — what do you think?

Idk, you remove any of the flexibility of just having a plain magnet: link with custom trackers or any other supported options in the magnet link, you're not really "supporting" magnet links if you don't support all of the options.

Here is a random image uploaded to void.cat without xs, and it doesn't seem to work in Transmission or on instant.io

magnet:?xt=urn:btih:daaa43e9bf35f4fe95d4174d926e7fa9544c1e46&dn=logobtc_purpletrasparent.png&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&ws=https%3A%2F%2Fvoid.cat%3A443%2Fd%2FHi1YEB7LoBtnsnoHFVGkzd

You need the torrent file in order to have the piecehash data, maybe this is a problem for void.cat to solve by hosting the metadata on trackers too idk, what do you think?
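
To see which fields the magnet link above actually carries, it can be taken apart with the standard library. This is just a sketch: `parse_magnet` is a hypothetical helper, and it reads only the `xt`, `dn`, `tr`, `ws` and `xs` parameters.

```python
from urllib.parse import parse_qs

# The void.cat magnet link quoted above, split across lines for readability.
MAGNET = (
    "magnet:?xt=urn:btih:daaa43e9bf35f4fe95d4174d926e7fa9544c1e46"
    "&dn=logobtc_purpletrasparent.png"
    "&tr=wss%3A%2F%2Ftracker.btorrent.xyz"
    "&tr=wss%3A%2F%2Ftracker.openwebtorrent.com"
    "&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce"
    "&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A6969%2Fannounce"
    "&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce"
    "&ws=https%3A%2F%2Fvoid.cat%3A443%2Fd%2FHi1YEB7LoBtnsnoHFVGkzd"
)


def parse_magnet(magnet):
    """Split a magnet link into infohash, name, trackers, web seeds and exact sources."""
    scheme, _, query = magnet.partition("?")
    if scheme != "magnet:":
        raise ValueError("not a magnet link")
    params = parse_qs(query)  # also undoes the percent-encoding
    prefix = "urn:btih:"
    infohash = next(
        (xt[len(prefix):] for xt in params.get("xt", []) if xt.startswith(prefix)),
        None,
    )
    return {
        "infohash": infohash,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
        "web_seeds": params.get("ws", []),
        "exact_sources": params.get("xs", []),
    }


info = parse_magnet(MAGNET)
print(info["infohash"], info["web_seeds"], info["exact_sources"])
```

For this link the `exact_sources` list comes back empty, which matches the missing xs described above.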

@fiatjaf
Member

fiatjaf commented Apr 9, 2023

@v0l

we have limited single char tags

But we have enough, right? If we consider that each kind has different meanings for each letter (besides some canonical ones like e, p and a all other tag names are contextual I think) I think it is ok to use different tag names for each kind of hash, and we shouldn't index blurhash.

About the magnet I don't know, I think having the full contents of the magnet split between tags looks good to me, but having the full magnet too.

Ideally file hosts (yes, voidcat.io) that provide webseed should also provide the torrent file at the same path, so while constructing the magnet we could just use the same URL pattern for the xs and clients can download it from there.

@v0l
Member

v0l commented Apr 9, 2023

Ideally file hosts (yes, voidcat.io) that provide webseed should also provide the torrent file at the same path, so while constructing the magnet we could just use the same URL pattern for the xs and clients can download it from there.

This is how void.cat does it. Do you think that searching by infohash will be a thing? Will searching by any hash ever be a thing? I don't know; it seems like just putting the full magnet link is a better option

I don't think hash lookup is viable

I'm in agreement with @frbitten

@frbitten
Contributor

frbitten commented Apr 9, 2023

In what cases will someone have a hash and want to look up the file that has that hash? Either someone gave me the hash to search or I already have the file and I want to see if it is already shared. If someone passed the hash they might as well pass the event id. Knowing if the file has already been shared can be a feature external to NOSTR that the client or relay implements in another way.

What if we have the same file shared on more than one link? Are we going to have the same hash in multiple events, and how will the indexing work in this case? I don't know enough about how these databases work to know if this could be a problem or not.

@lovvtide
Author

lovvtide commented Apr 9, 2023

Idk, you remove any of the flexibility of just having a plain magnet: link with custom trackers or any other supported options in the magnet link, you're not really "supporting" magnet links if you don't support all of the options.

@v0l This is a really good point

I think having the full contents of the magnet split between tags looks good to me, but having the full magnet too.

@fiatjaf This seems like the best solution. It may result in duplicated information sometimes, but that's a small price to pay for clients not having to parse magnet links to read metadata while at the same time avoiding any constraints on the usage of the full magnet spec.

In what cases will someone have a hash and want to look up the file that has that hash?

do you think that searching by infohash will be a thing?

@frbitten @v0l Yes. I'm thinking of the case where you have the file and want to query relays for event(s) containing the hash of the file to find out who published the event and/or verify its authenticity. I think this is quite an important use case actually. For instance, suppose that all I have is a link to a file (or someone sends me a file as an attachment, or perhaps I downloaded the file from a torrent client that doesn't know about nostr) and, in any case, I want to verify that the file can be trusted. I want a nostr client to be able to download the file, compute the hash, and then query relays using that hash to find out if anyone has created an event that vouches for the authenticity of that file I just downloaded. With deep fakes and AI and all of that I imagine in the future this capability to "verify" media will only become more important. I want it to be possible to build a "trusted torrent client" using nostr that is interoperable with existing torrent clients but uses nostr as a way to get information about credibility/authenticity of files.
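
The "compute the hash, then look for events that carry it" flow described above might look roughly like this on the client side. It is a sketch, not part of the proposal: the function names are hypothetical, events are assumed to be plain dicts with `pubkey` and `tags` fields as in NIP-01, and the relay round trip is left out so that only the hashing and matching are shown.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def vouching_pubkeys(events, sha256_hex):
    """Return the distinct pubkeys of events that carry an x tag with this hash."""
    wanted = sha256_hex.lower()
    found = []
    for ev in events:
        has_hash = any(
            len(t) >= 2 and t[0] == "x" and t[1].lower() == wanted
            for t in ev.get("tags", [])
        )
        pubkey = ev.get("pubkey")
        if has_hash and pubkey and pubkey not in found:
            found.append(pubkey)
    return found
```

This only finds who published events for the file; deciding whether those authors can be trusted is a separate step.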

Are we going to have the same hash in multiple events, and how will the indexing work in this case? I don't know enough about how these databases work to know if this could be a problem or not.

Won't be a problem at all, for the same reason that many events contain, for example, the same e tag and that's fine. If a client queries relays for a hash they will get back an array of events that contain a tag with that hash.

@frbitten
Contributor

frbitten commented Apr 9, 2023

@lovvtide But a file already published does not guarantee its reliability. You will have to do a social analysis of who published it to see if it is a trusted user or not.

If all this is done automatically by the application, I understand the usefulness. But I don't see a common user calculating the hash of a file to check whether it already exists on nostr.

But if it's important for your project, I don't see a problem with it being an indexable tag.

@lovvtide
Author

lovvtide commented Apr 9, 2023

But a file already published does not guarantee its reliability. You will have to do a social analysis of who published it to see if it is a trusted user or not.

Yes, that's an important distinction. Being able to look up the author of an arbitrary file is not sufficient, on its own, for being able to trust the file.

However (regardless of how people end up doing the social analysis part) it is still a necessary component of any such system.

Like, I'm thinking of the file hash being sort of like a "primary key" that makes nostr more easily interoperable with other ways of distributing files, i.e. leaving the door open for applications in different contexts to talk to each other about that file by referencing that hash without all the applications having to know about nostr.

@fiatjaf
Member

fiatjaf commented Apr 24, 2023

This functionality has been incorporated by NIP-94.

@fiatjaf fiatjaf closed this Apr 24, 2023
4 participants