
NIP-96 - HTTP File Storage Integration #547

Merged
merged 51 commits into from
Jan 8, 2024

Conversation

arthurfranca
Contributor

@arthurfranca arthurfranca commented May 21, 2023

Read NIP text here

This uses HTTP with NIP-98 HTTP auth instead of websockets and the regular nostr infrastructure. It does so to avoid using base64 .content on events. There is previous discussion of the positive and negative points in the original NIP-95 PR.

Check line of thinking here, here, here and here.

@arthurfranca arthurfranca force-pushed the nip-95-contender branch 2 times, most recently from cddb7a0 to dc8b808 on May 21, 2023 at 22:28
@Egge21M
Contributor

Egge21M commented May 22, 2023

This is great! I wonder if we can weave in ZapGates. Right now the uploading flow is left to the implementers.

@arthurfranca
Contributor Author

arthurfranca commented May 22, 2023

NIP-95 [...] Others custom form data fields may be added depending on specific relay support

@Egge7 for your ZapGates implementation add to "Creating Zap Gated Resources (Creator)" that on upload (using NIP-95) the client would send an extra form data field nip60 with the following stringified json as value:

{
  "pubkey": <public key that will be used to sign the 1211 event, if not the same of the uploader>,
  "tags": [
    ["m", <MIME type>], // optional, if can't be extracted from file
    ["amount", <price in SATS>],
    ["relays", <list of relays>], // list of relays that should be used for zapping
    ["preview", <...>], // this is needed to generate the unsigned 1211 event with specific id
    ["??", ??] // payment destination according to NIP-57
  ],
  "content": <description> // this is needed to generate the unsigned 1211 event with specific id
}

And the success response would have the unsigned 1211 event inside an extra nip60 key like this:

{
  nip95: {
    // SHA-256 hash of the file, as normal for NIP-95
    x: "719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b"
  },
  nip60: {
    unsigned_event: {
      // ...1211 event without sig key
      // it will include the "u/url" tag just like NIP-95 expects it to be (https://relay.domain/nip95/<sha256-file-hash>)
    }
  }
}

NIP-95 [...] Relays must make available the route path /nip95/<sha256-file-hash> with GET method for file download.

For "Accessing a Zap Gated Resource (User)" the url of the file would be the same as the NIP-95 one. The only difference is that NIP-98 Authorization header would be required.
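A minimal sketch of how a client might serialize the proposed nip60 form-data field from above (Python, illustrative only; the pubkey, price, relay, and content values are placeholders, not part of any finalized spec):

```python
import json

# Hypothetical nip60 form-data payload following the draft structure above.
# Every concrete value here is a placeholder for illustration only.
nip60_payload = {
    "pubkey": "npub1exampleuploaderpubkey",     # signer of the eventual 1211 event
    "tags": [
        ["m", "image/png"],                     # optional if the MIME type is detectable
        ["amount", "21000"],                    # price in sats
        ["relays", "wss://relay.example.com"],  # relays to use for zapping
    ],
    "content": "a zap-gated picture",           # fixed so the unsigned event id is stable
}

# The extra form field is just the stringified JSON, sent alongside the file part.
nip60_field = json.dumps(nip60_payload)
print(nip60_field[:40])
```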

@fiatjaf
Member

fiatjaf commented May 22, 2023

@vitorpamplona

@vitorpamplona
Collaborator

A few questions/observations:

  1. How do we tag these files in kind1's? And how do we zap/report these files directly?
  2. Looks like files uploaded here still need a FileHeader with the metadata from NIP95. NIP-94's header is not ideal, since NIP-94 is bound to a fixed URL. Here, clients need to search for the file in multiple relays. A different kind for the header is helpful to make that distinction.
  3. Files are not signed Nostr Events, correct? How do we know which Nostr pubkey is the owner of this file?
  4. Why are we calling these servers relays if we are not re-using anything from the usual relay infrastructure?
  5. Delete should be a regular Nostr Deletion event with a tag to the hash of the file.
  6. Without an owner, no idea how this can be achieved: The relay should reject deletes from users other than the original uploader.

These tags don't make much sense. The whole point of NIP-95 is to not be bound to URLs.

[
  ["url", "https://relay.domain/nip95/719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b"],
  ["x", "719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b"]
]

Do this, instead:

[
  ["nip98", "719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b", "relay.domain"]
]

At first glance:

  1. This NIP feels like just a standardization of uploading and downloading APIs for any file. It fails to provide/participate in the usual event graph of Nostr, which certainly reduces its usability/composability within the network.
  2. Since it won't use NIP-42 AUTH, we might need another auth procedure.
  3. Clients should use multiple relays to download these files to avoid pixel/IP tracking by file relays.
  4. Lack of zapping and reporting tools for image hosts makes this NIP less interesting than NIP-95.

@staab
Member

staab commented May 22, 2023

100% agree with Vitor, and adding a new file header event kind should solve all mentioned problems. This PR is a step in the right direction though. Would you like me to draft one? I have a pretty concrete idea of what would synthesize the two approaches.

@vitorpamplona
Collaborator

100% agree with Vitor, adding a new file header event kind should solve all mentioned problems.

Keep in mind that just a header adds a loophole for illegal files. You can store data without the header and without an owner. I would suggest doing a usual Nostr event with signature, pubkey, etc to make sure we can always trace signatures. Relays can then filter uploads from reputable keys and drop random ones from the system.

This PR is a step in the right direction though. Would you like me to draft one? I have a pretty concrete idea of what would synthesize the two approaches.

I am happy to support if somebody wants to take this approach forward and actually code it.

But I completely disagree that this is the "right direction" to go. NIP-95, as is, is better than this approach (even with the new header).

@staab
Member

staab commented May 22, 2023

But I completely disagree that this is the "right direction" to go.

Interesting, can you elaborate? Maybe I'm not being clear, I see this (i.e. the header event + file) as very similar to NIP 95, just with some HTTP endpoints to support usage better.

@vitorpamplona
Collaborator

can you elaborate?

  1. Since I coded both, I don't agree an HTTP API is better than Websockets, even for "larger" video files.
  2. I prefer to have a Nostr event for it. Hashed files alone miss the composability of Nostr.

However, don't let my experience close the door on this if you folks feel strongly about it.

@staab
Member

staab commented May 22, 2023

I agree with point #2. Have you written about what you discovered about serving files over websockets? Have you figured out caching (especially multi-AZ) with that approach?

@arthurfranca
Contributor Author

@vitorpamplona

I think you missed the NIP-98 requirement. The file will be owned by a pubkey because of it (the server must link them). But you are right in that I must edit the delete part to make it clear that the relay should soft-delete (mark as deleted) if other users "own" the same file.

As the hash is guaranteed to be the same for identical files (the same picture, for example), nothing stops another user from uploading the same file to the same relay; it will end up with the same identifying hash, which is good for saving space.
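The deduplication logic described above can be sketched like this (Python, illustrative only):

```python
import hashlib

def file_hash(data: bytes) -> str:
    """SHA-256 hex digest used as the file's identifying hash on the server."""
    return hashlib.sha256(data).hexdigest()

# Two users uploading byte-identical files produce the same hash, so the
# server can keep a single copy and simply link both pubkeys as owners.
first_upload = file_hash(b"same picture bytes")
second_upload = file_hash(b"same picture bytes")
assert first_upload == second_upload  # deduplication key is identical
```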

Lack of zapping and reporting tools for image hosts makes this NIP less interesting than NIP-95.

In addition to the user being able to create NIP-94s, nothing stops the relay from creating NIP-94s too. I think this step should be optional, as users creating NIP-94s is already a way of spreading the file's existence. It allows the file to be zapped, etc.

@vitorpamplona
Collaborator

Have you figured out caching (especially multi-AZ) using that approach?

We are playing with a simple AWS S3 Cross-Region Replication (CRR) scheme. It seems to be working. Each region has a relay that connects to its own S3 instance. But I am sure Cloud Architects can do better.

@arthurfranca
Contributor Author

Today we have 3 nostr-focused HTTP file storage services that I know of: void.cat, nostrimg and nostr.build. They don't have any websockets code as far as I know.

We should get their feedback regarding what they prefer. I will poke them at their repos.

@arthurfranca
Contributor Author

Keep in mind that just a header adds a loophole for illegal files. You can store data without the header and without an owner. I would suggest doing a usual Nostr event with signature, pubkey, etc to make sure we can always trace signatures.

The NIP-98 authorization event should be enough.

Clients should send a GET request to the relay url in the format https://relay.domain/nip95/<sha256-file-hash>?r=relay.domain2&r=relay.domain3.
If the relay doesn't have the file anymore, it should issue a 302 http redirect to the next relay included as r query param. For example, issue a redirect to https://relay.domain2/nip95/<sha256-file-hash>?r=relay.domain3

@vitorpamplona what's your opinion on this? This redirect feature is a way to make a url useful for longer, making relays cooperate automatically in an easy way. How would something similar be possible with the original NIP-95?
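The fallback-redirect scheme described above can be sketched like this (Python, illustrative only; the relay hostnames are placeholders):

```python
from urllib.parse import urlencode

def download_url(primary: str, file_hash: str, fallbacks: list) -> str:
    """Build the proposed /nip95/<hash> url, with fallback relays as r params."""
    url = f"https://{primary}/nip95/{file_hash}"
    query = urlencode([("r", host) for host in fallbacks])
    return f"{url}?{query}" if query else url

def next_hop(fallbacks: list, file_hash: str):
    """The 302 target a relay missing the file would issue: consume one r param."""
    if not fallbacks:
        return None  # no fallback relay left
    head, *rest = fallbacks
    return download_url(head, file_hash, rest)

h = "719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b"
print(download_url("relay.domain", h, ["relay.domain2", "relay.domain3"]))
print(next_hop(["relay.domain2", "relay.domain3"], h))
```

Each hop shortens the r list by one, so the chain terminates once every listed relay has been tried.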

@arthurfranca
Contributor Author

@v0l a question for you as a file storage provider author: which NIP-95 version might you implement support for? This NIP-95 v2 using HTTP, the original NIP-95 that needs websockets and nostr event handling, or neither of them?

Same question to @michaelhall923 and @ng5jr if you see this.

@arthurfranca
Contributor Author

I re-read what @vitorpamplona said about being able to zap the server/report the file hosted on the server, and that he thinks a NIP-94 event created by the server (which would of course require the server to have a nostr account and use websockets if it wishes) isn't enough for that.

I think he means it isn't good to make a GET url available for the file. Instead, he wants the file server to just respond with an event id (a mixture of the NIP-94 and NIP-95 ones, with proof of who uploaded it embedded). No static url would ever be available for download. The server is the gatekeeper.

Then this event (with base64 .content, or a temporarily available, dynamically generated url each time, if a kind+dtag was used instead of an event id when requesting) would be the only way to access the file.

So although the user sent the file, they can only mention the nevent/naddr of the event whose author is the file storage server. Also, the user can delete it (which would need this PR approved, as the user isn't the event author). Is that correct?

@vitorpamplona
Collaborator

The server can create the event with all the data, the author of the image can create the header event. Or both users can be tagged on both events so that a zap gets split between the two. I am not sure which one is best yet.

But I do think there must be a PubKey for the data event.

@arthurfranca
Contributor Author

@vitorpamplona ok, got it all. Your monetization scheme is different from @staab's. The former stores files for free and then eventually gets paid through optional zaps from viewers, while the latter may ask for money upfront before authorizing the upload.

I will rename this PR and its NIP number and stop using the word "relay", as it will just be a spec for a unified API for connecting to nostr-focused HTTP storage servers, and hopefully for the redirect cooperation between them. These servers will know the pubkey of the uploader (through NIP-98 authentication) for accountability, but other than that, they won't be required to learn how to implement a nostr relay.

I think your idea could be either:

  • an extension to this NIP (a new NIP that adds an alternative upload url "/nipXX" with rules regarding creation of the nostr event you want etc)
  • OR use original NIP-95 directly as the server is already expected to be a relay (knows websockets)

@v0l
Member

v0l commented May 23, 2023

@v0l a question for you as a file storage provider author: which NIP-95 version might you implement support for? This NIP-95 v2 using HTTP, the original NIP-95 that needs websockets and nostr event handling, or neither of them?

Same question to @michaelhall923 and @ng5jr if you see this.

I prefer this and would implement it for void.cat very easily

@arthurfranca arthurfranca force-pushed the nip-95-contender branch 2 times, most recently from 1a6f8d4 to eab5c61 on May 23, 2023 at 21:38
@arthurfranca
Contributor Author

I've added some details gathered from feedback: mostly extra explanation, plus an optional file extension for the download/delete routes, which is helpful for clients consuming these urls.

NIP-98 merge is a requirement before merging this.

@arthurfranca arthurfranca changed the title NIP-95 - File Storage v2 NIP-96 (Former NIP 95 - File Storage v2) May 23, 2023
@melvincarvalho

This comment was marked as spam.

@vitorpamplona
Collaborator

@fishcakeday

I am not sure what happened but since the blurhash became available, the x now doesn't match anymore. For the file below, I am getting the hash e38f98432dd19a7eec7737d036c461039ebe445e33f81297b27395b1e1887c56

{
  "tags": [
    [
      "url",
      "https://image.nostr.build/4af6804ee79aec6ca752a8d2b5651574451dc62a1116f17fa3cb28332166e707.png"
    ],
    [
      "ox",
      "4af6804ee79aec6ca752a8d2b5651574451dc62a1116f17fa3cb28332166e707",
      "https://nostr.build"
    ],
    [
      "x",
      "4af6804ee79aec6ca752a8d2b5651574451dc62a1116f17fa3cb28332166e707"
    ],
    [
      "m",
      "image/png"
    ],
    [
      "dim",
      "24x24"
    ],
    [
      "bh",
      "LLQ7RM$%R-$%_wjbR+ah4ff80iR+"
    ],
    [
      "blurhash",
      "LLQ7RM$%R-$%_wjbR+ah4ff80iR+"
    ]
  ],
  "content": ""
}

@fishcakeday

@vitorpamplona this is due to you uploading the same file; deduplication returns what we have in the DB, the original hash. If you were to upload a new file that we do not already have, you would have gotten the correct "x". We do not store the hash of the optimized version of the file in the DB, and it would require round-trips to S3 to give you that on duplicate uploads.

@vitorpamplona
Collaborator

vitorpamplona commented Dec 10, 2023

Can we change that? I know it sounds weird but a lot of people upload the same image twice. They start drafting something, upload an image, then cancel because they need to do something else, then they write again and upload the same image again. This seems to happen a lot. It would be nice if the API replied with the right hash or none at all. In that way, it avoids the confusion of just pasting the result into a non-valid NIP-94.

Also, I think I shouldn't generate random images to put into my test suite. It would just fill you with useless files. So, I keep using the same one in the expectation the server replies a result similar to a new one.

@fishcakeday

Can we change that? I know it sounds weird but a lot of people upload the same image twice. They start drafting something, upload an image, then cancel because they need to do something else, then they write again and upload the same image again. This seems to happen a lot. It would be nice if the API replied with the right hash or none at all. In that way, it avoids the confusion of just pasting the result into a non-valid NIP-94.

Also, I think I shouldn't generate random images to put into my test suite. It would just fill you with useless files. So, I keep using the same one in the expectation the server replies a result similar to a new one.

Let me take a look and see how easily I can do that. I sure can remove “x” when we don't have one, but I would also prefer to return the correct hash. Thanks.

@fishcakeday

@vitorpamplona it's implemented. Any new uploads and subsequent duplicates will get the sha256 of the transformed file. Duplicates of uploads from before now will/should get nothing.

@arthurfranca
Contributor Author

Guys, I turned ["ox", <SHA-256 hash>, <HTTP server URL 1>] into just ["ox", <SHA-256 hash>], ok?

Because I now think the kind:10096 events are a better tool for discovering new upload servers, similar to how kind:10002 events from NIP-65 are better for discovering new relays than checking e tag relay hints. Considering the download url is present, we don't need the hint inside the ox tag.

@vitorpamplona
Collaborator

vitorpamplona commented Dec 11, 2023

I wonder if we should upload every image to all servers in the kind:10096 event. That would create some nice redundancy to our users.

Then the NIP-94 event can have the following structure:

["nip96", "<server1 base address (not the download URL)>", "<m>", "<x>", "<dim>", "<size>"]
["nip96", "<server2 base address (not the download URL)>", "<m>", "<x>", "<dim>", "<size>"]
["nip96", "<server3 base address (not the download URL)>", "<m>", "<x>", "<dim>", "<size>"]
["ox", "<original hash>"]
...

This would allow 3 servers to store 3 different image sizes in 3 different formats, and the client can just pick the first or the best one among them. They should all be similar, but each one could be hash-checked at will.
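A client-side sketch of parsing that proposed multi-server tag layout (Python, illustrative only; the ["nip96", server, m, x, dim, size] shape is the proposal above, not merged spec text, and the server addresses are placeholders):

```python
# Hypothetical parser for the multi-server tag layout sketched above.
def server_variants(tags):
    variants = []
    for tag in tags:
        if tag and tag[0] == "nip96":
            server, mime, x, dim, size = tag[1:6]
            variants.append({"server": server, "m": mime, "x": x,
                             "dim": dim, "size": int(size)})
    return variants

tags = [
    ["nip96", "https://s1.example", "image/webp", "aa" * 32, "800x600", "40000"],
    ["nip96", "https://s2.example", "image/png", "bb" * 32, "800x600", "90000"],
    ["ox", "cc" * 32],
]
# A client might pick the first listed server, or e.g. the smallest file:
smallest = min(server_variants(tags), key=lambda v: v["size"])
```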

@arthurfranca
Contributor Author

arthurfranca commented Dec 12, 2023

I don't know, it feels like overkill. Allow me to give my looong opinion:

As you said, multiple uploads of the same file are great for redundancy when a url is broken. Focusing on redundancy, the goal isn't to give a client the possibility of choosing a file version based on byte size/mime type/etc., or else we won't know e.g. which picture a user is reacting or commenting on.

Are the xs for the alternative file versions that important? I mean, when the image xs are different there is no way to tell they are the same picture. So it's probably better that all clients consider just a single x as the real one.

The dim(ensions) are used just once, as fast as possible, to prepare a placeholder, and shouldn't change to another dim (to prevent layout shift) if the client ends up needing to fall back to another image version. At least the aspect ratio tends to be the same, so the fallback image won't get distorted (maybe just blurry if it has smaller dimensions).

After changing my mind many times, for multiple uploads I think it may be good to add just one url tag, for speed. The ox tag paired with the author's kind:10096 event is enough to dynamically generate alternative download urls if the url value goes stale.
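The fallback generation just described could look like this (Python, illustrative only; it assumes servers expose downloads at <server>/<hash>, and the hostnames are placeholders):

```python
def fallback_urls(ox: str, servers: list, ext: str = "") -> list:
    """Rebuild candidate download urls from the ox hash plus the author's
    kind:10096 server list, assuming a <server>/<hash> download layout."""
    return [f"{server.rstrip('/')}/{ox}{ext}" for server in servers]

ox = "dd" * 32
print(fallback_urls(ox, ["https://a.example/", "https://b.example"], ".png"))
```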

@vitorpamplona
Collaborator

Are the xs for the alternative file versions that important?

The idea is that the author would have reviewed and agreed to the server modifications before sending the event. So, if x matches the individual URL, anyone reading the NIP-94 can be confident this is the same version the author saw and approved when writing the event. The server has not made further unapproved changes.

Yes, the client can still seek the same image on other servers, but it wouldn't be able to verify whether the modified version has been approved by the author. For instance, an ad server could just create a version of every ox with its logo on every image.

@arthurfranca
Contributor Author

anyone reading the NIP-94 can be confident this is the same version the author saw and approved when writing the event.

Right. If that is the only problem, adding multiple x tags to represent the only hashes the author approved would be enough. If the loaded image version matches any of them, it's all good.
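The multiple-x verification idea can be sketched like this (Python, illustrative only; the second hash is a placeholder for an approved alternative version):

```python
import hashlib

def is_approved(data: bytes, tags) -> bool:
    """True if the downloaded bytes match any author-approved x hash."""
    digest = hashlib.sha256(data).hexdigest()
    return any(tag[0] == "x" and tag[1] == digest for tag in tags)

original = b"served file bytes"
tags = [["x", hashlib.sha256(original).hexdigest()],  # approved original
        ["x", "ee" * 32]]                             # another approved version
print(is_approved(original, tags))       # the served bytes match an x tag
print(is_approved(b"tampered", tags))    # unapproved server modification
```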

@vitorpamplona
Collaborator

vitorpamplona commented Dec 12, 2023

Amethyst is now NIP-96 exclusive: no other image server APIs are used (NIP-95 is still available for files < 100KB).

@vitorpamplona
Collaborator

I think this should be merged. We have several implementations using it already.

Any last changes @staab @v0l @fiatjaf @arthurfranca @fishcakeday ?

@quentintaranpino

For my part all good (even though you didn't quote me 😂😂)

@v0l
Member

v0l commented Jan 8, 2024

All good imo, already implemented on void.cat and snort.social

@fiatjaf fiatjaf merged commit b0e6c01 into nostr-protocol:master Jan 8, 2024
@pcfreak30

I am doing research right now on P2P protocols, and I'm glad to see a spec specifically for file storage.

The only concern I have with this as-is is that it doesn't seem to handle multipart uploads / large data well. Think of things like the tus.io upload protocol.

My project is creating data storage support for many P2P nets (and planning for social nets too).

A simple POST upload creates limitations for large data.

I found this just now while researching, so I'm late to the party, so to speak, but that is my feedback based on my R&D efforts this year so far.

@vitorpamplona @arthurfranca @fiatjaf

@quentintaranpino

All good imo, already implemented on void.cat and snort.social

Hi Kieran

Do you think it would be possible to implement it in snort while allowing the user to choose which server they want? NIP-96 exists precisely to let the user decide; otherwise I don't see much point.

@quentintaranpino

@vitorpamplona do you think you could do the same in Amethyst? Everyone who installs the server asks me the same question; I can only say Coracle and Nostur.

In my opinion the nature of NIP-96 was to give freedom to the user. Do you see it as possible?

Thank you! 😀

@arthurfranca arthurfranca deleted the nip-95-contender branch May 9, 2024 15:48
@pcfreak30

Hello, following up since no one seemed interested in responding to my request for feedback, @arthurfranca.

This spec does not seem to address how large files are handled in a multipart fashion, which would also cause problems when uploading excessively large files, as a server could not hold gigabytes to a terabyte of data in RAM or temp storage.

Ideally you could just support TUS, which is working to standardize things with the IETF. But as it stands, this would definitely limit things unless you deviated from the spec.

Thoughts? Thanks!

@fishcakeday

This is a very good point, and TUS is not the only way to do it. TUS is definitely widespread and used in many places, but there is also S3-style multipart upload. Maybe we could add an extension that "tells" the client how to upload a file based on its size and type? A pre-query of sorts, or part of the JSON in the well-known path?

@pcfreak30

This is a very good point, and TUS is not the only way to do it. TUS is definitely widespread and used in many places, but there is also S3-style multipart upload. Maybe we could add an extension that "tells" the client how to upload a file based on its size and type? A pre-query of sorts, or part of the JSON in the well-known path?

I have already implemented TUS in my project, and as TUS is trying to contribute a standard upstream, I'm not sure why we should try to do https://xkcd.com/927/ 🙃.

And I'm well aware of how S3 supports multipart, but again, there is an entire tus ecosystem that you can plug in. I have implemented tus for another network, and doing so here would be easy given that.

My opinion is you just need to define how tus should be integrated, API-wise. One thing another proto did was pass metadata with tus for the hash it cared about, so the sha256 can be provided that way, since TUS creates its own identifiers.
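That metadata idea could look like this (Python, illustrative only; the "sha256" and "filename" key names are assumptions, though tus's Upload-Metadata header does encode values as base64):

```python
import base64
import hashlib

def upload_metadata(data: bytes, filename: str) -> str:
    """Build a tus Upload-Metadata header value carrying the sha256 a
    NIP-96 server cares about; tus values are base64-encoded."""
    pairs = {
        "filename": filename,                          # key names illustrative
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return ",".join(
        f"{key} {base64.b64encode(value.encode()).decode()}"
        for key, value in pairs.items()
    )

print(upload_metadata(b"big video bytes", "clip.mp4"))
```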

Thoughts? @fishcakeday.

@arthurfranca
Contributor Author

@pcfreak30 here is why we didn't add tus support at that time.

I think it could be a good addition.

@pcfreak30

@pcfreak30 here is why we didn't add tus support at that time.

I think it could be a good addition.

Cool, how would you like to move forward with adding to this?

@arthurfranca
Contributor Author

I haven't dug into TUS enough. NIP-96 server builders (@fishcakeday @v0l @quentintaranpino) should know better whether it is a good idea to pursue this. I don't remember how easy it would be for clients and servers to implement.

But from what I remember, it could be good to add it as a new "advanced" (and optional) route for uploading multiple files at once, with resumability support. But would it be more of a paid-account feature? I mean, resumability of uploads is a good feature when dealing with big files, which would probably be restricted to paying users (if the user is not hosting their own NIP-96 server).

Opening a new PR with the above-mentioned route addition to NIP-96 and asking for those three guys' opinions is what I suggest.

@pcfreak30

I haven't dug into TUS enough. NIP-96 server builders (@fishcakeday @v0l @quentintaranpino) should know better whether it is a good idea to pursue this. I don't remember how easy it would be for clients and servers to implement.

But from what I remember, it could be good to add it as a new "advanced" (and optional) route for uploading multiple files at once, with resumability support. But would it be more of a paid-account feature? I mean, resumability of uploads is a good feature when dealing with big files, which would probably be restricted to paying users (if the user is not hosting their own NIP-96 server).

Opening a new PR with the above-mentioned route addition to NIP-96 and asking for those three guys' opinions is what I suggest.

As a future hosting provider, there is really no reason to make it paid IMHO. The only thing that matters is actual data usage.

But thanks, I will come back to this when my project actually needs to put R&D time into it. For now it's more pre-research, since any funding for me doing this would be in 2025 at minimum.

Kudos!
