[Feature Suggestion] - CID deny / allow API #7871

Closed
obo20 opened this issue Jan 20, 2021 · 24 comments
Assignees
Labels
kind/feature A new feature need/analysis Needs further analysis before proceeding need/maintainers-input Needs input from the current maintainer(s)

Comments

@obo20

obo20 commented Jan 20, 2021

During the "IPFS / IPLD Security & Encryption Workshop" that came out of ipfs/roadmap#65, it was discussed that many projects would benefit from a generic API that IPFS can call before serving a block over Bitswap. The purpose of this API would be to allow projects to define their own privacy / permission controls within IPFS.

The proposed solution would be an option within IPFS that allows the node owner to provide an API to call before each block gets served (and possibly also before it gets advertised). This API would return "true" or "false" depending on whether or not the block should be served. The API could be local or external depending on the node's performance needs.

The API would take in the following as parameters:

  • the CID being requested to serve
  • any headers that were provided with the content request. These headers would need to be relayed from the original requesting node to the host node with the content (see https://www.npmjs.com/package/ipfs-http-client#custom-headers for an example of how these headers could be passed in initially).

It would be up to the node owner to figure out how to return "true" or "false" to this request.
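As a rough sketch of the flow described above, a node could ask an external HTTP endpoint before serving a block. Everything here is an assumption made for illustration: the `allowBlock` and `newPolicyServer` helpers, the `cid` query parameter, the `X-Token` header, and the plain-text "true" / "false" body are hypothetical and not part of go-ipfs.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// allowBlock asks a hypothetical external allow/deny endpoint whether a block
// may be served. The endpoint, its "cid" query parameter, and the plain-text
// "true"/"false" body are assumptions for illustration, not a go-ipfs API.
func allowBlock(client *http.Client, endpoint, cid string, headers map[string]string) (bool, error) {
	req, err := http.NewRequest(http.MethodGet, endpoint+"?cid="+url.QueryEscape(cid), nil)
	if err != nil {
		return false, err
	}
	// Relay whatever headers arrived with the original content request.
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err // fail closed: an unreachable policy service denies
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(body)) == "true", nil
}

// newPolicyServer starts an in-process stand-in for the node owner's API.
// It allows a request only when the X-Token header matches the expected value.
func newPolicyServer(token string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Token") == token {
			fmt.Fprint(w, "true")
			return
		}
		fmt.Fprint(w, "false")
	}))
}

func main() {
	srv := newPolicyServer("secret")
	defer srv.Close()

	ok, err := allowBlock(srv.Client(), srv.URL, "bafyexamplecid", map[string]string{"X-Token": "secret"})
	fmt.Println(ok, err)
}
```

Denying on any error keeps the node from serving blocks when the policy service is down; a node owner with different availability needs might choose the opposite default.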

The idea here is that with a generic API, the IPFS dev team doesn't have to dedicate time / resources to determining an IPFS content permission system that works for every possible use case. They can instead let teams decide what works best for them and keep the complexity out of the go-ipfs repo.

Created at the request of @willscott in: ipfs/roadmap#65. cc @aschmahmann

Additional Resources:

@obo20 obo20 added kind/feature A new feature need/triage Needs initial labeling and prioritization labels Jan 20, 2021
@willscott
Contributor

In the cases where a node joins a bitswap session with one peer and in the process learns about additional connections to make, its connection to the node asking for bitswap content may not directly have HTTP headers. What headers are you hoping to use for making this decision, and where would they originally have come from?

@obo20
Author

obo20 commented Jan 20, 2021

For context, our backend currently uses the js-ipfs-http-client for our requests to go-ipfs. With this library you can pass in custom headers: https://www.npmjs.com/package/ipfs-http-client#custom-headers

The concept is that these headers could take the form of any string.

  • It would be up to the host node's deny / allow api to decide what key / value form these headers need to be (they may not even require headers and only care about the CID).
  • It would be up to the requester to know what form the headers should take depending on where their content is stored.

I suppose there could be a situation where two nodes have the same content and have separate policies for what's required to serve that content. In those scenarios, I would expect the host node that cannot serve the content to either:

  • Say they don't have the content that's being requested
  • Say they have the content but that they're not authorized to serve that content

How this would work within the context of bitswap is a little outside my scope of expertise, but those are the general flows I had imagined.

@carsonfarmer

This proposal is interesting from the perspective of IPFS hosts and other provider services on top of IPFS. I think the main benefit here is that it provides a means for an IPFS host provider to "plug in" their own Bitswap "rules" that can protect them, while also being useful for implementing private networks and other security-conscious scenarios. While I'm not yet married to any particular implementation or API spec, something that allows one to run a node with a custom Bitswap "interceptor" would be ideal. The interceptor function would then receive some context, be it the Bitswap request, some headers, whatever. Headers might not be the right wording here. In fact, it might require updates to the Bitswap protocol to accept an additional "extensions" map of some kind?

@Stebalien Stebalien added need/review Needs a review need/maintainers-input Needs input from the current maintainer(s) and removed need/triage Needs initial labeling and prioritization need/review Needs a review labels Feb 12, 2021
@aschmahmann aschmahmann self-assigned this Mar 15, 2021
@BigLep BigLep added the need/analysis Needs further analysis before proceeding label Mar 22, 2021
@BigLep
Contributor

BigLep commented Mar 22, 2021

Notes from 2021-03-22 discussion:

This should maybe be two issues?

  1. I refuse to serve the following illegal content
  2. I only want to serve this content to you.

This will likely get closed and split.

@hughsie

hughsie commented Mar 22, 2021

> I only want to serve this content to you.

My use case is "I can't allow connections from Syria or Iran due to export control restrictions"

At the moment I'm using GeoIP blocking of the entire machine, but this really should be configured per-object as not all of them fall under export control restrictions.
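A per-object restriction like this could be sketched as a lookup keyed by CID. The `geoPolicy` type and the example CIDs are hypothetical, and how the requester's country is determined (for example a GeoIP lookup on the connection) is assumed to happen elsewhere.

```go
package main

import "fmt"

// geoPolicy maps a CID to the set of country codes that must not receive it.
// CIDs absent from the map are unrestricted. Determining the requester's
// country is outside this sketch.
type geoPolicy map[string]map[string]bool

// allow reports whether the block identified by cid may be served to a peer
// located in the given country (ISO 3166-1 alpha-2 code).
func (p geoPolicy) allow(cid, country string) bool {
	// Indexing a missing CID yields a nil map, and indexing a nil map yields
	// false, so unrestricted objects fall through to "allowed".
	return !p[cid][country]
}

func main() {
	policy := geoPolicy{
		"bafyrestricted": {"SY": true, "IR": true},
	}
	fmt.Println(policy.allow("bafyrestricted", "SY")) // restricted object, blocked country
	fmt.Println(policy.allow("bafyrestricted", "DE")) // restricted object, other country
	fmt.Println(policy.allow("bafypublic", "SY"))     // unrestricted object
}
```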

@RubenKelevra
Contributor

This is also required to share data between nodes of friends and own nodes, as described in ipfs/roadmap#78

@RubenKelevra
Contributor

RubenKelevra commented Mar 22, 2021

The question comes to mind, how should a node publish this information to the DHT? If there are any restrictions, say to share a file only with one other node, but the file is very popular, the node might get a lot of requests which will be denied.

Maybe add a flag to the DHT entries if a CID has any access restrictions, to allow other nodes to choose first the nodes which might not restrict the access.
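To illustrate how a requester might use such a flag, here is a small sketch that orders providers so unrestricted ones are tried first. The `providerRecord` type and its `Restricted` field are hypothetical; existing DHT provider records carry no such field.

```go
package main

import (
	"fmt"
	"sort"
)

// providerRecord is a hypothetical provider entry carrying the suggested flag.
type providerRecord struct {
	PeerID     string
	Restricted bool
}

// preferUnrestricted returns a copy of records ordered so that providers
// without access restrictions come first, keeping the original relative
// order within each group.
func preferUnrestricted(records []providerRecord) []providerRecord {
	out := append([]providerRecord(nil), records...)
	sort.SliceStable(out, func(i, j int) bool {
		return !out[i].Restricted && out[j].Restricted
	})
	return out
}

func main() {
	records := []providerRecord{
		{PeerID: "peerA", Restricted: true},
		{PeerID: "peerB", Restricted: false},
		{PeerID: "peerC", Restricted: true},
		{PeerID: "peerD", Restricted: false},
	}
	for _, r := range preferUnrestricted(records) {
		fmt.Println(r.PeerID, r.Restricted)
	}
}
```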

@aschmahmann
Copy link
Contributor

aschmahmann commented Mar 22, 2021

Yep, @BigLep I'm pretty sure this needs to be broken out into separate issues. For example,

  • go-ipfs + go-bitswap feature request for an API to be able to accept/deny Bitswap requests based on information we have about the request (e.g. peerID, connection info, CID)
  • Bitswap spec request to be able to pass some access token along with requests (that could then be used by the API above)

> Maybe add a flag to the DHT entries if a CID has any access restrictions, to allow other nodes to choose first the nodes which might not restrict the access.

Sure, adding more features to provider records makes sense. Although it's mostly independent from this issue since we already need more information. For example, what if someone only has part of a large DAG instead of the whole thing? We might want to mark that in the provider record entry. IMO it's related to, and would likely occur at the same time as, libp2p/go-libp2p-kad-dht#584.


Note: I've also seen requests (although I'm having trouble locating the issue ATM) for client-side deny lists. For example, users or public gateways that just want to avoid downloading certain content even as a transitive dependency of another graph. That would be a separate feature request, this one seems to be focused on server-side filtering.

@csuwildcat

The ability to have a hook between request of a CID and serving the data out for that CID is something that would help us a lot at Microsoft, as well as others in the Decentralized Identity Foundation. Even something as simple as an inbound async hook where you can get the CID being requested, do some eval in a function, and return true/false as to whether it should be released would be a big help.

@ianopolous
Member

We've thought through how we would use this in Peergos more now, and we're pretty sure we can get full post-quantum fine-grained capability-based access control to ciphertext with a single (cid specific) auth string that bitswap sends with each request. The receiver then passes (cid, requestor nodeId, auth string) to the API as above. This should also be pretty general as long as the length limit on the auth string isn't too low (you could always encode multiple headers into it if you wanted).
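To make the (cid, requestor nodeId, auth string) shape concrete, here is a sketch of a receiver-side check. It uses HMAC-SHA256 with a shared secret purely as a stand-in; this is an assumption chosen for brevity and is not the Peergos scheme. The `sign` and `allow` names are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign derives an auth string binding a CID to a requesting peer.
// HMAC-SHA256 with a shared secret is a stand-in chosen for brevity; it is
// not the scheme Peergos uses.
func sign(secret []byte, cid, peerID string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(cid))
	mac.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	mac.Write([]byte(peerID))
	return hex.EncodeToString(mac.Sum(nil))
}

// allow is the receiver-side check taking (cid, requestor nodeId, auth string).
func allow(secret []byte, cid, peerID, auth string) bool {
	got, err := hex.DecodeString(auth)
	if err != nil {
		return false
	}
	want, _ := hex.DecodeString(sign(secret, cid, peerID))
	return hmac.Equal(got, want) // constant-time comparison
}

func main() {
	secret := []byte("example-secret")
	auth := sign(secret, "bafyexamplecid", "peerA")

	fmt.Println(allow(secret, "bafyexamplecid", "peerA", auth)) // token issued for this peer
	fmt.Println(allow(secret, "bafyexamplecid", "peerB", auth)) // token replayed by another peer
}
```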

@ianopolous
Member

@momack2 Would Protocol Labs be interested in funding this? If so, we could do the work to extend bitswap, and add an external allow(cid, nodeId, auth) call. @csuwildcat Would Microsoft be interested in funding it?

@momack2
Contributor

momack2 commented Oct 14, 2021

Thanks for the ping, Ian! Adding @autonome on that question. Note the Cloudflare team recently OSS'd their gateway operations tooling for handling allow/deny lists on their infra, which may also be pushing in similar directions

@ianopolous
Member

Thank you, @momack2 Yep, I've looked at Cloudflare's work, though I think it's mostly orthogonal to this work.

@csuwildcat

@momack2 I also took a look at that part of Cloudflare's stack and it's not really the same thing as what this API would be. What we want here is a simple, native IPFS API that runs every incoming CID resolution through an async function that can run user defined logic to determine whether or not it wants to respond with the data that backs a given CID. It's a rather fundamental config/core hook that would enable the creation of an unlimited number of boundaries between logical groups of CIDs in a node and the outside world. For our personal datastore use, it's an absolute must that we have a way to virtually group and filter the decision to service requests for CIDs/data to the wider network.

@ianopolous
Member

@momack2 @autonome Don't worry about this any more. We've finished implementing it - including the bitswap extension to add an auth string and the customisable allow(cid, peerID, auth) function. The bitswap extension happens to be backwards and forwards compatible with existing instances too.

Finally, block level privacy on IPFS! This (used correctly) removes the biggest legitimate criticism of private storage applications on IPFS compared to centralised services.

@csuwildcat

@ianopolous will you be doing a PR that includes this?

@momack2
Contributor

momack2 commented Oct 31, 2021 via email

@aschmahmann
Contributor

aschmahmann commented Oct 31, 2021

@ianopolous thanks for your work here! Would definitely love to have an implementation of Bitswap that works well with auth/token based access beyond just peerIDs.

If there's some open source repos I can look at, or generally a proposal for how you'd like to modify the Bitswap wire protocol that'd be great.

> The bitswap extension happens to be backwards and forwards compatible with existing instances too.

I'm a little concerned about what this means. For example, if you're taking advantage of protobuf optional fields for your extensions then compatibility requires agreement on which field numbers are reserved and what they mean. I wouldn't want there to be some collision in a future iteration of Bitswap that makes upgrading painful for users of your fork.

If we can figure out an upstream then we won't have to worry too much about that though 😄.

@ianopolous
Member

Hi @momack2 , sorry for the delay - I have a newborn to look after. Thank you for the generous offer! I think it makes sense to fully integrate it into Peergos first to make sure it 100% satisfies our needs before considering the extra work and re-licensing necessary to upstream it.

@aschmahmann , My understanding is that there aren't any major changes planned for bitswap on the protobuf level, and that the plan is rather to migrate to graph-sync? If there are changes then we can just agree not to overlap each other's protobuf indices as you say. It's also not the end of the world if we just end up in a fork either. We have also changed some of the core interfaces around getting blocks (so we can use the type system to make some security guarantees) which would be much harder to integrate into go-ipfs than ipfs-nucleus (our super-minimal drop-in ipfs replacement), which took only 3 hours.

@aschmahmann
Contributor

aschmahmann commented Nov 9, 2021

> and that the plan is rather to migrate to graph-sync?

I don't think it's a good model to think of graphsync as "the next gen bitswap" but as a different data transfer protocol focused on moving DAGs around rather than blocks. Both types of protocols have their utility. As a simple (if slightly esoteric) example block based transfer can work without the server side understanding the IPLD codecs of the data, whereas DAG based transfer cannot.

> My understanding is that there aren't any major changes planned for bitswap on the protobuf level

I don't know what "major" means. There are currently no PRs to the specs repo proposing modifications to the Bitswap protocol, but as you've noticed in this issue, and some previous ones, having authentication within the Bitswap protocol is something that gets requested. So having a protocol change here seems like fair game.

> We have also changed some of the core interfaces around getting blocks (so we can use the type system to make some security guarantees) which would be much harder to integrate into go-ipfs than ipfs-nucleus (our super-minimal drop-in ipfs replacement), which took only 3 hours.

I'm less worried about this. The important thing is the protocol change. If plumbing it through the existing libraries is hard then that's OK; we're allowed to have multiple implementations of the same protocol 😄.

I suspect the spec changes here are pretty easy/straightforward to discuss in a spec PR. A little bit of iteration on this with some folks last week generated some simple options like:

  1. Adding a bytes token field to the WantList Entries
  2. Making a new version of the WantList Entries that are grouped by token

and adding new responses to the BlockPresenceType that divulge the correct amount of information to the client rather than just Have or DontHave.
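The two wantlist options above differ mainly in how entries are laid out. The following sketch models both shapes as plain Go structs and converts the first into the second; the `Entry` and `TokenGroup` types are hypothetical illustrations, not the Bitswap protobuf messages.

```go
package main

import "fmt"

// Entry models option 1: each wantlist entry carries its own token.
type Entry struct {
	CID   string
	Token []byte
}

// TokenGroup models option 2: entries sharing a token are sent together.
type TokenGroup struct {
	Token []byte
	CIDs  []string
}

// groupByToken converts option 1 into option 2, preserving first-seen order
// of tokens and of CIDs within each token.
func groupByToken(entries []Entry) []TokenGroup {
	index := map[string]int{} // token bytes (as string) -> position in groups
	var groups []TokenGroup
	for _, e := range entries {
		key := string(e.Token)
		i, ok := index[key]
		if !ok {
			i = len(groups)
			index[key] = i
			groups = append(groups, TokenGroup{Token: e.Token})
		}
		groups[i].CIDs = append(groups[i].CIDs, e.CID)
	}
	return groups
}

func main() {
	entries := []Entry{
		{CID: "bafyone", Token: []byte("t1")},
		{CID: "bafytwo", Token: []byte("t2")},
		{CID: "bafythree", Token: []byte("t1")},
	}
	for _, g := range groupByToken(entries) {
		fmt.Printf("%s %v\n", g.Token, g.CIDs)
	}
}
```

Grouping saves bytes when many wants share one token, while per-entry tokens keep each want self-describing.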

@ianopolous
Member

> I don't think it's a good model to think of graphsync as "the next gen bitswap"

@aschmahmann Isn't the degenerate case of graph-sync exactly bitswap? The case where the selector is just for a single block?

We've gone with the first option which is to include an optional auth string with each want request (happy to share the protobuf). This is the most powerful and allows us to keep the whole thing totally stateless so that requests can be authorised or denied with nothing but the block itself. This means we can also maintain the auto-scaling properties of IPFS where anyone who has a block can also serve it up, applying the same auth scheme.

We decided not to add any new return types to not leak whether or not we have a block (this clearly needs to be coupled with only providing the roots).
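The effect of not adding new return types can be sketched as follows: a missing block and an unauthorised request produce the same answer, so the requester cannot tell them apart. The `presence` type and `respond` function are hypothetical; only the Have / DontHave names are taken from the discussion above.

```go
package main

import "fmt"

// presence mirrors the two existing answers mentioned in the thread.
type presence int

const (
	Have presence = iota
	DontHave
)

// respond never distinguishes "block missing" from "not authorised":
// both cases yield DontHave.
func respond(hasBlock, authorized bool) presence {
	if hasBlock && authorized {
		return Have
	}
	return DontHave
}

func main() {
	// An unauthorised request for a held block looks identical to a request
	// for a block the node does not hold.
	fmt.Println(respond(true, false) == respond(false, false))
	fmt.Println(respond(true, true) == Have)
}
```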

There are some subtle vulnerabilities specific to the bitswap architecture that need to be guarded against too.

@ianopolous
Member

We've fully integrated this into Peergos now, and it works great! After discussion with @aschmahmann, I've submitted a spec change corresponding to what we settled on here:
ipfs/specs#260

Our auth is 89 bytes, post-quantum and capability-based using S3 V4 signatures. Note that our allow function ended up requiring the block data as a parameter as well, though you could make that optional for other auth schemes.

@lidel
Member

lidel commented Oct 30, 2023

A minimal implementation of IPIP-383 from #10161 landed in master branch and is scheduled to be released in Kubo 0.24-rc1 for feedback. More details in /docs/content-blocking.md and IPIP-383

@lidel lidel closed this as completed Oct 30, 2023
@ianopolous
Member

For other people looking for this, it is also implemented in Nabu now (both authed bitswap and the http allow/deny API).
