Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Handle Symbolic links in the Gateway #3508

Closed
wants to merge 1 commit into from

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Dec 14, 2016

Per @Kubuxu @lgierth request.

Closes https://github.com/ipfs/go-ipfs-gateway/issues/6

This is one way to handle it.

Another would be to just have the Cat API call resolve the symbolic link. Another would be to have the gateway resolve the link instead of using a HTTP Redirect.

I like the HTTP redirect because it is simple and better interacts with the browsers cache. It also avoids any possible security problems from a malicious symbolic link.

@kevina kevina added the status/in-progress In progress label Dec 14, 2016
@whyrusleeping
Copy link
Member

While I do like this approach, i don't think its quite right. This will solve the problem of reading paths like foo/bar/symlink, but I don't think it correctly handles foo/symlink/bar. To do that I believe we will have to make changes to the path.Resolver code.

@kevina
Copy link
Contributor Author

kevina commented Dec 14, 2016

@whyrusleeping yeah your right, this was a first stab at the problem. Do you think coreapi.Cat should just resolve all symbolic links? Or should we be required to expand all symbolic links before calling coreapi.Cat?

@kevina
Copy link
Contributor Author

kevina commented Dec 15, 2016

I think the answer to the question is if we fix the resolver to follow the links, coreapi.Cat would have to follow them. That will solve this problem, but then how do we actually read a symbolic links contents?

@kevina
Copy link
Contributor Author

kevina commented Dec 15, 2016

So it looks like to do this in the resolver ResolveLinks (in path/resolver.go) will need to made aware of symbolic links so it can follow them. There leads to several issues:

  1. The resolver will need to know about the UnixFS Protobuf Nodes, when right now it only cares about the links.
  2. We we need some way to prevent cycles, it not just that two symbolic links can get in a cycle, but there can be an cycle with a combination of symbolic links and directories since in IPFS we can have the unix equivalent of hard links for directories.
  3. How should we handle ".."? Since there is no notion of a parent directory in the IPFS UnixFS dir, what ".." resolves too will depend on the path.
  4. Should we always follow symbolic links in the resolver? If not then we need to figure out how to make this configurable.

I see (1) is unavoidable unless we want to introduce some abstract notion of a symbolic link. For (2) we can either explicitly detect cycles or use a length limit, for (3) for now, the safest thing would be to refuse to resolve symbolic links with ".." in it. For (4) I think it should be configurable, otherwise we might resolve symbolic links in places where it might not be desirable.

@kevina
Copy link
Contributor Author

kevina commented Dec 15, 2016

Note, if we disallow ".." is the symbolic name, I think it will avoid cycles, but i am not 100% sure. Especially if the path is "/ipns" or something else where the id is not the hash of the content. If the id is the hash of the content I don't think there is a way to create a cycle as back links are highly improbable (but I suppressible possible, just next to impossible to create).

@whyrusleeping
Copy link
Member

The resolver will need to know about the UnixFS Protobuf Nodes, when right now it only cares about the links.

I was moving towards this with my refactor of the resolver that added the ResolveOnce field: https://github.com/ipfs/go-ipfs/blob/master/path/resolver.go#L40
We're going to need to do this anyways for sharding, so i think this is a good time to make that leap.

We we need some way to prevent cycles, it not just that two symbolic links can get in a cycle, but there can be an cycle with a combination of symbolic links and directories since in IPFS we can have the unix equivalent of hard links for directories.

For this, the resolver should simply count the number of symlinks its resolved so far, and error out after a certain number (I think MAXSYMLINKS is generally 32, but i'm not sure)

How should we handle ".."? Since there is no notion of a parent directory in the IPFS UnixFS dir, what ".." resolves too will depend on the path.

We should allow '..', it should be relative to the paths being resolved, and the 'root' should be /ipfs/, /ipfs/../../../../ == /ipfs/, similarly to / on unix filesystems.

Should we always follow symbolic links in the resolver? If not then we need to figure out how to make this configurable.

I think we should always follow symlinks in the resolver. (at least in a unixfs context, via the ResolveOnce function discussed above)

@kevina
Copy link
Contributor Author

kevina commented Dec 15, 2016

@whyrusleeping rather than limit the number of symbolic links, what about limiting the number of path components resolved to say 255. It might be easier to implement that way.

@whyrusleeping
Copy link
Member

On ResolveOnce:

The intent here was to provide a contextual transformation of the graph. For unixfs sharding, the raw graph looked weird, the raw paths would look something like:

foo/AB/CDshardedThing/baz
foo/D9/4Efile.mp3
foo/F2/E1otherThing/stuff.txt
foo/F2/E1otherThing/readme.md

But when browsing this in a unixfs context, we wanted it to look like:

foo/shardedThing/bar
foo/file.mp3
foo/otherThing/stuff.txt
foo/otherThing/readme.md

So to transparently perform this transformation, we are going to add context to the resolver in the form of the ResolveOnce function that defines how to resolve the next step at each layer of the path.

The dummy 'unixfs resolve once' function is here: https://github.com/ipfs/go-ipfs/blob/master/unixfs/io/resolve.go

Thinking about this in the context of symlinks makes me realize that each resolution is very likely going to want some degree of state, so that we can do things like keep track of the number of symlinks traversed during this resolution, and total path length, and so on.

To that end, i think we should change it from a ResolveOnce function, to a GetResolveStepper that returns an object (that can contain state) with a ResolveOnce method on it. Then in ResolveLinks we can create a new resolveStepper and use it iteratively to resolve the path and follow symlinks.

@kevina
Copy link
Contributor Author

kevina commented Dec 16, 2016

@whyrusleeping if we change ResolveOnce to GetResolveStepper with state then I think things will work out.

@kevina
Copy link
Contributor Author

kevina commented Dec 16, 2016

.. in symbolic links

@whyrusleeping (cc @Kubuxu and maybe @jbenet ) I think supporting .. in symbolic links is a bad idea because an IPFS directory has no notation of a parent directory, so .. is not deterministic.

Here is how it could lead to unexpected results. Let say we have the following directories:

  QmA12:
    QmA14 content/ 
    QmA15 webpage/

  QmA14:
    QmE12: executable.exe

  QmA15:
    QmW12: index.html
    QmS13: executable.exe -> ../context/executable.exe

With the idea that index.html contains a link to the executable.exe symbolic link. Something like

  ...
  <a href="executable.txt">Awesome Program</a>
  ...

Now lets say we have some more directories:

  QmA20:
    QmA21 context/
    QmA15 webpage/

  QmA21:
    QmX17: executable.exe
    

That is the directory QmA20 points to alternate content. Now then if we retrieve the web page via:

  /ipfs/QmA12/webpage/index.html

we will get the intended excutable (QmE12), but if we retrieve the web page via

  /ipfs/QmA20/webpage/index.html

we will get an alternative (QmX17, maybe with malware) executable.

If we instead use:

  /ipfs/QmA15/index.html

Then the excitable link just won't work.

Absolute paths in symbolic links

If we decide an absolute link to be rooted at /ipfs/QmHASH/ the we run into the same problem that we do with ...

If instead we define an absolute link to be rooted at / or /ipfs then we run into a problem that it will be impossible to refer to the current directory hierarchy as the hash is not known yet.

Conclusions

I am not necessary saying we should not support .. but I want to make sure everyone is aware of these potential problems.

@whyrusleeping
Copy link
Member

If Qm20 is constructed to be malicious, but some user trusts it anyways, then there are easier ways to make them download things (like linking directly to the executable).

We are supporting ".." in symbolic links.

@kevina
Copy link
Contributor Author

kevina commented Dec 16, 2016

@whyrusleeping I am fine with that.

@kevina
Copy link
Contributor Author

kevina commented Dec 16, 2016

We still need to decide if we want to handle absolute and what the semantics of them will be if we do.

@whyrusleeping
Copy link
Member

@kevina Yeah, i think @Kubuxu was saying we should error if the symlink linked above the root (i.e. /ipfs/.. is invalid). And i think that makes sense to me, sound good to you?

@kevina
Copy link
Contributor Author

kevina commented Dec 16, 2016

@whyrusleeping I'm fine with that. But I was more referring to absolute paths in symbolic links and what it should mean. For example should a link be /ipfs/HASH/afile.txt, /HASH/afile.txt', /afile.txt` or just not supported. See my notes above.

@whyrusleeping
Copy link
Member

Ah! that part, I think a symlink should be able to contain any valid ipfs path, or relative path. We should probably disallow symlinks containing ipns paths, that could definitely end poorly.

I think /ipfs/HASH is valid, but /HASH shouldnt be.

cc @Kubuxu and @lgierth for input too

@Kubuxu
Copy link
Member

Kubuxu commented Dec 16, 2016

It should be /ipfs/CID, we don't support just HASH in many places (gateway is one of them) so I think it should be the same for CID.

@ghost ghost added the topic/gateway Topic gateway label Dec 26, 2016
@mib-kd743naq
Copy link
Contributor

Hi folks!

What is the status of this PR ( and the corresponding https://github.com/ipfs/go-ipfs-gateway/issues/6 )?

I put together an example of an ipfs-ified tarball: https://ipfs.io/ipfs/QmbWphVroGYPVAkRGNHdzYJCrJHcwbd35ygvy32gBUKNpB, and the following still won't work as of writing this message: https://ipfs.io/ipfs/QmbWphVroGYPVAkRGNHdzYJCrJHcwbd35ygvy32gBUKNpB/linkz.tar.extracted/linkz/pointer/sym

Thanks!

@jcaesar
Copy link

jcaesar commented May 17, 2018

I just wanted to ask: Would it be generally possible that if a path /A/B/Sym/C/D is requested, to only resolve it up to /A/B/Sym and then throw out a redirect to /A/B/Symtarget/C/D? That would leave the problem of circles to the client. What would be the security implications of that?

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
@kevina kevina requested a review from Kubuxu as a code owner May 20, 2018 22:32
@ghost ghost assigned kevina May 20, 2018
@kevina
Copy link
Contributor Author

kevina commented May 22, 2018

@jcaesar that could work, but it doesn't really simply much, it might makes things more complicated

I am working on this now. Getting the resolver to support symbolic links is messy and I had several false starts. Getting it to return enough information so that the symbolic links can be handed outside of the resolver is equally messy. Will keep working on it until I get something I am happy with.

@kevina
Copy link
Contributor Author

kevina commented May 22, 2018

As far as absolute paths, by forcing a /ipfs/CID prefix it makes a it impossible for a symbolic link to refer its own, or a parent directory without resorting to using ...

I still think .. is problematic for reasons stated in an earlier comment: #3508 (comment) (but to be clear will implement them anyway).

As a longer term solution there should be a way for a symbolic link to refer the a parent directly. One idea is to assign each parent a random id that is stored with the directory, when traversing a directory the resolver remembers this id so you can refer to it using a special namespace, maybe /self/ID/....

@kevina
Copy link
Contributor Author

kevina commented May 30, 2018

After talking with @whyrusleeping we agreed this will likely involve some API changes, I will figure out what those all and likely propose it separately.

@kevina kevina mentioned this pull request Jun 1, 2018
8 tasks
@lidel lidel marked this pull request as draft September 15, 2021 15:02
@lidel
Copy link
Member

lidel commented Sep 15, 2021

Closing as this was created before we had proper security isolation on the web (subdomain gateways, unique origin per content root).

Continued in #8437 (comment)

@lidel lidel closed this Sep 15, 2021
@lidel lidel deleted the kevina/gateway-symlink branch September 15, 2021 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/in-progress In progress topic/gateway Topic gateway
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants