Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic handling of Windows-style path prefixes #91

Closed
wants to merge 1 commit into from

Conversation

kubkon
Copy link
Contributor

@kubkon kubkon commented Aug 4, 2019

I'm hoping this PR will trigger an interesting discussion on handling of Windows-style path prefixes and paths in WASI. I didn't know how to best approach this topic, so to make a point I've decided to go ahead and propose one of many (draft) solution to this problem. A bit of context before digging in; the problem was originally spotted by @mrowqa and part of the discussion can be found here bytecodealliance/wasmtime#235, and essentially deals with the (by now probably infamous) problem of how to best handle Windows paths.

From what I understand, libpreopen was written specifically for POSIX-style hosts and sticks to the convention that / is the path separator and the root. This is all good on Windows as long as we stick to either of the following for the preopens:

  • C:/some/path
  • /some/path

The more traditional format of C:\some\path unfortunately triggers "capabilities insufficient" error as it doesn't recognize C:\ as prefix and \ as path separator. The way I see it, we can approach the problem in 3 ways:

  1. as suggested by @tschneidereit, at the WASI spec level, we could perhaps enforce the convention that / is the only viable path separator
  2. we could transform input paths into POSIX paths upon insertion of the preopen and then when matching input paths against it (i.e., rewriting Windows prefix and every \ into /; this PR favours this approach)
  3. we could modify po_find and po_isprefix to also match for Windows-style prefixes and separators

I'm really curious to see what you guys reckon about all this!

Oh and please note that the draft solution submitted here is, well, just a draft, and I'm sure it can be done in a much better way. But, if we are willing to go down this road, I'll be more than happy to iterate through it until everyone is satisfied that it's done right.

@kubkon
Copy link
Contributor Author

kubkon commented Aug 4, 2019

cc @alexcrichton @mrowqa

@mrowqa
Copy link

mrowqa commented Aug 5, 2019

Option 1 looks great - it simplifies the system. However, Rust stdlib may still use backslashes when calling join on Windows, so it's something to investigate.

Option 2 & 3 are troublesome. Windows also supports paths like \\.\device\, \\unc\share\, \\?\C:\really\long\path, c:rel\path\file.txt and maybe more formats. Without \\?\ Windows usually limits path length to MAX_PATH which is pretty small. And, when using \\?\ the kernel doesn't alter the path too much. One thing is, it doesn't map slashes to backslashes, so \\?\C:/some/path is actually not found. Other fact is that \\?\ doesn't work with relative paths. More details here.

So we should decide where we'll draw the line. Option 1 seems cleaner and much easier to maintain.

PS I couldn't find it now, but there's an open issue for a long time for Rust stdlib to automatically insert \\?\ on Windows.

Edit: And when standardizing the paths, it's worth to have in mind other resources like network connections. And case sensitivity https://github.com/CraneStation/wasi-common/issues/44.

@sbc100
Copy link
Member

sbc100 commented Aug 5, 2019

I support option (1) if possible. Keeps it simple.

I don't see why we should support code that assumes windows-style paths (maybe I'm missing something). Code that thinks its running on windows is going to be broken in many other ways, so to target wasm one would first need to port away from windows towards POSIX.

@mrowqa I'm curious about your concern regarding Rust stdlib? Do you mean that rust that is running in wasm? Presumably not since that would not be compiled for windows. In which case I guess you are referring to the rust stdlib used by wasi embedders written in rust (such as cranelift). My understanding is that if we go with option (1) then any embedder targeting windows would have to have a separate set of path functions for handling host (windows) paths and wasi paths and a way to convert between them.

@kubkon
Copy link
Contributor Author

kubkon commented Aug 6, 2019

I support option (1) if possible. Keeps it simple.

I don't see why we should support code that assumes windows-style paths (maybe I'm missing something). Code that thinks its running on windows is going to be broken in many other ways, so to target wasm one would first need to port away from windows towards POSIX.

@sbc100 I agree that option 1 if possible would indeed be best. However, I do have a couple of concerns.

First of all, the fact that currently parsing Windows paths with forward slashes of the form C:/some/path works is a by-product of libpreopen where this path is incorrectly parsed as a relative POSIX path. Is that a problem? I'm not sure but definitely something we should consider.

Secondly, the fact that paths like /some/path currently work on Windows is, as @mrowqa explained in bytecodealliance/wasmtime#235, the result of Windows auto-magically filling in the blanks for us where /some/path -> C:/some/path.

And lastly, if we decided to use only POSIX style paths on all hosts, do we then let Windows fill in the blanks for us like in the paragraph above, or do we go with Cygwin style paths like /c/some/path? If the latter, then I guess it will be up to the embedder to correctly map /c/some/path to an actual host path of C:\some\path, preopen it and store it as a tuple (fd, /c/some/path). And this is all fine, except that the Wasm app and the user will have to remember to always follow this convention as otherwise libpreopen will fail at extracting the common path prefix. So, using Wasmtime as an example, it would look like this on a Windows host:

wasmtime --dir=/c/some/path cp.wasm -- /c/some/path/source /c/some/path/destination

Does it make sense, or am I way off base here with my worries? :-)

@mrowqa I'm curious about your concern regarding Rust stdlib? Do you mean that rust that is running in wasm? Presumably not since that would not be compiled for windows. In which case I guess you are referring to the rust stdlib used by wasi embedders written in rust (such as cranelift). My understanding is that if we go with option (1) then any embedder targeting windows would have to have a separate set of path functions for handling host (windows) paths and wasi paths and a way to convert between them.

@tschneidereit
Copy link
Member

I'd argue that trying to preopen paths with drive letters in them is a mistake in and of itself, as that means that you're explicitly and only targeting Windows. So is, of course, trying to preopen, say, /usr/local/etc/gitconfig.

Really, developers shouldn't use any absolute paths, even with libpreopen in the mix.

OTOH, we probably do want to support absolute paths, if only to decrease the porting burden.

One possible option to handle all this might be to effectively treat anything that starts with either a /, or with a drive letter (i.e. [a-zA-Z]: as opaque strings that WASI doesn't even try to interpret in any further way. It'd be up to the runtime to provide a preopened mapping for these paths.

As for the wasmtime example:

wasmtime --dir=/c/some/path cp.wasm -- /c/some/path/source /c/some/path/destination

The first of these would be interpreted as a path by wasmtime, and WASI content would only ever see a file handle, if a file can be found. I guess this exact syntax would work iff

  1. the content tries to preopen a directory with the exact opaque string /c/some/path.
  2. wasmtime runs in an environment where the path /c/some/path is resolved successfully, e.g. Cygwin.

If 2 doesn't hold, you'd probably use something like

wasmtime --mapdir=/c/some/path::some/other/path

The two arguments following the -- wouldn't be interpreted as paths to begin with—they'd just be passed to the content as string arguments, and the content would have no way to turn them into file handles (because they're not relative paths) even now.

One final point: we should definitely not make any decisions here based on the current behavior of libpreopen! If we conclude that we need different behavior, we'll change things to that.

@kubkon
Copy link
Contributor Author

kubkon commented Aug 6, 2019

I'd argue that trying to preopen paths with drive letters in them is a mistake in and of itself, as that means that you're explicitly and only targeting Windows. So is, of course, trying to preopen, say, /usr/local/etc/gitconfig.

Really, developers shouldn't use any absolute paths, even with libpreopen in the mix.

OTOH, we probably do want to support absolute paths, if only to decrease the porting burden.

One possible option to handle all this might be to effectively treat anything that starts with either a /, or with a drive letter (i.e. [a-zA-Z]: as opaque strings that WASI doesn't even try to interpret in any further way. It'd be up to the runtime to provide a preopened mapping for these paths.

As for the wasmtime example:

wasmtime --dir=/c/some/path cp.wasm -- /c/some/path/source /c/some/path/destination

The first of these would be interpreted as a path by wasmtime, and WASI content would only ever see a file handle, if a file can be found. I guess this exact syntax would work iff

1. the content tries to preopen a directory with the exact opaque string `/c/some/path`.

2. wasmtime runs in an environment where the path `/c/some/path` is resolved successfully, e.g. Cygwin.

If 2 doesn't hold, you'd probably use something like

wasmtime --mapdir=/c/some/path::some/other/path

The two arguments following the -- wouldn't be interpreted as paths to begin with—they'd just be passed to the content as string arguments, and the content would have no way to turn them into file handles (because they're not relative paths) even now.

@tschneidereit Excellent! This is exactly the point I was trying to make in my comment above, and IMHO I think it'd the best solution to the problem at hand :-) Oh, and FWIW, the arguments passed from Wasmtime to the Wasm app are indeed passed as plain UTF-8 strings.

One final point: we should definitely not make any decisions here based on the current behavior of libpreopen! If we conclude that we need different behavior, we'll change things to that.

Agreed, however, I'm always extra cautious when tweaking critical libraries from the security perspective such as libpreopen.

@mrowqa
Copy link

mrowqa commented Aug 6, 2019

@mrowqa I'm curious about your concern regarding Rust stdlib? Do you mean that rust that is running in wasm? Presumably not since that would not be compiled for windows. In which case I guess you are referring to the rust stdlib used by wasi embedders written in rust (such as cranelift).

My understanding is that if we go with option (1) then any embedder targeting windows would have to have a separate set of path functions for handling host (windows) paths and wasi paths and a way to convert between them.

Yes, that was my concern - dealing with different styles of paths at the same time. I just wanted to point it out as a potential obstacle.

@sbc100
Copy link
Member

sbc100 commented Aug 6, 2019

So IIUC, we are all in agreement that dealing with mapping of wasi paths to host paths (whatever format they might exist in with or without drive letters) is a burden for the embedder an not the wasm module or libpreopen?

If we agree and host paths (absolute or not) are not something the wasm module needs to worry about then we can close this?

@sunfishcode
Copy link
Member

I agree. (At least, for now. WASI in the future may evolve and we may want to revisit this, but we can cross that bridge when we come to it.)

@kubkon
Copy link
Contributor Author

kubkon commented Aug 7, 2019

SGTM! Closing for now then!

@kubkon kubkon closed this Aug 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants