Character Set #86

Closed
kg opened this issue Mar 29, 2019 · 3 comments
Labels
wasi:api Issues pertaining to the WASI API, not necessarily specific to Wasmtime.

Comments

@kg

kg commented Mar 29, 2019

Is the character set / encoding used by WASI specified anywhere? I don't see it in the docs.

It would be nice to see an advisory comment, or even a requirement, that all char* textual arguments are UTF-8 unless otherwise specified, along with details on how textual arguments are laid out in memory. For example, is environ a list of variables, each terminated by a NUL, with the whole list terminated by a double NUL?

@sunfishcode sunfishcode added the wasi:api Issues pertaining to the WASI API, not necessarily specific to Wasmtime. label Mar 29, 2019
@sunfishcode
Member

It isn't explicitly documented yet, but the intention is that strings are UTF-8 in general. The contents of argv and environ follow POSIX as far as NUL string termination and NULL pointer array termination go.
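
As a rough illustration of the POSIX-style layout described here (an array of pointers to NUL-terminated strings, with the array itself ended by a NULL pointer), here is a minimal Python sketch using ctypes. The helper names `make_string_vector` and `read_string_vector` are hypothetical and not part of WASI or any existing API; the sketch also assumes every string is valid UTF-8.

```python
import ctypes


def make_string_vector(items):
    """Build a C-style vector: pointers to NUL-terminated UTF-8 strings,
    followed by a single NULL pointer that marks the end of the array.
    (Hypothetical helper, for illustration only.)"""
    encoded = [s.encode("utf-8") for s in items]
    vector_type = ctypes.c_char_p * (len(encoded) + 1)
    # The trailing None becomes the terminating NULL pointer.
    return vector_type(*encoded, None)


def read_string_vector(vec):
    """Walk the vector until the NULL pointer, decoding each entry as UTF-8."""
    out = []
    i = 0
    while vec[i] is not None:
        # Reading a c_char_p stops at the first NUL byte of the string.
        out.append(vec[i].decode("utf-8"))
        i += 1
    return out


env = make_string_vector(["HOME=/home/user", "LANG=C.UTF-8"])
print(read_string_vector(env))
```

Note that in this layout there is no double-NUL terminator: the end of the list is signalled by the NULL pointer, and each string carries its own single NUL.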

Using UTF-8 does create some problems when interfacing with various existing host platforms, and we don't have all the answers yet.

One idea for filenames is to say that a file with a name that can't be translated to UTF-8 can't be accessed by name in WASI. We'd then add an API for iterating over a directory that would allow the file to be accessed without requesting it by name. Similarly, the corresponding idea for command-line arguments is to say that you can't launch a WASI program if the arguments can't be translated to UTF-8.
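
To make the filename idea concrete, here is a small Python sketch of how a host might separate directory entries into names that translate to UTF-8 and names that do not. The function names `split_by_utf8` and `can_launch` are hypothetical, and the sketch assumes the host hands back raw byte names; it is not how any WASI implementation is known to work.

```python
def split_by_utf8(raw_names):
    """Partition raw byte names into (addressable, opaque).

    addressable: names that decode as strict UTF-8, returned as str.
    opaque: names that do not decode, kept as bytes; under the idea above
    these could only be reached by iterating the directory, not by name.
    """
    addressable, opaque = [], []
    for raw in raw_names:
        try:
            addressable.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            opaque.append(raw)
    return addressable, opaque


def can_launch(raw_args):
    """Return True only if every command-line argument is valid UTF-8."""
    _, opaque = split_by_utf8(raw_args)
    return not opaque


# b"caf\xe9" is Latin-1 for "café" and is not valid UTF-8.
print(split_by_utf8([b"notes.txt", b"caf\xc3\xa9", b"caf\xe9"]))
```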

These tradeoffs obviously aren't ideal for all use cases. I also haven't covered case insensitivity, Unicode normalization, reserved characters, name length limits, or other issues yet. So there's clearly more work to be done here. If people have ideas about how we should address these issues, we'd be happy for the help :-).

@kg
Author

kg commented Mar 30, 2019

My suggestion for dealing with platform-specific issues like normalization would be to expose separate APIs for path normalization. On platforms with no requirements those APIs can be no-ops, but on platforms like Win32 or OS X they can apply any relevant normalization rules. Software can then detect the normalization rules by probing, and can also reliably normalize paths in advance instead of paying the cost on every I/O operation.
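
A minimal sketch of what such a normalization API and probe could look like, written in Python with the standard `unicodedata` module. The names `normalize_path` and `probe_form` are hypothetical, and the choice of NFC/NFD as the only candidate rules is an assumption made for illustration; real platforms may apply other rules (such as case folding) that this does not model.

```python
import unicodedata


def normalize_path(path, form=None):
    """Hypothetical normalization call.

    form=None models a platform with no normalization requirement (a no-op);
    "NFC" or "NFD" model platforms that compose or decompose names.
    """
    if form is None:
        return path
    return unicodedata.normalize(form, path)


def probe_form(normalizer):
    """Guess a normalizer's rule by feeding it both spellings of 'é'."""
    composed = "\u00e9"      # single code point
    decomposed = "e\u0301"   # 'e' followed by a combining acute accent
    out_c, out_d = normalizer(composed), normalizer(decomposed)
    if out_c == composed and out_d == decomposed:
        return None          # both spellings pass through unchanged
    if out_c == out_d == composed:
        return "NFC"
    if out_c == out_d == decomposed:
        return "NFD"
    return "unknown"


print(probe_form(lambda p: normalize_path(p, "NFD")))
```

With something along these lines, a program could probe once at startup and normalize its paths up front, rather than relying on the host to do it on every call.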

As for UTF-8 as the standard representation, I think that's the best option. Many runtime environments and libraries can auto-convert to UTF-8. In the case of C#, there is a mechanism to specify a custom string marshaler to handle UTF-8, and there are plans to make it a built-in interop format in the future.

@sunfishcode
Member

Continuing the discussion in https://github.com/WebAssembly/WASI/issues/8.
