Character Set #86
Comments
It isn't explicitly documented yet, but the intention is that strings are UTF-8 in general. The contents of […]

Using UTF-8 does create some problems when interfacing with various existing host platforms, and we don't have all the answers yet. One idea for filenames is to say that a file whose name can't be translated to UTF-8 can't be accessed by name in WASI. We'd then add an API for iterating over a directory that would allow such a file to be accessed without requesting it by name. Similarly, the corresponding idea for command-line arguments is to say that you can't launch a WASI program if the arguments can't be translated to UTF-8.

These tradeoffs obviously aren't ideal for all use cases. And I haven't covered case insensitivity, Unicode normalization, reserved characters, name length limits, or other issues yet. So there's clearly more work to be done here. If people have ideas about how we should address these issues, we'd be happy for the help :-).
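To make the filename and argument ideas above concrete, here is a minimal Python sketch. It is not WASI's API: the function names (`to_utf8_name`, `list_entries`, `check_args`) are hypothetical, and the host is assumed to hand names over as raw bytes. Names that are not valid UTF-8 are reported as `None` but keep a positional index, so they stay reachable through directory iteration, while untranslatable arguments cause a refusal to launch.

```python
from typing import List, Optional, Tuple


def to_utf8_name(raw: bytes) -> Optional[str]:
    """Return the name as text if the host bytes are valid UTF-8, else None."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def list_entries(raw_names: List[bytes]) -> List[Tuple[int, Optional[str]]]:
    """Pair each directory entry with an index (hypothetical iteration API).

    An entry whose name cannot be translated has no text name, but the
    index still identifies it, so it can be opened without naming it.
    """
    return [(i, to_utf8_name(raw)) for i, raw in enumerate(raw_names)]


def check_args(raw_args: List[bytes]) -> List[str]:
    """Translate all arguments, refusing to launch if any is not UTF-8."""
    args = []
    for i, raw in enumerate(raw_args):
        text = to_utf8_name(raw)
        if text is None:
            raise ValueError(f"argument {i} is not valid UTF-8; refusing to launch")
        args.append(text)
    return args
```

The design choice shown here is to fail early and explicitly rather than substitute replacement characters, which would silently make two different host names look the same.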
My suggestion for dealing with the platform-specific stuff like normalization would be to expose separate APIs for path normalization. On platforms with no requirements those APIs can be no-ops, but on platforms like Win32 or OS X they can apply any relevant normalization rules. Then software can both detect normalization rules through probing and reliably perform normalization in advance, instead of paying the cost on every I/O operation.

As for UTF-8 as the standard representation, I think that's the best option. Many runtime environments and libraries can auto-convert to UTF-8. In the case of C#, there is a mechanism to specify a custom string marshaler to handle UTF-8, and there are plans to make it a built-in interop format in the future.
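A rough Python sketch of what such a separate normalization API and a probe could look like. Everything here is an assumption for illustration: `Rule`, `normalize_path`, and `probe_rule` are hypothetical names, and the rule set is limited to Unicode NFC/NFD or none. Which rule a given host actually applies is not stated in this thread, so the sketch takes the rule as a parameter.

```python
import unicodedata
from enum import Enum
from typing import Callable


class Rule(Enum):
    """Hypothetical set of normalization rules a host might apply."""
    NONE = "none"
    NFC = "NFC"
    NFD = "NFD"


def normalize_path(path: str, rule: Rule) -> str:
    """Apply the host's rule; a no-op on hosts with no requirements."""
    if rule is Rule.NONE:
        return path
    return unicodedata.normalize(rule.value, path)


def probe_rule(normalize: Callable[[str], str]) -> Rule:
    """Detect the rule by normalizing a composed and a decomposed 'e-acute'."""
    composed = "\u00e9"      # single code point
    decomposed = "e\u0301"   # 'e' followed by combining acute accent
    out_c, out_d = normalize(composed), normalize(decomposed)
    if (out_c, out_d) == (composed, decomposed):
        return Rule.NONE
    if (out_c, out_d) == (composed, composed):
        return Rule.NFC
    if (out_c, out_d) == (decomposed, decomposed):
        return Rule.NFD
    raise ValueError("host applies a rule this probe does not recognize")
```

A caller could probe once at startup, then normalize paths up front and compare them as plain strings afterwards.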
Continuing the discussion in https://github.com/WebAssembly/WASI/issues/8.
Is the character set / encoding used by WASI specified anywhere? I don't see it in the docs.
It would be nice to see an advisory comment, or even a requirement, that all char* textual arguments are UTF-8 unless otherwise specified, along with specifics about how textual arguments are laid out. For example, is environ a list of variables, each terminated by a nul, with the whole list terminated by a double nul?
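To illustrate the kind of layout being asked about, here is a Python sketch that parses one possible encoding: UTF-8 `NAME=value` entries, each terminated by a nul, with an empty entry (a second nul) ending the list. This layout is only the hypothesis raised in the question, not something the WASI docs confirm, and `parse_environ_block` is a hypothetical helper.

```python
from typing import Dict


def parse_environ_block(buf: bytes) -> Dict[str, str]:
    """Parse a nul-separated, double-nul-terminated environment block.

    Assumed layout (hypothetical): b"A=1\0B=2\0\0".
    Raises ValueError if the terminator is missing, an entry has no '=',
    or an entry is not valid UTF-8.
    """
    env: Dict[str, str] = {}
    pos = 0
    while True:
        end = buf.index(b"\0", pos)  # raises ValueError if no nul remains
        if end == pos:
            # An empty entry marks the end of the list.
            return env
        entry = buf[pos:end].decode("utf-8")
        name, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"entry without '=': {entry!r}")
        env[name] = value
        pos = end + 1
```

Writing the parser down makes the open questions visible: what happens with an entry that lacks `=`, and whether an empty block is a single nul or two.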