Incorrect handling of unpaired surrogates in JS strings #331

Open
Pauan opened this issue Mar 14, 2019 · 1 comment

Comments

@Pauan
Contributor

Pauan commented Mar 14, 2019

It was brought to my attention in Pauan/rust-dominator#10 that JavaScript strings (and DOMString) allow for unpaired surrogates.

When a string is encoded with TextEncoder, each unpaired surrogate is converted into U+FFFD (the replacement character). According to the Unicode spec, this is correct behavior.
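To make the replacement concrete, here is a minimal Rust sketch. It does not call TextEncoder; it assumes that the standard library's `String::from_utf16_lossy` is a fair model of it, since both substitute U+FFFD for each unpaired surrogate. The function names are hypothetical, and the example character is U+20000 (a CJK ideograph whose UTF-16 form is the pair D840 DC00).

```rust
/// Models the JS-string-to-UTF-8 step: UTF-16 code units in, UTF-8 bytes out.
/// Each unpaired surrogate becomes U+FFFD, so the original unit is lost.
fn encode_lossy(units: &[u16]) -> Vec<u8> {
    String::from_utf16_lossy(units).into_bytes()
}

/// Reports whether the code units contain a surrogate without a partner.
fn has_unpaired_surrogate(units: &[u16]) -> bool {
    std::char::decode_utf16(units.iter().copied()).any(|r| r.is_err())
}
```

With these definitions, `encode_lossy(&[0xD840])` yields the bytes EF BF BD (U+FFFD in UTF-8), while the complete pair `[0xD840, 0xDC00]` encodes to the four UTF-8 bytes of U+20000.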

The problem is that this replacement is lossy: once an unpaired surrogate has been replaced, the original code unit cannot be recovered, and that can cause serious bugs.

You can read the above dominator bug report for the nitty-gritty details, but the summary is that with <input> fields (and probably other things), the browser can send two input events for a single character, one for each surrogate.

When the first event arrives, the surrogate is unpaired. Because the string is immediately sent to Rust, the unpaired surrogate is converted into the replacement character.

Then the second event arrives, and the surrogate is still unpaired (because the first half was replaced), so the second half also gets replaced with the replacement character.
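The two-event sequence can be simulated in plain Rust. This is only a sketch: `js_to_rust` is a hypothetical stand-in for the JS-to-Rust string conversion (modeled with `String::from_utf16_lossy`), and it assumes that the application writes the converted string back into the field after the first event, which is how the first half ends up replaced.

```rust
/// Hypothetical stand-in for passing a JS string to Rust:
/// lossy UTF-16 decoding, unpaired surrogates become U+FFFD.
fn js_to_rust(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Simulates one character arriving as two input events.
/// Returns the Rust-side string seen after each event.
fn simulate_two_events(high: u16, low: u16) -> (String, String) {
    // Event 1: the field holds only the high surrogate.
    let mut field: Vec<u16> = vec![high];
    let after_first = js_to_rust(&field);

    // Assumption: the app writes the (already lossy) value back to the field.
    field = after_first.encode_utf16().collect();

    // Event 2: the low surrogate arrives, but its partner is gone.
    field.push(low);
    let after_second = js_to_rust(&field);

    (after_first, after_second)
}
```

Calling `simulate_two_events(0xD840, 0xDC00)` returns one replacement character after the first event and two after the second, instead of the single character U+20000.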

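This has far-reaching implications, including for international languages (e.g. Chinese), because any character outside the Basic Multilingual Plane is represented in a JS string as a surrogate pair.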

I don't see any easy solutions for stdweb, since it always converts JS strings into Rust Strings, and a Rust String must be valid UTF-8, so it cannot hold an unpaired surrogate. This is different from wasm-bindgen, which has a separate JsString type (which is specifically for JS strings).
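For illustration, here is a rough sketch of what keeping the string in a separate type could look like. `RawJsString` and its methods are hypothetical; this is not stdweb's or wasm-bindgen's API. It simply stores the raw UTF-16 code units and only converts to a Rust String when every surrogate is paired.

```rust
/// Hypothetical wrapper (not an existing stdweb or wasm-bindgen type)
/// that keeps the original UTF-16 code units, so nothing is lost.
#[derive(Debug, Clone)]
struct RawJsString(Vec<u16>);

impl RawJsString {
    /// Appends more code units, e.g. from a later input event.
    fn push_units(&mut self, units: &[u16]) {
        self.0.extend_from_slice(units);
    }

    /// Converts to a Rust String only if the contents are valid UTF-16;
    /// returns None while a surrogate is still unpaired.
    fn to_string_strict(&self) -> Option<String> {
        String::from_utf16(&self.0).ok()
    }
}

/// Feeds the two halves of a surrogate pair one at a time and
/// reports the strict conversion result after each step.
fn type_in_two_steps(high: u16, low: u16) -> (Option<String>, Option<String>) {
    let mut value = RawJsString(vec![high]);
    let after_first = value.to_string_strict();
    value.push_units(&[low]);
    let after_second = value.to_string_strict();
    (after_first, after_second)
}
```

Under these assumptions, `type_in_two_steps(0xD840, 0xDC00)` gives `None` after the first half and the intact character U+20000 after the second, because the first surrogate was never replaced.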

@Pauan
Contributor Author

Pauan commented Mar 14, 2019

(wasm-bindgen also suffers from this issue, here is the bug report: rustwasm/wasm-bindgen#1348 )
