Encode hook not adequate #235
Comments
FWIW, encoding_rs provides "html" as a built-in capability and Firefox uses this for form submission. Then the URL implementation in Firefox uses the lower-level API that exposes the error code point instead of performing replacement. In that sense, leaving "html" in the spec would leave software and the spec as corresponding to each other better than removing "html" and only providing the low-level abstraction. Within the spec, though, "html" should layer on top of the low level.
Notes:
This way you can invoke its handler directly without going through the "process" algorithm, as is needed to fix #235. This also avoids the need for "prepend" in the "process" algorithm. Additionally, this commit adds two clarifying asserts to the "process" algorithm documenting what error modes can be in effect when.
I think I have a proposal that works:
Thoughts appreciated!
Makes sense to me. For non-browser users of the spec, it would be prudent to document that ISO-2022-JP may be left in the Roman state, so the safe bytes available for replacement are the ones that have the ASCII interpretation even in the Roman state.
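To make the "safe bytes" point concrete, here is a minimal sketch. It assumes only that JIS X 0201 Roman matches ASCII everywhere except 0x5C (yen sign instead of backslash) and 0x7E (overline instead of tilde). The helper name `safe_in_roman_state` is hypothetical and not taken from the spec or any library.

```python
# Byte values whose meaning differs between ASCII and JIS X 0201 Roman.
# 0x5C is YEN SIGN and 0x7E is OVERLINE in the Roman set.
ROMAN_DIFFERS_FROM_ASCII = {0x5C: "\u00A5", 0x7E: "\u203E"}


def safe_in_roman_state(data: bytes) -> bool:
    """Return True if every byte reads the same in ASCII and Roman state.

    Hypothetical helper for illustration only.
    """
    return all(b < 0x80 and b not in ROMAN_DIFFERS_FROM_ASCII for b in data)


# Typical replacement output only uses bytes that are safe in both states.
print(safe_in_roman_state(b"&#128169;"))        # numeric character reference
print(safe_in_roman_state(b"%26%23128169%3B"))  # percent-encoded form
print(safe_in_roman_state(b"~"))                # 0x7E is not safe
```

Under that assumption, both a numeric character reference and its percent-encoded form are fine as replacement bytes, while anything containing a backslash or tilde is not.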
This avoids the need for "prepend" in the "process" algorithm as is needed to fix #235. Additionally, this commit adds two clarifying asserts to the "process" algorithm documenting what error modes can be in effect when. Related tests: web-platform-tests/wpt#26158.
Encode or fail would not work as the encoder can be in one of two states when returning an error. So I think we need to do the alternative proposal or a variant of that whereby the caller keeps the encoder alive so it can retain state.
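As a rough illustration of why the caller needs to keep the encoder alive, the sketch below uses Python's `iso2022_jp` codec. That codec is not the Encoding Standard's ISO-2022-JP encoder and the error path is not exercised here; the example only shows that a live encoder instance carries state between calls, while independent one-shot calls do not. The function names are hypothetical.

```python
import codecs


def encode_chunks_independently(chunks, encoding="iso2022_jp"):
    """One-shot encode per chunk: every call starts and ends in ASCII state."""
    return [chunk.encode(encoding) for chunk in chunks]


def encode_chunks_with_live_encoder(chunks, encoding="iso2022_jp"):
    """Keep a single encoder alive so its state carries over between chunks."""
    encoder = codecs.getincrementalencoder(encoding)()
    out = [encoder.encode(chunk) for chunk in chunks]
    # Flush at the end so the stream returns to the ASCII state.
    out.append(encoder.encode("", True))
    return out


chunks = ["\u3042", "\u3044"]  # two hiragana code points
independent = encode_chunks_independently(chunks)
live = encode_chunks_with_live_encoder(chunks)

print(independent)  # each chunk carries its own escape sequences
print(live)         # the second chunk needs no escape sequence
```

With a live encoder the second chunk is emitted without a new escape sequence, which is the kind of retained state a one-shot "encode or fail" hook cannot offer.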
While working on whatwg/url#557 I realized that the URL standard will have to invoke the Encoding Standard at a lower level of abstraction as it needs to deal with erroneous output (&#...;) differently from non-erroneous output. I.e., an error that results in &#...; might have to be percent-encoded, but a non-error &#...; sequence might not have to be.
So URL basically wants to invoke the encoder's handler directly I think. I don't really see a better abstraction as it needs to deal with errors in a very different way. I suppose we could make error handling a caller defined set of steps, but I don't really like the complexity of that.
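A small sketch of the problem, using Python's codecs rather than the Encoding Standard's algorithms. Python's `xmlcharrefreplace` error handler stands in for the "html" error mode, and `encode_with_error_callback` and `percent_encoded_reference` are hypothetical names made up for this example. The per-code-point loop is only valid for stateless encodings such as windows-1252.

```python
from urllib.parse import quote


def encode_html_mode(text, encoding="windows-1252"):
    """High-level style: unencodable code points become &#N; in the output."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def encode_with_error_callback(text, encoding, on_error):
    """Lower-level style: hand each unencodable code point to the caller.

    Encodes one code point at a time, so it only works for stateless
    encodings. Hypothetical helper for illustration.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            out += on_error(ord(ch))
    return bytes(out)


def percent_encoded_reference(code_point):
    """Hypothetical caller policy: percent-encode the whole &#N; replacement."""
    return quote(f"&#{code_point};", safe="").encode("ascii")


# A literal "&#128169;" typed by the user, followed by U+1F4A9 itself.
text = "&#128169;\U0001F4A9"

print(encode_html_mode(text))
# b'&#128169;&#128169;'  (the two sources can no longer be told apart)

print(encode_with_error_callback(text, "windows-1252", percent_encoded_reference))
# b'&#128169;%26%23128169%3B'  (only the error output is percent-encoded)
```

Once replacement has happened inside the high-level hook, the caller cannot tell which &#...; came from an error, which is why the error has to be surfaced to the caller instead.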
text/plain form submission could in theory still use the current high-level encode hook, but I'm not sure it's worth saving just for that.
It also seems there's potentially quite a lot of other potential cleanup that would result from this (e.g., https://encoding.spec.whatwg.org/#concept-encoding-process no longer needs to handle "html").
Having "html" in the Encoding standard as well as this high-level hook was intentional as a way of limiting the amount of badness that could be introduced by consumers, but we will have to use review rather than abstractions for that instead (UTF-8 or die). It's unfortunate, but it might also make the Encoding standard a little leaner.
cc @andreubotella @hsivonen @achristensen07 @JKingweb