Skip to content
This repository has been archived by the owner on Aug 17, 2022. It is now read-only.

Refresh, rescope and rename #21

Merged
merged 21 commits into from
Apr 5, 2019
Merged

Refresh, rescope and rename #21

merged 21 commits into from
Apr 5, 2019

Conversation

lukewagner
Copy link
Member

@lukewagner lukewagner commented Mar 22, 2019

This PR updates the Host Bindings proposal as discussed at TPAC and after a bunch of follow-up discussions.

The PR has an overview that I won't restate but one additional proposed change that I'd like to make (if we accept this PR) is to rename this repo from "host-bindings" to "webidl-bindings", to reflect the new scope of the proposal (for reasons given in the FAQ of the PR).

To make sure we have proper visibility and discussion, I'll add an agenda item to the April 2 CG meeting for discussion.

@lukewagner lukewagner changed the title Refresh Refresh, rescope and rename Mar 22, 2019
@lukewagner
Copy link
Member Author

(Accidentally posted before filling out the PR, somehow. Updated with title/description.)

Copy link
Contributor

@jgravelle-google jgravelle-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Wait until after the CG meeting to merge?

Copy link
Contributor

@littledan littledan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I'm a big fan of the design decisions to not include all of WebIDL and just start with some of the most important types (especially given @Ms2ger's ongoing work to remove unused legacy features from WebIDL) and to fall back to JS conversions when the WebIDL types don't match (given how, sometimes, specifications are refactored over time to have different IDL).

I'm looking forward to all of this proposal moving forward and becoming even more concrete. I can imagine many additional types of conversions making sense to add in conjunction with the GC proposal (e.g., passing strings in and out without copying).

@lukewagner
Copy link
Member Author

Great to hear, thanks for the feedback. I'll hold off merging until a chance to discuss in the next CG meeting.

Copy link
Member

@binji binji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this looks really good. It's much easier now to see how this all fits together, and where we can add more functionality to provide more expressive bindings. It'd be great to see more example bindings, including cases that aren't representable with the current binding expressions.

proposals/webidl-bindings/Explainer.md Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented Mar 29, 2019

Very exciting!

(given how, sometimes, specifications are refactored over time to have different IDL).

This is probably worth calling out explicitly as this is indeed somewhat common, including changing long to double and such). I take it this works for both input and return values? Really cool tactic, though I guess that means there will be a little bit more overhead.

Am I correct that the intention here is to always handle DOMString via a UTF-8 buffer? What happens if the DOMString contains lone surrogates?

It might be worth considering to map ByteString to a buffer (which I'm assuming is a byte sequence). The type is primarily used for byte sequences that happen to look a lot like strings and can be manipulated as such to some extent (e.g., HTTP header values). I guess the question is to what extent that distinction is meaningful for Wasm. Concretely, if a header value is 0xFF, do you want to see that as 0xFF or U+00FF expressed as two UTF-8 bytes?

@lukewagner
Copy link
Member Author

@annevk Thanks for the comments; PTAL to see if they were addressed correctly.

This is probably worth calling out explicitly as this is indeed somewhat common, including changing long to double and such).

I added a third bullet to "Question 2" in the "JS API Integration" section.

I take it this works for both input and return values? Really cool tactic, though I guess that means there will be a little bit more overhead.

Yes, input and return values (both being part of the "Web IDL Signature" which can mismatch. This is an instantiation-time check that only takes the slower, roundabout path on mismatch, so hopefully no overhead in the common case.

Am I correct that the intention here is to always handle DOMString via a UTF-8 buffer? What happens if the DOMString contains lone surrogates?

Ah, good question; I glossed over this. So I think this only comes up for the alloc‑utf8‑str incoming binding operator. My default assumption is that this case should trap (in general there are a number of cases where binding operators can trap). I'll add a note to alloc-utf8-str.

It might be worth considering to map ByteString to a buffer (which I'm assuming is a byte sequence). The type is primarily used for byte sequences that happen to look a lot like strings and can be manipulated as such to some extent (e.g., HTTP header values). I guess the question is to what extent that distinction is meaningful for Wasm. Concretely, if a header value is 0xFF, do you want to see that as 0xFF or U+00FF expressed as two UTF-8 bytes?

Good point! Thinking about it some more, ByteString is practically a BufferSource; its only essential difference is that the ECMAScript Binding layer spits out a JS string instead of an ArrayBuffer (for obvious historical reasons). Thus, I'll remove ByteString from the string ops and add it to the buffer ops.

@annevk
Copy link
Member

annevk commented Mar 30, 2019

Trapping is some kind of error? So if you store a lone surrogate in a Text node's data in JavaScript you cannot get it out through Wasm? Mapping the lone surrogates to U+FFFD seems preferable if so.

cc @hsivonen

@alexcrichton
Copy link

We've been discussing the problem of lone surrogates recently with wasm-bindgen, and our default usage of TextEncoder to convert a JS string to a Rust utf-8 string does use the replacement character today. What it means though is that we didn't realize that the conversion from a JS string to a Rust string was lossy initially, although we do know now!

Our current thinking is that we will add a method to detect if a JS string is invalid utf-16 (aka has a lone surrogate), and then code that wants to be robust will do the moral equivalent of taking anyref and doing runtime checking.

In that sense I think that I might agree with @annevk that using replacement characters on lone surrogates might make more sense since it matches TextEncoder's behavior. It would be important to document though!

@Pauan
Copy link

Pauan commented Mar 30, 2019

To clarify a bit, when the user types a single character in an <input> on Windows, it will send two input events (one for each surrogate pair).

So when the first input event happens, it sends the string from JS to Rust (which uses TextEncoder to convert from UTF-16 to UTF-8), and TextEncoder replaces the unpaired surrogate with the replacement character.

That's pretty terrible since now the string doesn't match what the user typed (and this then leads to other bugs).

So personally I would prefer a hard error, but I agree with @alexcrichton that matching existing TextEncoder behavior is probably correct (albeit unfortunate).

@annevk
Copy link
Member

annevk commented Mar 30, 2019

@Pauan is there an issue tracking that particular way of dealing with user input? That seems like something that ought to be fixed in implementations (and specification, if it's unclear).

(And yeah, TextEncoder expects the caller to not pass in lone surrogates (because it uses USVString, which replaces lone surrogates with U+FFFD, not DOMString, which preserves them). We could potentially add a static or some such that tells you whether your string contains them. Please file something at https://github.com/whatwg/encoding if that would be useful.)

@Pauan
Copy link

Pauan commented Mar 30, 2019

@annevk is there an issue tracking that particular way of dealing with user input?

On our side we have rustwasm/wasm-bindgen#1348 (linked earlier by @alexcrichton).

The solution we decided upon was to just completely ignore input events which contain unpaired surrogates. This of course requires detecting (in JS) whether a string contains unpaired surrogates or not.

As for a tracking issue in browsers or the spec, not that I'm aware of. But then again, I haven't really looked.

That seems like something that ought to be fixed in implementations (and specification, if it's unclear).

If it could be fixed in browsers, then that would be fantastic! Right now we have to ignore input events which contain unpaired surrogates.

We could potentially add a static or some such that tells you whether your string contains them.

Yeah, that would be useful! Right now we have to use a function like this, but it can probably be implemented faster in browsers.

@lukewagner
Copy link
Member Author

lukewagner commented Apr 1, 2019

Oops, right. Ultimately, I assumed the spec here would delegate to TextEncoder, so I'll just say that explicitly instead of trying to incorrectly guess what it would do.

proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
proposals/webidl-bindings/Explainer.md Outdated Show resolved Hide resolved
@binji binji mentioned this pull request Apr 4, 2019
statement:

```wasm
(webidl-type $Contact (dict (field "name" DOMString) (field "age" long)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the text format of this should be as close as possible to the 'official' syntax for webIDL itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consistency with the existing text format is also a useful property, but probably we can discuss this in a follow-up issue and iterate on the explainer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to uniformly render custom sections in the text format by means of the proposed custom annotation syntax? That would prevent all sorts of compatibility and tooling issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot all about that. So, iiuc, the request is to make sure that any new syntax is contained by some (@webidl ...), and then otherwise the ... can be anything (as long as it follows the basic paren-matching rules)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within limits. It has to be lexible as Wasm tokens. That covers a wide range of lexical syntax (because Wasm identifiers can be almost anything), but we'd need to extend the proposal slightly if we wanted to allow other kinds of brackets as well or "foreign" punctuation like commas and semicolons. Those wouldn't be a problem. (Though it would be a problem to allow completely different lexing rules, mainly because of parens in comments and strings.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(That said, I wouldn't recommend going wild with annotation syntax but rather keep the spirit of the surrounding Wasm syntax.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, that makes sense, but for the two practical options we're considering here (wat-style or Web IDL), it sounds like we're fine, so the only real requirement is, whatever it is, it is seated in a (@webidl ...), not a (webidl...), right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Righto, will fix, thanks for pointing that out.

@lukewagner
Copy link
Member Author

Following the unanimous poll at the last CG meeting and no objections since, I'll merge, rename the repo, and we can continue to iterate on the design going forward.

@lukewagner lukewagner merged commit e52ecda into master Apr 5, 2019
@lukewagner lukewagner deleted the refresh branch April 5, 2019 00:36
@lukewagner lukewagner restored the refresh branch April 5, 2019 00:45
@lukewagner lukewagner deleted the refresh branch April 5, 2019 00:46
@fgmccabe
Copy link
Contributor

fgmccabe commented Apr 5, 2019 via email

@rossberg
Copy link
Member

rossberg commented Apr 5, 2019

FWIW, the custom annotation syntax should probably be able to embed existing WebIDL syntax:

(@webidl
  dictionary Contact {
    DOMString name;
    long age;
  };
)

would be syntactically well-formed. Whether it's desirable stylistically I'll leave for others to decide.

@lukewagner
Copy link
Member Author

Good to know; probably best to open a new issue to discuss Web IDL vs s-expression syntax.

@littledan
Copy link
Contributor

I'd suggest not using the WebIDL surface syntax, as this has changed over time and is continuing to change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.