Seeking feedback on Clipboard Pickling APIs. #334
Comments
I'm having trouble understanding what's being proposed here. Conceptually there are two things:
Per prior discussion, for (1), native applications must opt in to expose information to web apps directly. So solving this problem requires some kind of agreement between native applications and web browsers. Looks like this proposal is saying that we'd let web app specify
For (2), native applications need some mechanism to access the data which isn't usually available to them. The proposal also suggests adding the ability to write an unsanitized version of things. Again, I'm unsure why this is needed. Why can't the browser always make the unsanitized version available to native apps which request it? Why does a website explicitly need to request this?
So again, I'm failing to see any need to add a Web API for either use case. Like we've repeatedly stated in the past, Apple's WebKit team believes this is a domain of operating system design. If these use cases are important, then we'd likely introduce a new API at the OS level (I'm neither confirming nor denying we may or may not do this). |
Thank you @rniwa for the feedback! I've addressed your comments below. Please let me know if you have further questions!
That is correct. We want native apps and sites to explicitly opt-in to reading the custom formats from the clipboard so we don't expose potentially harmful content without the sites/apps being aware of the security implications.
The |
Again, I'm getting really confused by the mixing of reading and writing of system pasteboard content. What problem exactly are we solving by explicitly requesting the unsanitized format for reading or writing pasteboard content in a website / web app? Please define the threat model of each scenario separately, and explain why an explicit request for unsanitized content is required.
Why would the browser need to read both versions without this option?
I really don't follow the description there. When the user copies something on a website, the website shouldn't be in control of whether a given MIME type should be exposed to another app or not. Namely, if a website writes "unsanitized" HTML markup, then the browser SHOULD provide both the sanitized version and the unsanitized version to other native applications, at least on Apple platforms. |
Just wanted to quickly reply to some of your concerns now. Will post another reply to the below question describing the threat model in more detail:
If we make the process of reading/writing unsanitized content more explicit, then there is an implicit expectation that the developers are aware of the security implications and would have mitigations in place (e.g., intensive fuzzing, such as that provided by OSS-Fuzz). Quoting @dway123 here, as I think this response captures a lot of detail about why we are proposing the
In the Chromium implementation at least, when the clipboard read method is called, we query all the standard formats from the clipboard that are supported by the browser. It would be very expensive to read all custom formats as well, even when the site hasn't requested any custom formats.
Well, in clipboard read, we use the
Well, async clipboard write method gives complete control to web authors as to what content should be written to the clipboard. This is being achieved by providing the MIME types in |
I really don't follow. When we say developers, are we talking about web developers, or native app developers? Surely, browsers should have to write both versions to the system pasteboard because we don't know when or if the user pastes the content to another browser instance of the same origin or to some other native applications. So it doesn't seem like there is an option left for web developers to say, I want to only write a version of content that didn't go through sanitization process.
I'm really confused here. Why would a website want to read the sanitized version of content from the system pasteboard if a version of the content that's unsanitized is available to them? Is the concern that we want to make sure we don't end up giving them potentially dangerous content? That doesn't seem like a kind of assumption websites should be making in the first place. There is nothing browsers can do to ensure that whatever content read from the system pasteboard won't result in some kind of XSS or even remote server exploits since we have no idea how a website is processing it. e.g. a plain text in the pasteboard could result in XSS if it's inserted inside a script or style tag or some attributes.
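To illustrate the plain-text point above, here is a minimal TypeScript sketch. The helper names (`escapeHtml`, `unsafeTitleAttr`, `safeTitleAttr`) are hypothetical and are not part of any browser API or of the proposal; the sketch only shows that a `text/plain` payload can become markup if the receiving page concatenates it into an attribute, regardless of any sanitization the browser performs on `text/html`.

```typescript
// Hypothetical helper names; nothing here is a browser API.

/** Escape text so it is safe as element content or as a quoted attribute value. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Unsafe: pasted text is concatenated straight into an attribute value. */
function unsafeTitleAttr(pasted: string): string {
  return `<span title="${pasted}">note</span>`;
}

/** Safer: the same markup, with the pasted text escaped first. */
function safeTitleAttr(pasted: string): string {
  return `<span title="${escapeHtml(pasted)}">note</span>`;
}

// A plain-text clipboard payload that closes the attribute and adds a handler.
const pastedText = `" onmouseover="alert(1)`;

// The unsafe version ends up with a real onmouseover attribute.
const unsafeMarkup = unsafeTitleAttr(pastedText);

// The escaped version keeps the whole payload inside the title attribute.
const safeMarkup = safeTitleAttr(pastedText);
```

The point of the sketch is that the safety of the result depends on how the site processes the data, which the browser cannot see.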
I don't see the need for reading the sanitized version. What is the scenario in which a website wants to read unsanitized version of the content?
Right, standard ones. But websites shouldn't be in control of, say, exposing a PSD file unsanitized. Similarly, if a website writes HTML, then the browser needs to provide both sanitized HTML and unsanitized HTML for other browsers and native apps because we don't know at the time of writing to the system pasteboard what the receiver is capable of. |
In that response I was mainly referring to native app developers, but it applies equally to web devs as well. Web developers could use the Sanitizer APIs to decide what elements/attributes to drop and would have control over what content gets pasted.
I'm guessing you are referring to standard formats here, as custom formats are always written by sites/apps if they opt in to reading/writing custom formats. For standard formats, we want sites and native apps to explicitly opt in to read the unsanitized version so they are aware of the security implications. Some legacy sites and native apps don't receive frequent updates and are not really designed to properly process unsanitized HTML content from the web. This is why we want to always write the sanitized version of standard formats, so we don't regress the paste behavior in these sites/apps.
There are sites and legacy apps that don't receive updates often. These sites/apps depend on the standard formats (which are predefined by the OS) being available on the pasteboard. Currently we sanitize the standard formats by default, so we don't want to regress that behavior.
The sanitized version is always needed to support sites/apps that don't want to make any changes to their copy/paste code. Legacy native apps (at least on Windows) don't receive frequent updates, so we don't want to regress copy/paste behavior in those apps.
Excel Online would want to read and process the unsanitized version of the HTML format to preserve rich formatting such as table cell colors. Here is a GIF that shows the difference between the unsanitized and sanitized versions of the HTML format. Note how the sanitized version loses styles when pasted into Excel Online compared to the unsanitized one.
I don't think the site needs to specify any particular format. They can just serialize the payload and write it under a custom format that can only be interpreted by either the site or the native apps that are aware of this custom format's content. That way both the site and native app have complete control over the content of the payload. This also addresses some security concerns regarding unsanitized custom formats where only the site and the native app know how to parse the content of the custom format and are also able to trust the content by adding some security tokens or something to uniquely identify the payload present in the custom format. |
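A rough TypeScript sketch of the idea in the comment above: serialize a payload, tag it, and store it under a custom format that only the cooperating site and native app understand. Everything named here is an assumption made for illustration: the MIME type `application/x-example-app`, the `CustomPayload` shape, and the token check are hypothetical, and a static token only identifies a payload rather than authenticating it.

```typescript
// All names below are hypothetical and exist only for this sketch.
const CUSTOM_FORMAT = "application/x-example-app";

interface CustomPayload {
  token: string;   // value the reader checks before accepting the payload
  version: number; // lets reader and writer evolve the format
  body: unknown;   // app-defined content, opaque to the browser
}

/** Serialize an app-defined body together with an identifying token. */
function serializePayload(body: unknown, token: string): string {
  const payload: CustomPayload = { token, version: 1, body };
  return JSON.stringify(payload);
}

/** Build the format-name-to-data entry a site would hand to a clipboard write. */
function toClipboardEntry(body: unknown, token: string): Record<string, string> {
  return { [CUSTOM_FORMAT]: serializePayload(body, token) };
}

/** Parse a payload, returning null unless it is well formed and carries the expected token. */
function parsePayload(raw: string, expectedToken: string): CustomPayload | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON, so not ours
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Partial<CustomPayload>;
  if (candidate.token !== expectedToken || candidate.version !== 1) return null;
  return { token: expectedToken, version: 1, body: candidate.body };
}
```

A reader that gets `null` back would simply ignore the custom format and fall back to whatever standard formats are on the clipboard.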
@rniwa if it would help there is a Web Editing Working Group meeting next Friday, 9/10/2021 (details on the GitHub site) where we'd be happy to discuss this issue in greater depth. We can also schedule a separate meeting dedicated to the topic if that would work better. Let us know if you think we should continue the conversation in one of those higher bandwidth environments. Thanks, Bo |
@BoCupp-Microsoft : I won't be able to attend that meeting since I only work Mon-Wed these days, and I'm barely awake at 11am PDT due to various medications I'm taking. @whsieh can probably attend one of those meetings though. I'm open to scheduling a specific meeting, but offline discussions over GitHub might be the fastest way given the very restrictive working hours I have these days. |
Tagging folks from FF & Chromium as we are going to discuss this issue in this week's Editing WG meeting. @mkruisselbrink @a-sully @annevk @evilpie |
September WG meeting: @snianu presented an overview of the pickling design, and showed a demo. Discussion to continue in this issue. |
Here is the PPT that we presented today: pickling-api.pdf |
After @snianu's clipboard pickling presentation last week in the Web Editing WG meeting, we had a discussion that resulted in two action items:
|
Tagging @mkruisselbrink to see if he has any concerns with what is suggested in the first point below:
If we are just writing the standard well known formats, then this approach sounds good to me. But, for custom pickled formats, we still need a way for the web authors to provide the custom format name. The For read, I think it is better to have an |
What does this mean? MIME type specifies exactly what type a given format is.
We object to this proposal. |
Talked with @snianu offline and we agree that when writing specifying an
@rniwa there's no objection to the first point as I've outlined it is there?
@rniwa can you clarify what part you are objecting to? And why? :-) |
@whsieh I wrote this above as an action item for you. :-) I'm wondering if you can also bring your findings to our next Web Editing WG meeting on 9/24/2021. Thanks!
|
As a follow-up to this action item from the Web Editing WG meeting from 9/24/2021, members from Apple, Google and Microsoft met last Friday (10/1/2021) to discuss two topics:
For point 1, we concluded that we are going to include the proposed format for the Web Custom Format Map and clipboard format naming conventions as outlined in the current explainer as non-normative notes in the Clipboard API spec.
Additional details... we debated for a while whether what will surely become a de facto standard should be included as part of the actual standard using normative text but decided that alternative implementations are possible and that we would leave room for those by using non-normative text. One hypothetical example is that Apple could introduce new platform APIs for the pasteboard that could read and write the proposed pickled format in addition to some legacy pickled formats that vary across browsers today to hide the implementation details from native app authors including browser implementors. As a counterpoint it was argued that to facilitate interchange between native and web apps, the shape of the platform-specific format written to the clipboard must be documented somewhere, and it was better to have it as part of the standard than to require that it be reverse engineered from the apps that happen to implement it first. Our compromise was to include it in the spec using non-normative language.
For point 2, Microsoft pointed out that the ClipboardEvent's getData method already provides unsanitized access to the HTML on the clipboard (in Firefox, IE, EdgeHTML-based Edge, Chromium-based Edge and Chrome), but that navigator.clipboard.read only returns a sanitized fragment of the HTML on the clipboard. This loss of fidelity creates feature gaps for Microsoft Office apps (and likely many others) and prevents them from adopting the async clipboard API. The proposal is to provide unsanitized access to HTML on the clipboard by using the navigator.clipboard.read method with a new unsanitized option.
As a counterpoint, Apple suggested that native apps can't be trusted to write data to the clipboard without revealing document metadata and that the browser should sanitize it to prevent exposure before allowing it to be read by a website. Microsoft expressed skepticism as to whether it was the browser's responsibility to restrict what native apps could place in their HTML data. We agreed to continue discussing point 2 in our Web Editing WG meeting this Friday (10/8/2021). |
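For readers following point 2, a hedged TypeScript sketch of what an opt-in unsanitized read could look like. The `ReadOptions` shape with an `unsanitized` list of MIME types is an assumption based on this discussion, not a shipped API, and `ClipboardLike` is a simplified stand-in (the real `navigator.clipboard.read()` returns clipboard items rather than a map). The in-memory fake and the toy `dropStyles` function exist only so the sketch runs outside a browser.

```typescript
// Hypothetical shapes modelled on the discussion; not a shipped browser API.
interface ReadOptions {
  unsanitized?: string[]; // MIME types the caller wants without sanitization (proposed)
}

interface ClipboardLike {
  read(options?: ReadOptions): Promise<Map<string, string>>;
}

/** Read text/html, opting in to the raw version only when explicitly asked. */
async function readHtml(clipboard: ClipboardLike, wantRaw: boolean): Promise<string | undefined> {
  const options: ReadOptions = wantRaw ? { unsanitized: ["text/html"] } : {};
  const formats = await clipboard.read(options);
  return formats.get("text/html");
}

/** In-memory stand-in for a clipboard that sanitizes unless the caller opts out. */
function makeFakeClipboard(rawHtml: string, sanitize: (html: string) => string): ClipboardLike {
  return {
    async read(options?: ReadOptions): Promise<Map<string, string>> {
      const raw = options?.unsanitized?.includes("text/html") ?? false;
      const value = raw ? rawHtml : sanitize(rawHtml);
      return new Map<string, string>([["text/html", value]]);
    },
  };
}

// Toy "sanitizer" that only drops <style> blocks, to mimic the fidelity loss
// described above. Real browser sanitization is far more involved.
const dropStyles = (html: string): string => html.replace(/<style[\s\S]*?<\/style>/gi, "");
```

With this model, a default read returns the style-stripped fragment, while a read that lists `text/html` under `unsanitized` returns the markup exactly as the source app wrote it, which is the behavior the legacy `getData` path is described as having.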
The Web Editing Working Group just discussed
The full IRC log of that discussion
<Travis> Topic: Continue discussion on clipboard APIs
<Travis> github: https://github.com/w3c/clipboard-apis/issues/150
<Travis> BoCupp: Not sure this is the right issue...
<BoCupp> https://github.com//issues/334
<Travis> .. Ah, in a different repo: https://github.com//issues/334
<Travis> github: https://github.com//issues/334
<Travis> BoCupp: we were able to resolve half the discussion. Agreed to add a non-normative note on the format that is used to communicate with native apps (and vice-versa)
<Travis> .. from native->web: current read behavior of nav.clipboard, you get sanitized content.
<Travis> .. for exchange with office apps (or similar) the fidelity is too low (loss of formatting, for example)
<Travis> .. so we want to add an "unsanitized" option to the read API.
<Travis> .. (trying to match ctrl+v)
<whsieh> q+
<Travis> .. we want raw content from the pickle jar (if exist) or from well-known HTML format.
<Travis> .. if you get that raw data, then native->web works great even for apps not yet updated.
<Travis> .. (also want to talk about web->native)
<Travis> .. when writing, there is also sanitization happening today.
<Travis> .. if we can't support well-known HTML format write, then our partners won't be able to support the API because it cuts off existing support already provided by the setData legacy API.
<Travis> .. today in all browser setData is a raw-write for HTML to clipboard. If they lose that in async clipboard it blocks them.
<Travis> .. (it's a downgrade for existing apps already having migrated to async clipboard)
<Travis> ack whsieh
<Travis> whsieh: As discussed there are a few privacy/security issues at play (not fully addressed from last time)
<Travis> .. in webkit the getData/setData, for these webkit treated as a security fix--was surprising to hear this was only limited to one of them.
<Travis> .. on copy/paste of content to native apps, they can reach into the pickle jar. So this can already work without a sanitized write.
<Travis> .. Without any explicit Api changes
<Travis> .. there are privacy issues native->web copy/paste.
<BoCupp> q?
<Travis> .. e.g., Word does add some things (like filepaths) into the clipboard and could expose directory structure to the web, and would be a non-starter for unsanitized read.
<Travis> BoCupp: Problem: some existing native apps would take a long time to update (they only read from the well-known format today). They can't simultaneously use the new API (when available) and the old one. (They have to pick one.)
<Travis> .. This then creates a blocker until apps updated to read from new pickle jar.
<GameMake_> q+
<Travis> whsieh: So native app side, they would try to read from pickle jar?
<Travis> BoCupp: an existing app--doesn't know about the pickle jar yet.
<Travis> .. today they read and get full fidelity.
<Travis> whsieh: problem is... web pages adopt new API, old version of native apps would get ?
<Travis> .. they can't use both setData AND nav.clipboard.write (it's one or the other)?
<Travis> johanneswilm: What would happen in that case?
<Travis> BoCupp: Maybe last write wins? But code is not written to handle that (one overwrites anything else).
<Travis> .. the writes are basically atomic for a write (and the results that get generated)
<Travis> .. might be able to specify how they work together?
<Travis> .. but want to have unsanitized write using the new API.
<Travis> ack GameMake_
<Travis> GameMake_: I understand that new version is not backward compatible... how is this new version back compatible?
<Travis> .. so how does old version of Word work?
<Travis> .. Given a new API (unsanitized write).. How does the old version of word get this today?
<Travis> BoCupp: Word is coded to read a well-known HTML format. WebApps fills this format today with whatever they want (and it goes through unsanitized).
<Travis> .. using setData API.
<Travis> .. on async write, we started sanitizing content and putting it into the same "slot" that Word reads from.
<whsieh> q+
<Travis> .. am proposing that the write on async clipboard fill the same slot in the same way.
<Travis> GameMake_: So, when on web, when using new API with unsanitized version, then only the unsanitized version is being written?
<Travis> .. thinking that sanitization was to improve security. So, how does this not become a problem.
<whsieh> q-
<Travis> BoCupp: threat was for arbitrary (new) formats (never accessible from the web) before. And Apps wouldn't be prepared and would be vulnerable to this. (The web being able to write to those format names.)
<Travis> .. Those were the threats we were concerned about.
<Travis> .. I don't think apps (like Word) aren't aware of the risk, as they are already using the HTML format.
<whsieh> q+
<Travis> .. I think native apps, getting HTML should be ready.
<Travis> GameMake_: So the new sanitized format flag only work for well-known formats?
<Travis> BoCupp: the behavior of write not only allows arbitrary mime-types to be written (sanitized), pre-existing well-known types to flow through in the way they did in the past.
<Travis> .. am primarily focused on HTML format write now. (But may want to expand to other well-known, existing types.)
<Travis> .. For read, there must be an explicit option to "give me raw".
<Travis> .. (restating) we built a new API, but customers won't move to it (because it's a loss of functionality available today).
<Travis> GameMake_: Didn't know why we chose to open the whole...
<Travis> s/whole/hole
<Travis> BoCupp: Agree that we did too much (sanitizing on write for well-known)
<Travis> GameMake_: Not sure I see the complete problem
<Travis> .. if we were solving for a problem, we can't just ignore that problem...
<johanneswilm> +q
<Travis> .. but if we bringing this back to a prior state, I could be more supportive.
<whsieh> q-
<Travis> BoCupp: In the security process, we fix/patch the holes. They've supposedly been addressed in the past (or continue to be so).
<Travis> .. when the async clipboard write was proposed in the prior group (with garykac's proposal), I opposed the change.
<Travis> .. was there a real threat, we would have tried to solve that. But the change wasn't based on a threat, it was just a suggestion.
<Travis> .. If there was a motivation for the change, I'd really like to know what it was.
<whsieh> q+ to: mention that we really can't expect all native apps to sanitize `script` tags, for instance
<Travis> GameMake_: I definitely want to know what the reason was.
<Travis> q?
<Travis> ack johanneswilm
<Travis> johanneswilm: If Safari was the only one that removed it, then maybe they can tell us what the issue was?
<Travis> whsieh: Don't know of a specific native app that might have been taken by the exploit.
<Travis> .. Just don't want to expect all native apps (moving forward) to be able to do the sanitization steps.
<Travis> .. It's hard to get that right.
<BoCupp> why would native Word need to worry about stripping onmouseover events?
<BoCupp> that's a web app concern
<Travis> .. the Browser is in the middle, and should be responsible for sanitizing.
<Travis> .. is also at odds with the compatibility story.
<BoCupp> q?
<Travis> whsieh: to BoCupp, maybe not native word, but perhaps an electron app?
<Travis> .. there are some corner cases we wouldn't expect them to catch.
<Travis> BoCupp: For electron, they have access to sanitization on read (default) (if they are web-based).
<Travis> whsieh: this requires them to use the web API...but they aren't limited to that given they are native?
<Travis> .. my understanding would be the only way to access the data would be through opt-in unsanitized read.
<Travis> BoCupp: special meeting to continue this discussion?
<Travis> .. given we're out of time
<Travis> johanneswilm: I propose we meet again on the 15th.
<Travis> BoCupp: Sounds fine.
<Travis> .. I do want to make progress... if we're running in circles I don't want to waste folks time.
<Travis> action: add a special meeting on October 15th (same time/place)
<Travis> travis: will just cover this topic on the 15th.
<Travis> whsieh: I'm sure we can come to some consensus ;-)
<Travis> Thanks everyone! I think we covered a lot of ground today. |
The Web Editing Working Group just discussed
The full IRC log of that discussion
<Travis> topic: Seeking feedback on Clipboard Pickling APIs
<Travis> github: https://github.com//issues/334
<tilgovi> bo: right now we have a GitHub issue continuing security review led by some GitHub engineers. The last consensus that we had was that we want to document a format for interchange between native apps and web application. We agreed we would write that in a non-normative note.
<tilgovi> ... we had some disagreement on the sanitization procedures for both read and write
<tilgovi> ... Our proposal has been that when we read, we could supply a new option, "unsanitize", that takes a list of content types to read without sanitization
<tilgovi> ... We maintain that it's unnecessary to sanitize on write to the clipboard. Certainly for clipboard pickling, you can't sanitize a format you don't understand. More importantly, for the well-known format text/html, we want to be able to write it unsanitized. We detailed the impact that it has on applications if we lose the fidelity of writing unsanitized HTML.
<tilgovi> ... We had resolved that we would allow user agents to sanitize or not, at the last special meeting. I don't think we have anything new.
<tilgovi> Anne: It sounds quite bad for web developers, everything that is optional.
<tilgovi> Wenson: it would be nice to reiterate why sanitize is necessary on read.
<tilgovi> Bo: At least with Chromium implementation, if you Ctrl+V, if you don't do anything, we process HTML before we put it into the document. We make "insert ready" HTML.
<tilgovi> Bo: Unlike the legacy API, that returns unsanitized content, Chromium browsers produce an insert ready fragment today. This has a side effect of degrading the fidelity of the HTML that the application can see.
<tilgovi> ... the web application for Word online is not able to change to the new API because it needs style rules inserted by MS Word. In a nutshell, supporting unsanitized is backwards compatibility.
<snianu> Some context on the HTML sanitization issue: https://github.com/w3c/clipboard-apis/issues/150
<tilgovi> Wenson: When we shipped sanitization for clipboard, at least for cross-origin, all data is sanitized. Including native to web app
<tilgovi> ... We have certain quirks in the algorithms for sanitization, some of them for things like MS Office, to make some workflows work.
<Travis> q?
<tilgovi> Bo: I think it might be hard to specify the quirks we would need.
<johanneswilm> q+
<Travis> ack johanneswilm
<tilgovi> johanneswilm: The thing that you're saying about exceptions is not really accessible to developers making applications. It would be good if this doesn't just happen to work in MS Word.
<tilgovi> Wenson: There are security goals we need to uphold, but workflows we acknowledge will be useful for unsanitized content. For those we want to expose native APIs that allow apps to expose raw content if they want to do so, knowing that it's going to be read by arbitrary web content. That is something we want to support, but I don't think we can do it and uphold security goals at the same time.
<BoCupp> q+
<tilgovi> annevk: I have a question about security goals.
<tilgovi> ... say there is a cross origin exchange, web->web or native->web. Either they could agree to use a pickled type, or they could agree to opt into unsanitized behavior. I'm not sure what the difference is between using a new MIME type or a MIME type with unsanitized.
<tilgovi> BoCupp: the difference is for native apps that are already writing the well-known HTML format, can web apps read that existing HTML. If they both opt into using text/html2, they can. But it should be acceptable for them to read text/html as they exist today.
<tilgovi> ... Native -> web, Wenson raised a concern that native apps might put in comments author metadata that might reveal private information.
<tilgovi> Wenson: there are real world examples of this. MS Word will put a user's file paths within attributes that get copied.
<tilgovi> johanneswilm: one question here. One example I've heard is MS word file paths and this is what all this about. But MS word is also the one application you happen to make exceptions for in the paste rules. But this affects other applications, too, right?
<Travis> q?
<tilgovi> Wenson: that's currently the case, yes.
<Travis> ack BoCupp
<tilgovi> BoCupp: this is just a recap of the special meeting. We have scenarios we need to enable. We differ in behavior today, and we haven't come to consensus except to say this is all optional.
<tilgovi> annevk: there was agreement to make it optional?
<tilgovi> BoCupp: we did agree we could write "browser MAY do this".
<tilgovi> Wenson: that is correct
<tilgovi> annevk: that was instead of an "unsanitized" options bag that some implementation might ignore?
<tilgovi> BoCupp: can someone propose a step for moving forward?
<tilgovi> BoCupp: from our perspective, it's not locked down with the legacy APIs and it's unfortunate that the new APIs don't allow the same support. Our customers can't use it as it's currently authored.
<tilgovi> annevk: I see it as Chromium has a privacy bug in its legacy APIs.
<tilgovi> BoCupp: well, the idea that HTML as a format isn't meant to be shared with the web is strange.
<tilgovi> Anupam: Firefox also returns unsanitized content.
<tilgovi> ... If there was privacy or security concern it's been there for decades.
<snianu> The issue that I linked above shows how FF, Chrome, Edge(old and new) return unsanitized HTML content via DataTransfer APIs.
<tilgovi> rrsagent, bookmark
<RRSAgent> See https://www.w3.org/2021/10/29-editing-irc#T16-32-41 |
Here is the explainer: https://github.com/w3c/editing/blob/gh-pages/docs/clipboard-pickling/explainer.md#pickling-for-async-clipboard-api
Would love to hear feedback on the design of the API and the OS format naming proposal. Tagging a few folks to get some attention, as we have just completed the implementation in Chromium and are looking to request an origin trial to experiment with our partners.
@rniwa @whsieh @annevk @BoCupp-Microsoft @garykac @mkruisselbrink