Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the HTML parser's scripting flag not depend on "scripting was enabled" #9426

Open
zcorpan opened this issue Jun 14, 2023 · 5 comments
Open

Comments

@zcorpan
Copy link
Member

zcorpan commented Jun 14, 2023

https://html.spec.whatwg.org/multipage/parsing.html#other-parsing-state-flags says

The scripting flag is set to "enabled" if scripting was enabled for the Document with which the parser is associated when the parser was created, and "disabled" otherwise.

This flag affects parsing of noscript, and "scripting was enabled" is disabled for an <iframe sandbox> document (without allow-scripts) as well as these:

scripts in XMLHttpRequest's responseXML documents, scripts in DOMParser-created documents, scripts in documents created by XSLTProcessor's transformToDocument feature, and scripts that are first inserted by a script into a Document that was created using the createDocument() API.

It's also disabled for <template>'s template contents owner document, at least as implemented in WebKit and Chromium (but Gecko parses with scripting flag enabled), see http://software.hixie.ch/utilities/js/live-dom-viewer/saved/11796

This has caused mutation XSS issues in the past, see https://www.acunetix.com/blog/web-security-zone/mutation-xss-in-google-search/

I think it's not great that noscript is parsed differently for different documents that can be accessed from script. noscript was intended to show content to users when they had scripting disabled in the browser. Introducing parsing differences in different places invites XSS bugs.

Would it be web compatible to make the HTML parser's scripting flag only depend on whether the user has disabled scripting?

Alternatively, drop the scripting flag completely and always parse as if scripting is enabled, but this might regress the UX for users who disable script.

@whatwg/html-parser @whatwg/security

@gsnedders
Copy link

Alternatively, drop the scripting flag completely and always parse as if scripting is enabled, but this might regress the UX for users who disable script.

That would be a significant UX regression, given it would mean noscript becomes meaningless, no?

The even more radical suggestion would be to parse as if scripting is disabled (so noscript still gets parsed correctly!) but with script execution. I doubt we can manage to do this (given it'll cause a performance regression for every noscript in the common case, and it has difficult-to-judge web compatibility implications), but it would be pretty great if the only thing that scripting being enabled changed was script execution itself. (And of course you can still have differential parser attacks in that case, thanks to document.write, but from a sanitizer point-of-view that's "just" a matter of removing all scripts.)

As a mostly-absentee maintainer of a library containing a sanitizer (html5lib), the noscript parsing has often scared me (because it is practically inevitable it will lead to attacks like this), and it feels like it would be a decent security win to the overall ecosystem.

@zcorpan
Copy link
Member Author

zcorpan commented Jun 15, 2023

Right it would be mostly useless. Maybe the display could still be flipped depending on scripting disabled (by the user?), but showing markup as text seems broken.

Always parsing noscript as if scripting is disabled seems like it would break web content e.g. using a meta refresh in noscript, or show duplicate content (scripted and non-scripted widgets), etc.

@mozfreddyb
Copy link
Contributor

It would be nice if <noscript> had different display based on a flag (instead of different parsing). Not sure this can be made compatible though..

@zcorpan
Copy link
Member Author

zcorpan commented Jun 15, 2023

@mozfreddyb it does that today as well as different parsing. https://html.spec.whatwg.org/multipage/rendering.html#hidden-elements

It's not clear to me if @media (scripting) is 1:1 with HTML's "scripting is enabled"

@annevk
Copy link
Member

annevk commented Jun 15, 2023

I could see making HTML fragment parsing better account for the state of the context node's node document. If we want to completely revisit the way this works this needs some more historical context. E.g., I'm pretty sure it was fairly recently that we did the work around sandboxing and that making that work the same as scripting being disabled was an intentional design choice.


Note that changing the way <noscript> parses has other side effects beyond rendering. E.g., it can impact whether a subresource gets fetched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants