Allow self-closing tags everywhere #9491

Closed
jakearchibald opened this issue Jul 6, 2023 · 43 comments
Labels
addition/proposal (New features or enhancements) · needs compat analysis · needs implementer interest (Moving the issue forward requires implementers to express interest) · topic: parser

Comments

@jakearchibald
Contributor

People really seem to like self-closing tag syntax (see the replies to https://twitter.com/jaffathecake/status/1676843832284004353).

Maybe a switch should be added to allow them to be used on all elements?

<!doctype html allow-self-closing-or-whatever>
<!-- Now self-closes: -->
<div/>
<my-component/>
<script src=""/>

Right now, documents can be a mix of rules where /> is largely meaningless, except in SVG and MathML. Making everything consistent seems… good?

@keithamus keithamus added the addition/proposal, needs implementer interest, topic: parser, and needs compat analysis labels Jul 6, 2023
@WebReflection
Contributor

WebReflection commented Jul 6, 2023

I have "a tiny Déjà vu"

(this has been discussed and desired since at least 2016, btw ... glad we keep desiring it in 2023, and rightly so)

@zcorpan
Member

zcorpan commented Jul 6, 2023

I think we shouldn't introduce new parser flags that change parsing behavior. They cause XSS issues. (In #9426 we're investigating if we can remove the scripting enabled flag.)

cc @whatwg/html-parser

@WebReflection
Contributor

It's opt-in, so it won't cause issues for developers who opt in, and it's not about scripting either. It's just a desired "never ignore that />" feature; the current ignoring is what could lead to XSS or other kinds of issues, if people believe that /> means the end of that tag.

As half a joke though:

<!doctype x-html>

would be lovely

@zcorpan
Member

zcorpan commented Jul 6, 2023

Sites that opt in will expose themselves to XSS issues if they also use a sanitizer that doesn't support this.

Even if the sanitizer supports this, it can be confused by the different parsing in different documents, which again can cause XSS issues.

Example: https://bugzilla.mozilla.org/show_bug.cgi?id=1615315

An exploit here could be something like:

<style/><img src onerror=alert(1)><style></style>
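To make the parser differential concrete, here is a deliberately tiny Python model. It is not an HTML parser: `toy_tokens`, `live_elements` and the `self_closing_mode` switch are hypothetical names, and the only rule it knows is that `style` is a raw-text element. With today's parsing, `<style/>` opens a style element, so everything up to the next `</style>` is inert text; under the proposed opt-in, `<style/>` would be empty and the `<img onerror>` would become a live element.

```python
import re

# Toy tokenizer: start tags, end tags and text only, with <style> treated as a
# raw-text element (its contents are text until "</style"). Illustration only.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9-]*)([^>]*)>")

def toy_tokens(markup: str, self_closing_mode: bool) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(markup):
        m = TAG.search(markup, pos)
        if m is None:
            tokens.append(("text", markup[pos:]))
            break
        if m.start() > pos:
            tokens.append(("text", markup[pos:m.start()]))
        is_end, name, rest = m.group(1), m.group(2).lower(), m.group(3)
        pos = m.end()
        if is_end:
            tokens.append(("end", name))
            continue
        tokens.append(("start", name))
        if name == "style":
            if self_closing_mode and rest.rstrip().endswith("/"):
                # Hypothetical opt-in: <style/> is an empty element.
                tokens.append(("end", "style"))
                continue
            # Current behavior: the "/" is ignored, raw text runs to "</style".
            close = markup.lower().find("</style", pos)
            end = len(markup) if close == -1 else close
            if end > pos:
                tokens.append(("text", markup[pos:end]))
            pos = end
    return tokens

def live_elements(markup: str, self_closing_mode: bool) -> list[str]:
    """Names of the elements the toy model actually creates."""
    return [value for kind, value in toy_tokens(markup, self_closing_mode) if kind == "start"]

payload = "<style/><img src onerror=alert(1)><style></style>"
print(live_elements(payload, self_closing_mode=False))  # ['style']
print(live_elements(payload, self_closing_mode=True))   # ['style', 'img', 'style']
```

A sanitizer that parses the way the first call does sees only inert text inside `<style>`; if that markup later reaches a document parsed the second way, the `onerror` handler becomes live. That mismatch is the kind of differential being described.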

@WebReflection
Contributor

WebReflection commented Jul 6, 2023

Sites that opt in will expose themselves to XSS issues if they also use a sanitizer that doesn't support this.

it's like saying: if you switch to Python 3 but use Python 2 tools to lint your code, expect issues ... not sure I'm following.

An exploit here could be something like: ...

not sure I'm following that either ... you're creating an invalid layout on purpose; see my previous Python 3 vs. 2 analogy.

P.S. In case my "half joke" hint wasn't clear: if there's any way to enable this, the parser should throw when elements meant to be void are not self-closed, as those would not be welcome by the parser with that flag on. So I don't see issues, or any extra XSS that isn't already possible in HTML5.

@keithamus
Contributor

it's like saying: if you switch to Python 3 but use Python 2 tools to lint your code, expect issues ... not sure I'm following.

The difference is that current sanitizer libraries will have no mechanisms to detect whether or not they're in "non-self closing mode" vs "self closing mode". Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities. There's no reasonable expectation that those libraries would support such a mode. A constraint for implementing this is to avoid such a scenario. Simon is saying that an opt-in does not avoid the scenario, and therefore fails to meet the constraint.

@zcorpan
Member

zcorpan commented Jul 6, 2023

if there's any way to enable this, the parser should throw

You can use XML.

@WebReflection
Contributor

WebReflection commented Jul 6, 2023

Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities

We have parseFromString, where if you pass the wrong MIME type you're subject to the same issue you're mentioning ... right? If I use a library that doesn't support that syntax, shouldn't I change library or contribute to make it compatible? ... as a new flag? ... like all parsers / transpilers / linters have?

You can use XML.

to see no image and have no HTML at all on the page, even with correct layout? Can we keep the conversation focused, please?

@sideshowbarker
Contributor

sideshowbarker commented Jul 6, 2023

Worth adding a reminder here that the only effect the doctype has in browsers is to prevent browsers from using quirks mode to render the document: Without the doctype, browsers use quirks mode; with the doctype, they don’t. Ideally, we’d not want to have the doctype at all — because it has zero purpose other than preventing quirks mode — but it’s one of those legacy misfeatures that we’re now stuck with forever for backward-compat reasons.

So, given that, using the doctype as a way to opt into any other particular behavior in browsers would likely have the side effect of giving people the wrong mental model of what the doctype is. It could mislead people into thinking that the doctype in HTML has some general meaning and purpose in browsers that it doesn’t actually have, and that the allow-self-closing-or-whatever token merely adds to that intended purpose, which it actually wouldn’t.

@sideshowbarker

This comment was marked as resolved.

@sideshowbarker
Contributor

sideshowbarker commented Jul 6, 2023

I’ll also add that, based on previous discussions we’ve had with implementers about other proposals requiring changes to the parsing algorithm and to HTML parsers in browsers, implementers are very unlikely to support or implement further changes to parsing behavior except for very compelling reasons. And I think we’d find that implementers won’t judge this to be a compelling enough reason to make further changes to the parsing behavior.

@WebReflection
Contributor

WebReflection commented Jul 6, 2023

the doctype has in browsers is to prevent browsers from using quirks mode to render the document

which is a wonderful feature, and it happens to play a wonderful role ... see AMP also using it to guarantee no quirks and to bootstrap itself (let's not talk about AMP in general; it was meant as an example where you avoid quirks and enable features in one step). (edit: just remembered they added an attribute to <html> instead, sorry)

but it’s one of those legacy misfeatures that we’re now stuck with forever for backward-compat reasons.

I don't think it's that bad ... it's like a shebang at the top of an executable, and it serves a nice purpose ... the alternative is to go through a new MIME type and a new file extension that can't be .xhtml, and in doing so we'll go through years of back and forth.

Never-ignored self-closing tags have been desired for 7+ years already, and if some lovely legacy artifact could help everyone move forward faster, I'd say "why not" ... but then again, literally any way of landing this would work for me (personally).

@keithamus
Contributor

Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities

We have parseFromString, where if you pass the wrong MIME type you're subject to the same issue you're mentioning ... right?

It wouldn't. parseFromString, being a browser API, would be concordant with the document's parse modes. That is to say, browsers which acknowledge self-closing tags would also make parseFromString do the same, and browsers which don't will not. However, the problem lies with user land libraries. If the document has "modes" that I can opt into, current versions of user land libraries (just as an example, DOMPurify) have no way to detect that mode, and so we enter a state where I am using a sanitizer to sanitize content that includes self-closing tags, while the sanitizer does not have the capability to parse them.

@WebReflection
Contributor

However, the problem lies with user land libraries

I wonder if these concerns were raised when HTML5 saw the light ... but again, the argument about "user land libraries" being outdated has never been an issue for the entirety of the TC39 or CSS story so I wonder why this is being raised in here.

@keithamus
Contributor

However, the problem lies with user land libraries

I wonder if these concerns were raised when HTML5 saw the light ... but again, the argument about "user land libraries" being outdated has never been an issue for the entirety of the TC39 or CSS story so I wonder why this is being raised in here.

It's being raised here because changes to the parsing algorithm can introduce XSS vulnerabilities. It is raised in any discussion about changing the parsing algorithm. It is something that each change to the parsing algorithm must navigate.

@WebReflection
Contributor

OK, but the only example is a malformed layout with an old XSS thing from the '90s ... does anyone else have a compelling XSS story / example to show? And if so, which parsing libraries should we notify about this eventual change, as an "opt-in flag" to allow/consider?

@keithamus
Contributor

OK, but the only example is a malformed layout with an old XSS thing from the '90s ... does anyone else have a compelling XSS story / example to show? And if so, which parsing libraries should we notify about this eventual change, as an "opt-in flag" to allow/consider?

Notifying parsing libraries to update does not solve the issue. A library can be updated but all prior versions will be vulnerable. Those older versions and their installations do not disappear. Changes to the parser must not introduce security vulnerabilities in existing software.

@WebReflection
Contributor

WebReflection commented Jul 6, 2023

I agree with what you're saying, but I'm also hearing that HTML as it is won't ever change from now on ... is that the future of the Web as seen by browser vendors?

@zcorpan
Member

zcorpan commented Jul 6, 2023

You can use XML.

to see no image and have no HTML at all on the page, even with correct layout? Can we keep the conversation focused, please?

You can use HTML elements in XML (which you might also call XHTML).
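As a rough illustration of the XML route, the sketch below uses Python's standard-library XML parser purely as a stand-in for a browser's XML parser. The helpers `parse_xhtml` and `local_names` are hypothetical. It shows `<div/>` and `<my-component/>` treated as complete empty elements, and an unclosed `<br>` rejected outright rather than recovered from.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def parse_xhtml(fragment: str) -> ET.Element:
    # Wrap the fragment in an XHTML-namespaced <body> so it has one root element.
    return ET.fromstring(f'<body xmlns="{XHTML_NS}">{fragment}</body>')

def local_names(el: ET.Element) -> list[str]:
    # ElementTree names namespaced tags "{namespace}local"; keep the local part.
    return [child.tag.split("}", 1)[1] for child in el]

body = parse_xhtml("<div/><my-component/><p>after</p>")
print(local_names(body))  # ['div', 'my-component', 'p']
print(len(body[0]))       # 0

try:
    parse_xhtml("<p>line<br>next</p>")
except ET.ParseError as err:
    print("rejected:", err)  # XML never infers the end of <br>
```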

@zcorpan
Member

zcorpan commented Jul 6, 2023

not sure I'm following that either ... you're creating an invalid layout on purpose

The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer (if it allows style elements, but this is not limited to the style element). The issue with the snippet isn't "layout" but that it executes attacker-controlled script (i.e., it's XSS).

@zcorpan
Member

zcorpan commented Jul 6, 2023

I agree with what you're saying, but I'm also hearing that HTML as it is won't ever change from now on

HTML changes quite a bit, but changes need to not introduce new security risks for users. Changes to the HTML parser are particularly security-sensitive.

@WebReflection
Contributor

You can use HTML elements in XML (which you might also call XHTML).

I've been doing this for 22 years, and XSLT is probably the next thing you'll tell me about ... still, I can't use just an XML parser for HTML content, and I trust you know that too.

The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer

Not to my understanding. What I'm expecting is that once an explicit opt-in flag is used, a non-closed <img> would fail at the parser level. It's all or nothing, or this proposal indeed won't go anywhere.

HTML changes quite a bit

I need to scroll a lot to see any HTML change in there ... it's all about Babel-involved folks or JS APIs, so I'm not sure what you mean ... what I meant was in terms of parsing abilities; as this thread underlines, it's nobody's intent to change that.

Changes to the HTML parser are particularly security-sensitive.

So, your example either has an issue already, in which case it's not a point, or it would throw with this flag on because the image tag is not self-closing. What is your real-world concern here? Do you have any example that isn't already failing with the current status quo around this proposal?

@WebReflection
Contributor

WebReflection commented Jul 6, 2023

Maybe this is worth clarifying for the sake of this discussion, and I don't know if @jakearchibald had a different idea, but what I'm after is a flag to enable the exact same XHTML parser, which would fail if void elements are not self-closing, without all the complications that XHTML needs (an impossible-to-remember doctype, a special content type, and so on).

Every single browser is already capable of that, and if you carefully read the thread Jake mentioned, everybody is using linters, tools and parsers that allow (and want) self-closing tags everywhere they're needed. It's not like 20 years ago, when React and JSX didn't exist: everyone writes self-closing tags, even by mistake or rather out of habit, and that's normal.

Accordingly, what I'm after is something already available: a STRICT DTD XHTML parser used whenever such a flag is on ... if anyone instead wants HTML parsing that is sometimes OK with <br />, sometimes understands <span /> and sometimes doesn't, that's not at all what I'm personally after, and not what I'd ever want to see on the Web (edit: ironically, that's the current misleading status quo, which accidentally produces more of the XSS vectors this is aiming to solve).

People use JSX these days: they write self-closing tags daily and get the result they want ... that's (imho) what this change should be about: bringing XHTML back in a lightweight way that doesn't require server-side support, MIME types, and all that stuff, as an opt-in feature.

Thank you, I think I've nothing else to add in here.

P.S. this <!doctype x-html> is not a joke anymore to me.

@bogger33

bogger33 commented Jul 6, 2023

An exploit here could be something like:

<style/><img src onerror=alert(1)><style></style>

The easy solution for a sanitizer is to just resolve the auto-closed tags, so it changes <style/> to <style></style>, and then it can treat it like any other piece of HTML, without having to pay mind to context.

@zcorpan
Member

zcorpan commented Jul 6, 2023

The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer

Not to my understanding. What I'm expecting is that once an explicit opt-in flag is used, a non-closed <img> would fail at the parser level. It's all or nothing, or this proposal indeed won't go anywhere.

I think there's literally zero interest from browsers to introduce a mode of HTML parsing that aborts on syntax errors. But @jakearchibald didn't ask for that, so it seems out of scope for this issue.

But as for the img, imagine that it also has a slash.

HTML changes quite a bit

I need to scroll a lot to see any HTML change in there ... it's all about Babel-involved folks or JS APIs, so I'm not sure what you mean ... what I meant was in terms of parsing abilities; as this thread underlines, it's nobody's intent to change that.

They are changes to the HTML standard. HTML is more than the parser.

So, your example either has an issue already, in which case it's not a point, or it would throw with this flag on because the image tag is not self-closing. What is your real-world concern here? Do you have any example that isn't already failing with the current status quo around this proposal?

The concern is explained in #9491 (comment)

@WebReflection
Contributor

If parsers are already available, I don't get the concern ... why not enable a lightweight, non-throwing XHTML parser (as it's already there and surely available in all tools?) and let opt-in people deal with the gotchas, behind more robust tools that ensure no gotcha happens?

@keithamus
Contributor

If you wish to use xhtml you can set the content-type of your server responses to application/xhtml+xml. It requires properly formed XML with a proper DTD. If you're proposing some alternate standard of xhtml that doesn't do those things, that's probably worth filing another issue for.

In this issue I think Jake has made a clear enough proposal, and I'm worried we're derailing the conversation with talk about xhtml and other formats. The chief questions (IMO) that should be answered based on Jake's original proposal:

  1. Are there any XSS concerns that block this?
  2. Is there implementer interest?
  3. Are there any backwards compatibility issues?

If you're unable to present XSS concerns or backwards compatibility issues, others may be able to. Allowing others the space to formulate those in this issue thread would be the most productive step toward resolving this issue. Minimising concerns from implementers will be counterproductive and serves to make threads like this more difficult for other implementers to catch up on.

I'm not trying to silence healthy discussion but let's keep focussed so we can resolve the explicit concerns around the OP.

@WebReflection
Contributor

My XHTML point was an answer to outdated tools that, if legacy enough, won’t have issues with Jake’s proposal. But I’ll stay away as an observer, as it’s clear none of my points are being considered. Good luck, Jake.

@cunlic

cunlic commented Jul 6, 2023

I'm not really bothered about the ability to self-close a div, but it would be very nice if self-closing a script tag worked, when 90% of the time it is going to link to external content.
e.g. this should be allowed:

<script src="file.js"/>

@WebReflection
Contributor

WebReflection commented Jul 6, 2023

@cunlic add to the equation every single custom element that doesn't need children, but which requires a long name to disambiguate per the spec (registry) 👍

@zealvurte

Just to clarify, as I don't think it's made clear in the proposal yet, would it be:

  1. If the / is preceded by an unquoted attribute value with no space, continue to treat it as part of the attribute
  2. If it's a void element, continue to ignore the /
  3. Otherwise, self-close the element
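The three rules above can be written as a small decision function. This is a hypothetical sketch of the proposed behavior, not spec text: `closes_immediately` and its parameters are invented for illustration, the void-element list is the HTML Standard's, and SVG/MathML foreign content (where `/>` already self-closes today) is deliberately left out.

```python
# Void elements per the HTML Standard: they never have contents or an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})

def closes_immediately(tag_name: str, *, has_trailing_slash: bool,
                       slash_follows_unquoted_value: bool = False,
                       opted_in: bool = False) -> bool:
    """Hypothetical model of the proposal: is this start tag an empty element?"""
    if tag_name in VOID_ELEMENTS:
        # Rule 2: void elements are empty with or without the "/".
        return True
    if not has_trailing_slash or slash_follows_unquoted_value:
        # Rule 1: in <a href=x/>, the "/" is part of the attribute value.
        return False
    # Rule 3: only the opt-in lets "/>" close other HTML elements.
    return opted_in

print(closes_immediately("div", has_trailing_slash=True))                  # False
print(closes_immediately("div", has_trailing_slash=True, opted_in=True))   # True
print(closes_immediately("a", has_trailing_slash=True,
                         slash_follows_unquoted_value=True, opted_in=True))  # False
print(closes_immediately("br", has_trailing_slash=False, opted_in=True))   # True
```

Checking the void case first keeps `<br>` and `<br/>` equivalent in both modes, which matches the idea that a missing `/` on a void element could at most be a parse error, not a behavior change.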

I imagine there could be some desire for the first point to change too, which sounds unwise for compatibility or security; that said, not changing it will probably reinforce the existing preference for preceding /> with a space in all cases.

The way a self-closing element would cause the rest of the document to be parsed differently depending on support sounds like a potential blocker unless all likely exploits in both directions can be avoided or mitigated (it wouldn't surprise me to learn that some of the native elements most desired to self-close are the ones that would have to still be excluded from doing so). Even if that is overcome, it seems willingness to change the parser has long been low, and the short-term incompatibilities between browsers, servers and tools deemed too high a burden, which has killed past related proposals.

Unfortunately, this is a breaking change even with a switch, and wouldn't be backwards compatible. An old parser would not be able to parse a new document in a graceful manner, so most uses would need to support both versions and do content negotiation for many transitional years. A new parser would have to support both old documents without the switch, and new ones with it, indefinitely, which is above and beyond something like quirks mode. There's little to no appetite for that, especially after XHTML, so I expect this won't go anywhere again.

Having said that, I'd love if this would be possible without all the issues.

@jakearchibald
Contributor Author

jakearchibald commented Jul 7, 2023

  • If the / is preceded by an unquoted attribute value with no space, continue to treat it as part of the attribute

Agreed.

  • If it's a void element, continue to ignore the /

I think this could be a parse error if the / is omitted, but yeah, ultimately it's ignored.

  • Otherwise, self-close the element

Agreed.

@hsivonen
Member

It's pretty clear that we can't make the change proposed here in a way that would affect all existing HTML, which the OP even acknowledges by proposing an opt-in. As for making it opt-in as proposed, I think the lesson we should have learned from the implicit big switch between parsers for innerHTML depending on a flag on the document itself as well as smaller switches with the HTML parser for fragment vs. whole document, scripting enabled vs. disabled, and table-closes-p vs. does not close it is that parsing mode switches at distance are bad.

I think it would be incongruous to introduce a new switch at a time when we wish we could remove some of the existing mode axes of the HTML parser, and are trying to make the successor to innerHTML (setHTML) not rely on modes inferred from the document.

I think we should acknowledge that it's not great that the list of void elements needs to be hard-coded, but that changing the language on that point would be worse than keeping this characteristic of the language, which predates the DOM, etc. Therefore, I think we should close this request as rejected.

@saschanaz
Member

A terrible random idea during my PTO: can we do "strict self closing" with double slashes: <div //>?

@cunlic

cunlic commented Aug 29, 2023

The double slash would be weird... as you'd likely have to enforce the leading whitespace as well.

Since attributes do not need to be quoted (if they don't contain spaces)... and multiple trailing slashes might be in a URL, you get weird stuff like this:

All 3 URLs work, but if they 'self-closed' there would be no text to click on to initiate the links:
https://digg.com/news/

https://digg.com/news///

https://digg.com/news///////

(image added showing Firefox view source, highlighting where it interprets the end of the links)
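The trailing slashes stay in the link target because, in the tokenizer's unquoted attribute value state, `/` is an ordinary character: the value runs until ASCII whitespace or `>`. The simplified Python sketch below models just that state; the helper `read_unquoted_value` is hypothetical and skips character references and parse errors. It shows why neither `/>` nor `//>` can act as a self-closing marker directly after an unquoted value.

```python
# ASCII whitespace as the HTML tokenizer sees it (tab, LF, FF, space); CR is
# normalized to LF before tokenization, so it is not listed.
HTML_WHITESPACE = "\t\n\f "

def read_unquoted_value(markup: str, start: int) -> tuple[str, int]:
    """Simplified "attribute value (unquoted)" state.

    Returns (value, index_of_terminator). The value ends at ASCII whitespace
    or ">"; "/" is just another character of the value.
    """
    i = start
    while i < len(markup) and markup[i] not in HTML_WHITESPACE and markup[i] != ">":
        i += 1
    return markup[start:i], i

tag = "<a href=https://digg.com/news///>"
value, end = read_unquoted_value(tag, len("<a href="))
print(value)     # https://digg.com/news///
print(tag[end])  # >
```

A space before the slash (`<a href=x />`) ends the value first, so the `/` is then read as a self-closing marker; that is why the spaced form is the usual convention.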

@zcorpan
Member

zcorpan commented Aug 29, 2023

@saschanaz no, that would have similar issues with web compat and XSS and also make the HTML syntax even more complex.

Per @hsivonen's comment, Mozilla is opposed to the change proposed in OP and I see no evidence of interest from other browser vendors. Closing as wontfix.

@zcorpan zcorpan closed this as not planned Aug 29, 2023
@Tristan971

Tristan971 commented Sep 7, 2023

It’s understandable to mark this as wontfix, but it’s still a bit sad to collectively shrug at the fact that we gave up on a simpler, more intuitive element syntax (i.e. with self-closing tag support across the board, as nearly everyone expects) essentially just to support unquoted attributes… (which, in contrast, look like they’re only allowed because of relaxed parsing rules)

@RReverser
Member

Wonder if it would be possible to allow self-closing at least for the custom elements (those with - in their tag). They're relatively new, so hopefully no web compat to break yet with such change.

@WebReflection
Contributor

@RReverser that's a bit of a slippery slope, because custom elements can't be known AOT, so any element with a - in the name should allow that, even if it's not actually a custom element ... inevitably leading people to abuse the feature and further reduce any semantic meaning of the layout: <s-div />, <ic-on /> and so on.

@RReverser
Member

any element with a - in the name should allow that

I thought - was chosen exactly for that reason - to statically distinguish custom elements from other ones, as it was determined that legacy web content normally doesn't use - so it can be a good marker.

@WebReflection
Contributor

WebReflection commented Sep 7, 2023

The - is mandatory only in customElements.define(name, ...args) (it's a global registry constraint), but you've been able to write <a-div> since about forever without ever registering that name ... after all, custom element definitions can be lazy too, so a - doesn't guarantee that the element will be a custom one. Template literals in particular, with simple companion CSS libraries that style a-div{} or anything else, could, and will, benefit from that self-closing tag ... JSX (or ESX) users wouldn't care anyway, but the rest of the people producing HTML might, because self-closing tags are absolutely desired and handy, so "brace yourselves" if - becomes the only way to self-close.
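The distinction here is between what `customElements.define()` validates and what the parser accepts. The Python sketch below approximates the HTML Standard's "valid custom element name" check; the function name is hypothetical, and the pattern is ASCII-only, whereas the real production also permits many non-ASCII characters.

```python
import re

# ASCII-only approximation of the PotentialCustomElementName production:
# a lowercase letter first, then name characters including at least one "-".
_PCEN_ASCII = re.compile(r"[a-z][-._0-9a-z]*-[-._0-9a-z]*")

# Hyphenated names the HTML Standard reserves (they already exist in SVG/MathML).
_RESERVED = frozenset({
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
})

def is_valid_custom_element_name(name: str) -> bool:
    """Roughly what customElements.define() checks before registering a name."""
    return _PCEN_ASCII.fullmatch(name) is not None and name not in _RESERVED

for name in ("a-div", "s-div", "ic-on", "my-component", "font-face", "div", "A-Div"):
    print(name, is_valid_custom_element_name(name))
```

Nothing in this check involves the parser: `<a-div>` in markup creates an `a-div` element whether or not `define()` is ever called. A parser rule keyed on the hyphen would therefore apply to names like these regardless of registration.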

@zcorpan
Member

zcorpan commented May 15, 2024

Wonder if it would be possible to allow self-closing at least for the custom elements (those with - in their tag). They're relatively new, so hopefully no web compat to break yet with such change.

That was proposed in #721

@WebReflection
Contributor

@zcorpan that hasn't moved forward since 2020 though ... not sure that's going to change now, as it requires a different, ad-hoc parsing goal for CEs only, and I think that's even worse than asking parsers not to ignore self-closing tags in the wild 😥
