Allow self-closing tags everywhere #9491

jakearchibald · 2023-07-06T09:26:20Z

People really seem to like self-closing tag syntax (see the replies to https://twitter.com/jaffathecake/status/1676843832284004353).

Maybe a switch should be added to allow them to be used on all elements?

<!doctype html allow-self-closing-or-whatever>
<!-- Now self-closes: -->
<div/>
<my-component/>
<script src="…"/>

Right now, documents can be a mix of rules where /> is largely meaningless, except in SVG and MathML. Making everything consistent seems… good?

WebReflection · 2023-07-06T13:06:00Z

I have "a tiny Déjà vu"

(this has been discussed and desired since at least 2016, btw ... glad we keep desiring this in 2023, and rightly so)

zcorpan · 2023-07-06T13:12:12Z

I think we shouldn't introduce new parser flags that change parsing behavior. They cause XSS issues. (In #9426 we're investigating if we can remove the scripting enabled flag.)

cc @whatwg/html-parser

WebReflection · 2023-07-06T13:23:27Z

it's opt in, it won't cause issues to developers opting in + it's not about scripting neither, it's just a "don't ignore that /> ever" desired feature which, instead, could lead to XSS or any other kind of issue if people believe that /> meant the end of that tag.

As half a joke though:

<!doctype x-html>

would be lovely

zcorpan · 2023-07-06T13:40:26Z

Sites that opt in will expose themselves to XSS issues if they also use a sanitizer that doesn't support this.

Even if the sanitizer supports this, it can be confused by the different parsing in different documents, which again can cause XSS issues.

Example: https://bugzilla.mozilla.org/show_bug.cgi?id=1615315

An exploit here could be something like:

<style/><img src onerror=alert(1)><style></style>

WebReflection · 2023-07-06T13:45:08Z

Sites that opt in will expose themselves to XSS issues if they also use a sanitizer that doesn't support this.

it's like saying, if you switch to Python 3 but you use Python 2 tools to lint your code expect issues ... not sure I am following.

An exploit here could be something like: ...

not sure I am following that neither ... you are creating invalid layout on purpose, see my previous Python v3 VS v2 analogy.

P.S. in case my "half joke" hint wasn't clear, if there's any way to enable this, parser should throw if meant to be void elements are not self-closed as those are not welcome in the parser with that flag on so I don't see issues or any extra XSS that's not possible already in HTML5.

keithamus · 2023-07-06T13:49:53Z

it's like saying, if you switch to Python 3 but you use Python 2 tools to lint your code expect issues ... not sure I am following.

The difference is that current sanitizer libraries will have no mechanisms to detect whether or not they're in "non-self closing mode" vs "self closing mode". Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities. There's no reasonable expectation that those libraries would support such a mode. A constraint for implementing this is to avoid such a scenario. Simon is saying that an opt-in does not avoid the scenario, therefore fails to meet the constraint.

zcorpan · 2023-07-06T13:51:39Z

if there's any way to enable this, parser should throw

You can use XML.

WebReflection · 2023-07-06T13:56:18Z

Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities

we have parseFromString where if you pass the wrong mime you're subject to the same issue you're mentioning ... right? if I use a library that doesn't support that syntax I should change library or contribute to make it compatible? ... as a new flag? ... like all parsers / transpilers / linters have ?

You can use XML.

to see no image and have no HTML at all on the page, even with correct layout? can we keep the conversation focused, please?

sideshowbarker · 2023-07-06T14:02:10Z

Worth adding a reminder here that the only effect the doctype has in browsers is to prevent browsers from using quirks mode to render the document: Without the doctype, browsers use quirks mode; with the doctype, they don’t. Ideally, we’d not want to have the doctype at all — because it has zero purpose other than preventing quirks mode — but it’s one of those legacy misfeatures that we’re now stuck with forever for backward-compat reasons.

So, given that, using the doctype as a way to opt into causing any particular other behavior in browsers would likely cause a side effect of leading people to have the wrong mental model of what the doctype is — it could mislead people into thinking the doctype in HTML has some general meaning and purpose in browsers that it doesn’t actually have, and that the allow-self-closing-or-whatever token is just adding to the intended purpose of the doctype in browsers, but which it actually isn’t.

sideshowbarker · 2023-07-06T14:14:19Z

I’ll also add that, from previous discussions we’d had with implementors about other proposals that require changes to the parsing algorithm and to HTML parsers in browsers: Implementers are very unlikely to support/implement further changes to parsing behavior except for very compelling reasons. And I think we’d find that implementors won’t judge this to be a compelling reason to make further changes to the parsing behavior.

WebReflection · 2023-07-06T14:15:44Z

the doctype has in browsers is to prevent browsers from using quirks mode to render the document

which is a wonderful feature and it happens to play a wonderful role ... ~~see AMP also using it to grant no quirks and bootstrap itself (let's not talk about AMP in general, it was meant as example where you avoid quirks and you enable features in one step).~~ (edit: just remembered they added an attribute to <html> instead, sorry)

but it’s one of those legacy misfeatures that we’re now stuck with forever for backward-compat reasons.

I don't think it's that bad ... it's like a she-bang on top of executable and it serves a nice purpose ... the alternative is to go through a new mime-type, a new file extension that can't be .xhtml and in doing so, we'll go through years of back and forward in the making.

The never ignored self-closing tags has been desired for already 7+ years and if some lovely legacy artifact could help everyone move forward faster, I'd say "why not" ... but then again, literally any way to have this landed would (personally) work to me.

keithamus · 2023-07-06T14:20:15Z

Declaring self closing mode on a page and using a library which does not support/detect will expose authors to vulnerabilities

we have parseFromString where if you pass the wrong mime you're subject to the same issue you're mentioning ... right?

It wouldn't. parseFromString, being a browser API, would be concordant with the documents parse modes. That is to say browsers which acknowledge self closing tags would also make parseFromString do the same, and browsers which don't will not. However, the problem lies with user land libraries. If the document has "modes" that I can opt into, current versions of user land libraries (just as an example, DOMPurify) have no way to detect that mode and so we enter a state where I am using a sanitizer to sanitize content that includes self closing tags, while the sanitizer does not have the capability to parse them.

WebReflection · 2023-07-06T14:22:40Z

However, the problem lies with user land libraries

I wonder if these concerns were raised when HTML5 saw the light ... but again, the argument about "user land libraries" being outdated has never been an issue for the entirety of the TC39 or CSS story so I wonder why this is being raised in here.

keithamus · 2023-07-06T14:24:30Z

However, the problem lies with user land libraries

I wonder if these concerns were raised when HTML5 saw the light ... but again, the argument about "user land libraries" being outdated has never been an issue for the entirety of the TC39 or CSS story so I wonder why this is being raised in here.

It's being raised here because changes to the parsing algorithm can introduce XSS vulnerabilities. It is raised in any discussion about changing the parsing algorithm. It is something that each change to the parsing algorithm must navigate.

WebReflection · 2023-07-06T14:26:09Z

OK, but the only example is a malformed layout with an old XSS thing from the 90s' ... does anyone else has a compelling XSS story / example to show and, if that's the case, what are the parsing libraries we should notify about this eventual change as "opt-in flag" to allow/consider?

keithamus · 2023-07-06T14:35:22Z

OK, but the only example is a malformed layout with an old XSS thing from the 90s' ... does anyone else has a compelling XSS story / example to show and, if that's the case, what are the parsing libraries we should notify about this eventual change as "opt-in flag" to allow/consider?

Notifying parsing libraries to update does not solve the issue. A library can be updated but all prior versions will be vulnerable. Those older versions and their installations do not disappear. Changes to the parser must not introduce security vulnerabilities in existing software.

WebReflection · 2023-07-06T14:42:57Z

I agree with what you are saying but I am also hearing HTML as it is won't ever change from now on ... is that the future of the Web as seen by browser vendors?

zcorpan · 2023-07-06T14:54:41Z

You can use XML.

to see no image and have no HTML at all on the page, even with correct layout? can we keep the conversation focused, please?

You can use HTML elements in XML (which you might also call XHTML).

zcorpan · 2023-07-06T14:59:30Z

not sure I am following that neither ... you are creating invalid layout on purpose

The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer (if it allows style elements, but this is not limited to the style element). The issue with the snippet isn't "layout" but that it executes attacker-controlled script (i.e., it's XSS).
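
To make the divergence concrete, here is a minimal sketch (not from the thread) of a toy parser that can either ignore or honor `/>` on a raw-text element. The function name `start_tags_seen`, the `honor_self_closing` parameter, and the regex are hypothetical and far simpler than a real HTML tokenizer; the point is only that the same payload yields different elements in the two modes.

```python
import re

# Elements whose content is raw text: nothing inside is tokenized as markup
# until the matching end tag. Toy subset for illustration only.
RAW_TEXT_ELEMENTS = {"style", "script"}

# Deliberately naive tag matcher; a real HTML tokenizer is a state machine.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)([^>]*)>")


def start_tags_seen(markup: str, honor_self_closing: bool) -> list[str]:
    """List the start tags a toy parser would create elements for."""
    names = []
    pos = 0
    while (match := TAG.search(markup, pos)) is not None:
        pos = match.end()
        is_end_tag = match.group(1)
        name = match.group(2).lower()
        rest = match.group(3)
        if is_end_tag:
            continue
        names.append(name)
        # Hypothetical opt-in mode: a trailing "/" closes the element at once.
        self_closed = honor_self_closing and rest.rstrip().endswith("/")
        if name in RAW_TEXT_ELEMENTS and not self_closed:
            # Skip the raw text up to the matching end tag.
            end = markup.lower().find("</" + name, pos)
            if end == -1:
                break
            pos = end
    return names


payload = "<style/><img src onerror=alert(1)><style></style>"
print(start_tags_seen(payload, honor_self_closing=False))  # ['style']
print(start_tags_seen(payload, honor_self_closing=True))   # ['style', 'img', 'style']
```

A sanitizer behaving like the first call sees a single style element containing inert text, while a document parsed like the second call gets a live img element with an onerror handler.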

zcorpan · 2023-07-06T15:08:08Z

I agree with what you are saying but I am also hearing HTML as it is won't ever change from now on

HTML changes quite a bit, but changes need to not introduce new security risks for users. Changes to the HTML parser are particularly security-sensitive.

WebReflection · 2023-07-06T15:17:42Z

You can use HTML elements in XML (which you might also call XHTML).

imagine it's 22 years I am doing this and XSLT is probably the next thing you'll tell me about ... still, I can't use just XML parser for HTML content, and I trust you know that too.

The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer

Not to my understanding. What I'm expecting is that once an explicit opt-in flag is used, a non closing <img> would fail at the parser level. It's all in or nothing, or this won't go anywhere indeed as proposal.

HTML changes quite a bit

I need to scroll a lot to see any HTML change in there ... it's all about Babel involved folks or JS APIs so I am not sure what you mean there ... what I meant was in term of parsing abilities, as this thread underlines it's nobody intent to change that.

Changes to the HTML parser are particularly security-sensitive.

So, imagine your example either has issue already, so it's not a point, or it would throw with this flag on because the image tag is not self-closing, what are your real-world concern here? Do you have any example that is not already failing with current status-quo around this proposal?

WebReflection · 2023-07-06T15:25:07Z

maybe this is worth clarifying for the sake of this discussion, and I don't know if @jakearchibald had a different idea, but a flag to enable the exact same XHTML parser that would fail if void elements are not self-closing is what I am after, without all the complications that XHTML needs (impossible-to-remember doctype, special content-type, and so on).

Every single browser is already capable of that, and if you read carefully the Jake's mentioned thread, everybody is using linters, tools, parsers, to allow and want self closing tags everywhere it's needed, which is not like 20 years ago when React and JSX didn't exist, everyone writes self closing tags even out of a mistake / rather habit, but that's normal.

Accordingly, all arguments for something already available as a STRICT DTD XHTML parser for when such flag is used, would be what I am after ... if anyone wants instead a parsing for HTML that sometimes is OK with <br />, sometimes understands <span /> and sometimes it doesn't, that's not at all what I am personally after, and not what I'd ever want to see on the Web (edit: ironically, that's the current misleading status-quo, producing more XSS vectors - by accident - it's aiming to solve).

People use JSX these days, they write self-closing tags daily and they get the result they want ... that's (imho) what this change should be about, bring it back XHTML in a lightweight way that doesn't require server-side, mime types, and all that stuff to exist, as opt-in feature.

Thank you, I think I've nothing else to add in here.

P.S. this <!doctype x-html> is not a joke anymore to me.

bogger33 · 2023-07-06T15:41:59Z

An exploit here could be something like:
<style/><img src onerror=alert(1)><style></style>

The easy solution for a sanitizer is to just resolve the auto-closed tags, so it changes <style/> to <style></style>, and then it can treat it like any other piece of HTML, without having to pay mind to context.
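
A rough sketch of that normalization step, under loud assumptions: the helper `expand_self_closing` and its regex are hypothetical, it does not handle `>` inside quoted attribute values, unquoted values ending in `/`, or text inside raw-text elements, and the void element list is the one I understand the HTML Standard to define. It only illustrates the idea, not a safe sanitizer.

```python
import re

# Void elements keep their "/>" untouched, since the slash is ignored there.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}

# Naive pattern: tag name, optional attributes, optional whitespace, then "/>".
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)((?:\s[^<>]*?)?)\s*/>")


def expand_self_closing(markup: str) -> str:
    """Rewrite <x/> as <x></x> for non-void elements (illustrative only)."""
    def replace(match: re.Match) -> str:
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return match.group(0)
        return f"<{name}{attrs}></{name}>"

    return SELF_CLOSED.sub(replace, markup)


print(expand_self_closing("<style/><img src onerror=alert(1)><style></style>"))
# <style></style><img src onerror=alert(1)><style></style>
```

After the rewrite, the img element is visible to the sanitizer as markup, so it can be filtered like any other element.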

zcorpan · 2023-07-06T16:47:46Z

The snippet is what an attacker would use as user-generated content that is allowed by the page's sanitizer

Not to my understanding. What I'm expecting is that once an explicit opt-in flag is used, a non closing <img> would fail at the parser level. It's all in or nothing, or this won't go anywhere indeed as proposal.

I think there's literally zero interest from browsers to introduce a mode of HTML parsing that aborts on syntax errors. But @jakearchibald didn't ask for that, so it seems out of scope for this issue.

But as for the img, imagine that it also has a slash.

HTML changes quite a bit

I need to scroll a lot to see any HTML change in there ... it's all about Babel involved folks or JS APIs so I am not sure what you mean there ... what I meant was in term of parsing abilities, as this thread underlines it's nobody intent to change that.

They are changes to the HTML standard. HTML is more than the parser.

So, imagine your example either has issue already, so it's not a point, or it would throw with this flag on because the image tag is not self-closing, what are your real-world concern here? Do you have any example that is not already failing with current status-quo around this proposal?

The concern is explained in #9491 (comment)

WebReflection · 2023-07-06T16:55:05Z

if parsers are already available I don't get the concern ... enable a lightweight, not throwing, XHTML parser (as it's already there and surely available in all tools?) and let opt-in people deal with gotcha, behind more robust tools that will ensure no gotcha happens?

keithamus · 2023-07-06T17:04:57Z

If you wish to use xhtml you can set the content-type of your server responses to application/xhtml+xml. It requires a properly formed XML with a proper DTD. If you're proposing some alternate standard of xhtml that doesn't do those things, that's probably worth filing another issue for.

In this issue I think Jake has made a clear enough proposal, and I'm worried we're derailing the conversation with talk about xhtml and other formats. The chief questions (IMO) that should be answered based on Jake's original proposal:

Are there any XSS concerns that block this?
Is there implementer interest?
Are there any backwards compatibility issues?

If you're unable to present XSS concerns, or backwards compatibility issues, then others may be able to. Allowing others the space to formulate those in this issue thread would be the most productive step to resolving this issue. Minimising concerns from implementers will be counterproductive and serves to make threads like this more difficult for other implementers to catch up on.

I'm not trying to silence healthy discussion but let's keep focussed so we can resolve the explicit concerns around the OP.

WebReflection · 2023-07-06T18:21:15Z

My XHTML point was an answer to outdated tools that, if legacy enough, won’t have issues with Jake’s proposal. But I’ll stay away as observer as it’s clear none of my point is being considered. Good luck Jake

cunlic · 2023-07-06T18:59:25Z

I'm not really bothered about the ability to self-close a div, but it would be very nice if self-closing a script tag worked, when 90% of the time it is going to link to external content.
e.g. this should be allowed:

<script src="file.js"/>

WebReflection · 2023-07-06T19:45:20Z

@cunlic add every single custom element that doesn't need children in it to the equation, but it requires a long name to disambiguate by standard specs (registry) 👍

zealvurte · 2023-07-06T21:59:20Z

Just to clarify, as I don't think it's made clear in the proposal yet, would it be:

If the / is preceded by an unquoted attribute value with no space, continue to treat it as part of the attribute
If it's a void element, continue to ignore the /
Otherwise, self-close the element
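
The three rules above can be sketched as a small decision function. This is only an illustration of the proposed behavior, not spec text: `slash_effect`, its parameters, and the returned labels are hypothetical names, and the void element list is the one I understand the HTML Standard to define.

```python
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


def slash_effect(tag_name: str, slash_follows_unquoted_value: bool) -> str:
    """Classify what a trailing "/" would do under the proposed opt-in."""
    # Rule 1: <a href=foo/> keeps the slash inside the attribute value.
    if slash_follows_unquoted_value:
        return "part of attribute value"
    # Rule 2: <br /> behaves as today; the slash is ignored.
    if tag_name.lower() in VOID_ELEMENTS:
        return "ignored"
    # Rule 3: <div /> or <my-component /> closes immediately.
    return "self-closes element"


print(slash_effect("a", True))     # part of attribute value
print(slash_effect("br", False))   # ignored
print(slash_effect("div", False))  # self-closes element
```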

I imagine there could be some desire for the first point to change too, which sounds unwise for compatibility or security; although, not changing it will probably support the continued favour of preceding /> with a space in all cases.

The issues arising from the parsing of a self-closing element that causes the parsing of the rest of the document to differ depending on support sounds like a potential blocker unless all likely exploits in both directions can be avoided or mitigated (it wouldn't surprise me to learn that some of the native elements most desired to self-close are the ones that would have to still be excluded from doing so). Even if that is overcome, it seems willingness to change the parser has long been low, and the short-term incompatibilities between browsers, servers and tools deemed too high a burden, which has killed past related proposals.

Unfortunately, this is a breaking change even with a switch, and wouldn't be backwards compatible. An old parser would not be able to parse a new document in a graceful manner, so most uses would need to support both versions and do content negotiation for many transitional years. A new parser would have to support both old documents without the switch, and new ones with it, indefinitely, which is above and beyond something like quirks mode. There's little to no appetite for that, especially after XHTML, so I expect this won't go anywhere again.

Having said that, I'd love if this would be possible without all the issues.

jakearchibald · 2023-07-07T08:21:41Z

If the / is preceded by an unquoted attribute value with no space, continue to treat it as part of the attribute

Agreed.

If it's a void element, continue to ignore the /

I think this could be a parse error if the / is omitted, but yeah, ultimately it's ignored.

Otherwise, self-close the element

Agreed.

hsivonen · 2023-08-28T11:18:37Z

It's pretty clear that we can't make the change proposed here in a way that would affect all existing HTML, which the OP even acknowledges by proposing an opt-in. As for making it opt-in as proposed, I think the lesson we should have learned from the implicit big switch between parsers for innerHTML depending on a flag on the document itself as well as smaller switches with the HTML parser for fragment vs. whole document, scripting enabled vs. disabled, and table-closes-p vs. does not close it is that parsing mode switches at distance are bad.

I think it would be incongruous to introduce a new switch at a time when we wish we could remove some of the existing mode axes of the HTML parser and are trying to make the successor for innerHTML (setHTML) not rely on modes inferred from the document.

I think we should acknowledge that it's not great that the list of void elements needs to be hard-coded, but that changing the language on that point would be worse than keeping a characteristic of the language that predates the DOM, etc. Therefore, I think we should close this request as rejected.

saschanaz · 2023-08-29T01:54:57Z

A terrible random idea during my PTO: can we do "strict self closing" with double slashes: <div //>?

cunlic · 2023-08-29T14:14:58Z

The double slash would be weird... as you'd likely have to enforce the leading whitespace as well.

Since attributes do not need to be quoted (if they don't contain spaces)... and multiple trailing slashes might be in a URL, you get weird stuff like this:

All 3 URLs work, but if they 'self-closed' there would be no text to click on to initiate the links:
https://digg.com/news/

https://digg.com/news///

https://digg.com/news///////

(image added showing Firefox view source, highlighting where it interprets the end of the links)
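
The unquoted-attribute behavior described here can be sketched in a few lines. The markup string below is a hypothetical reconstruction (the original markup of the three links did not survive in this thread), and `read_unquoted_value` is an illustrative helper that mirrors my understanding of the tokenizer rule: an unquoted value runs until whitespace or `>`, so every `/` before the `>` belongs to the value.

```python
HTML_WHITESPACE = "\t\n\f\r "


def read_unquoted_value(tag_source: str, start: int) -> str:
    """Return the unquoted attribute value that begins at index `start`."""
    end = start
    # Slashes are ordinary characters here; only whitespace or ">" ends the value.
    while end < len(tag_source) and tag_source[end] not in HTML_WHITESPACE + ">":
        end += 1
    return tag_source[start:end]


# Hypothetical start tag with an unquoted href ending in several slashes.
tag = "<a href=https://digg.com/news///>"
value = read_unquoted_value(tag, tag.index("=") + 1)
print(value)  # https://digg.com/news///
```

Because the trailing slashes are consumed as part of the href, there is nothing left for the parser to read as a self-closing marker unless whitespace precedes it.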

zcorpan · 2023-08-29T15:38:50Z

@saschanaz no, that would have similar issues with web compat and XSS and also make the HTML syntax even more complex.

Per @hsivonen's comment, Mozilla is opposed to the change proposed in OP and I see no evidence of interest from other browser vendors. Closing as wontfix.

Tristan971 · 2023-09-07T11:14:28Z

It’s understandable to mark this as wontfix, but it’s still a bit sad to collectively shrug at the fact that we gave up on having a simpler and intuitive element syntax (ie with self-closing tag support across the board like nearly everyone expects) essentially just to support unquoted attributes… (which in contrast look like they’re only allowed because of relaxed parsing rules)

RReverser · 2023-09-07T11:41:07Z

Wonder if it would be possible to allow self-closing at least for the custom elements (those with - in their tag). They're relatively new, so hopefully no web compat to break yet with such change.

WebReflection · 2023-09-07T11:46:51Z

@RReverser that's a bit of a slippery slope because custom elements can't be known AOT so that any element with a - in the name should allow that even if not practically a custom element ... inevitably leading people to abuse the feature and reduce further any semantic meaning of the layout: <s-div />, <ic-on /> and so on.

RReverser · 2023-09-07T11:49:59Z

any element with a - in the name should allow that

I thought - was chosen exactly for that reason - to statically distinguish custom elements from other ones, as it was determined that legacy web content normally doesn't use - so it can be a good marker.

WebReflection · 2023-09-07T11:57:00Z

The - is imposed as mandatory only in customElements.define(name, ...args) (it's a global registry constraint) but you have been able to write <a-div> since about forever without ever registering that name ... after all, custom elements definitions can be lazy too so a - doesn't provide any guarantees that element will be a custom one, and especially template literals with simple CSS companion libraries that style a-div{} or anything else could, and will, benefit from that self-closing tag ... JSX (or ESX) users wouldn't care anyway, but the rest of the people producing HTML might because self-closing tags is absolutely desired and handy so "brace yourselves" if - becomes the only self-closing capable way.

zcorpan · 2024-05-15T13:07:41Z

Wonder if it would be possible to allow self-closing at least for the custom elements (those with - in their tag). They're relatively new, so hopefully no web compat to break yet with such change.

That was proposed in #721

WebReflection · 2024-05-15T13:22:19Z

@zcorpan that never moved forward since 2020 though ... not sure it's going to change now as that requires a different parsing goal ad-hoc for CE only and I think that's even worse than asking parsers to not ignore self-closing in the wild 😥

keithamus added the addition/proposal, needs implementer interest, topic: parser, and needs compat analysis labels · Jul 6, 2023

sideshowbarker closed this as completed Jul 6, 2023

sideshowbarker reopened this Jul 6, 2023

hsivonen mentioned this issue Aug 28, 2023

Provide more context around the permissibility of the XHTML slash on void elements #9642

Open

zcorpan closed this as not planned · Aug 29, 2023

Serator mentioned this issue Apr 3, 2024

Svelte parses HTML all wrong sveltejs/svelte#11052

Closed
