Parser thinks that <= is a valid open tag. #220

Closed
ELadner opened this issue Sep 11, 2017 · 6 comments

ELadner commented Sep 11, 2017

In vanilla NodeBB, any combination of <, >, <=, >= can be entered in a post and the results are rendered correctly.

In NodeBB with sanitize-html installed (which uses htmlparser2 internally), < and > are handled correctly, but <= in a post is treated as an HTML start tag, and everything after it up to the next > is dropped.

This, for example: "this <= is a >= test" renders as "this = test" (the > after 'is a' is treated as the close of the <= ).

I reported this to sanitize-html initially, and the developer there pointed out the htmlparser2 involvement.

boutell commented Sep 11, 2017

The HTML5 spec seems to indicate that this is a point where an unexpected character like = should cause the interpretation as a tag to be revoked and the < and = to be consumed as text instead, when aiming to be tolerant, as htmlparser2 generally is. https://www.w3.org/TR/html5/syntax.html#tag-open-state
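To make the rule from the tag open state concrete, here is a minimal TypeScript sketch of the decision the spec describes for the character that follows a "<". It is not htmlparser2's code; the names `classifyAfterLessThan` and `escapeStrayLessThan` are hypothetical and exist only for illustration.

```typescript
// Sketch of the HTML5 "tag open state": what the character after "<" means.
// All names here are hypothetical; this is not htmlparser2's implementation.
type AfterLessThan =
  | "markup-declaration" // "<!"  (comments, doctype, CDATA)
  | "end-tag"            // "</"
  | "start-tag"          // "<" followed by an ASCII letter
  | "bogus-comment"      // "<?"
  | "text";              // anything else: "<" is emitted as a literal character

function classifyAfterLessThan(next: string | undefined): AfterLessThan {
  if (next === "!") return "markup-declaration";
  if (next === "/") return "end-tag";
  if (next !== undefined && /^[A-Za-z]$/.test(next)) return "start-tag";
  if (next === "?") return "bogus-comment";
  return "text"; // covers "=", "[", whitespace, digits, and end of input
}

// Escape every "<" that the rule above classifies as plain text.
function escapeStrayLessThan(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "<" && classifyAfterLessThan(input[i + 1]) === "text") {
      out += "&lt;";
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(escapeStrayLessThan("this <= is a >= test"));
// prints: this &lt;= is a >= test
```

Under this rule, the "<" in "<=" never opens a tag, so the text between it and the next ">" would not be swallowed. Whether pre-escaping like this is a suitable workaround in front of sanitize-html is an assumption I have not verified.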

ELadner commented Sep 11, 2017

Thanks for the additional info.

I'm all for tolerance and flexibility, but <= shows up a lot when you're pasting code blocks. Hopefully, the benevolent developers will grant an exception for that bit of flexibility and receive it as plain text.

Note there was a recent issue with <[ being recognized as a start tag, and that issue was handled nicely.

boutell commented Sep 11, 2017 via email

ELadner commented Sep 11, 2017

Well, if htmlparser2 is being more lenient than the HTML5 spec (and I agree with your reading of the spec: 8.2.4.8 says that any character other than !, /, ?, or an ASCII letter should drop the parser back into character mode), wouldn't removing that leniency make the parser less tolerant and less flexible? (That is, stricter adherence to the spec means less flexibility and tolerance in the parser.)

I guess it depends on which side you're looking at the issue from, though.

boutell commented Sep 11, 2017 via email

fb55 (Owner) commented Jan 20, 2018

Please refer to inikulin/parse5 if you need a spec-compliant parser.

fb55 closed this as completed Jan 20, 2018