< parsing bug #306

eldiablolives · 2014-12-09T04:01:26Z

The parser will just hang and explode without any message when it encounters < in text (not &lt;).

so for example if you have a statement in html text
..some html...
hello
< world
.. some html...

it will crash with no notification.

I'm testing it on NodeJS

kangax · 2014-12-09T11:02:13Z

Probably because it tries to parse it as a start tag. Better error notifications are on the radar (we're relying on an old-ish but heavily modified HTML parser by John Resig). Please feel free to contribute.

eldiablolives · 2014-12-09T11:31:34Z

Yeah, I figured. I think it's due to not being able to find a closing >; then other tags come in and the whole thing explodes (but silently, which is in fact the problem). I'd contribute some code but I'm late on my delivery. I got the system to run, btw, by removing all the <'s. The problem is that if it ever bombs down the line, I won't have a way to know. I'd suggest that instead of returning the parsed text as var res = myFunc(), you do var res = myFunc(..., callback -> (err, html)). Then you're backward compatible and ready for error handling (even an ugly one), just as long as it doesn't explode.
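A minimal sketch of the callback shape suggested above, assuming the underlying function throws on bad input. The names withCallback, fakeMinify and minifyWithCallback are hypothetical and are not part of html-minifier; fakeMinify is only a stand-in so the example runs on its own.

```javascript
// Hypothetical helper (not html-minifier API): wraps a synchronous function
// so that failures reach a Node-style callback instead of propagating.
function withCallback(syncFn) {
  return function (input, options, callback) {
    let result;
    try {
      result = syncFn(input, options);
    } catch (err) {
      callback(err);
      return;
    }
    // Called outside the try block so an error thrown by the callback
    // itself is not reported as a minification failure.
    callback(null, result);
  };
}

// Stand-in for a minifier that throws on input it cannot parse.
function fakeMinify(html) {
  if (/<(?![a-zA-Z\/!])/.test(html)) {
    throw new Error('Parse error: stray "<" in text');
  }
  return html.replace(/\s+/g, ' ').trim();
}

const minifyWithCallback = withCallback(fakeMinify);

minifyWithCallback('hello\n< world', {}, function (err, html) {
  if (err) {
    console.error('minification failed:', err.message);
    return;
  }
  console.log(html);
});
```

A wrapper like this only surfaces errors that are actually thrown. It would not help with a parser that hangs in a loop, which is the behaviour reported here.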

kangax · 2014-12-09T11:46:14Z

Well, technically <, >, etc. should simply be escaped. That's always the best way to go (it's just plain better for compatibility among any HTML environments, parsers, etc.)
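As a rough illustration of escaping before minifying, the sketch below replaces any < that does not look like the start of a tag, closing tag, comment or doctype. The function name escapeStrayLt is hypothetical, and the regular expression is a heuristic, not a full HTML tokenizer: it would also rewrite a bare < inside script blocks or attribute values.

```javascript
// Heuristic sketch: escape a "<" unless it is followed by a letter, "/",
// "!" or "?", i.e. unless it looks like the start of markup.
function escapeStrayLt(html) {
  return html.replace(/<(?![a-zA-Z\/!?])/g, '&lt;');
}

console.log(escapeStrayLt('<p>hello\n< world</p>'));
```

For the example from the report, the tags are left alone and only the lone < before "world" becomes &lt;.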

eldiablolives · 2014-12-09T11:59:22Z

That's a theological debate. Of course it should, and if you're doing simple HTML pages that's easy to do. But if you're doing a server script that includes dozens of templates, then add some DB content to the mix and pull in external HTML-ready content (syndication), you get a hodgepodge of HTML. That is the key reason why one would want it all minified: it looks like a dog's dinner once it's all built and riddled with comments, spaces and other crap. We can't just assume coders (like me) are not idiots; that's wishful thinking, as all programs should be able to exit gracefully and handle their own shit. I think your library is the big daddy; it does the job amazingly. I've got 50ms/page end to end (which includes EJS, DB and other back-end nonsense), and that's pretty good in my book on my MacBook Air. I promise one day I'll sit down and write a decent HTML parser (I've been promising that to myself for the last 10+ years lol).

kangax · 2014-12-09T14:54:16Z

Yeah, I know how hairy it could get. We'll definitely make those errors more descriptive; hopefully sooner than in 10 years :P

fregante · 2015-07-01T11:16:16Z

Similar to #375

Instead of allowing invalid HTML and prompting a visit from Zalgo, if you expect possibly-invalid HTML files, perhaps pass them through something like HTML Tidy first.

kangax · 2015-07-03T17:55:50Z

Duplicate of #332

kangax mentioned this issue Jan 27, 2015

When a non html character is present in the wrong place, html-minifier will hang with no error reported. #333

Closed

kangax added the possible-feature label Jan 27, 2015

kangax closed this as completed Jul 3, 2015
