Allow documents starting with <!doctype html> (UnexpectedBang) #230

Closed
Hocuri opened this issue Aug 19, 2020 · 4 comments

Comments


Hocuri commented Aug 19, 2020

Currently, quick-xml returns UnexpectedBang if a document starts with <!doctype html>, which caused this issue: deltachat/deltachat-core-rust#1804. The problem is that quick-xml then refuses to read the rest of the file.
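
Until the reader accepts such input, one possible caller-side workaround is to drop a leading doctype declaration before handing the text to a strict XML reader. The following is only a minimal sketch using the Rust standard library: `strip_leading_doctype` is a hypothetical helper, not part of quick-xml, and it assumes the declaration has no internal subset, so the first `>` ends it.

```rust
/// Hypothetical helper (not a quick-xml API): returns the input with a
/// leading `<!DOCTYPE ...>` declaration removed, matching the keyword
/// case-insensitively. Assumes the declaration contains no internal subset,
/// so the first `>` terminates it.
fn strip_leading_doctype(input: &str) -> &str {
    const KEYWORD: &str = "<!doctype";
    let trimmed = input.trim_start();
    // `get` returns None if the input is too short or the cut would split a
    // multi-byte character, so this never panics.
    let has_doctype = trimmed
        .get(..KEYWORD.len())
        .map_or(false, |head| head.eq_ignore_ascii_case(KEYWORD));
    if !has_doctype {
        return input;
    }
    match trimmed.find('>') {
        // Skip past the closing `>` of the declaration.
        Some(end) => &trimmed[end + 1..],
        // Unterminated declaration: leave the input untouched.
        None => input,
    }
}
```

With this sketch, `strip_leading_doctype("<!doctype html><html></html>")` yields `"<html></html>"`, while input without a leading doctype is returned unchanged. It does nothing about other HTML constructs that are not valid XML.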


jbg commented Mar 30, 2021

Given that that's not valid XML, I would not expect an XML library to parse it. It seems like there still isn't a clear answer for whether this library intends to be an XML parser or something else, though.

Owner

tafia commented Mar 30, 2021

Yes, by default this library intends to parse only valid XML. HTML parsing could be supported on a case-by-case, best-effort basis.


jbg commented Mar 30, 2021

IMO, given that there already exist quite good crates for HTML parsing in Rust (e.g. html5ever), and given that HTML parsing is very different from XML parsing (despite the two looking similar), it would simplify things a lot and probably lead to a higher-quality library if HTML parsing were explicitly out of scope for quick-xml.

Collaborator

Mingun commented May 25, 2022

Currently, parsing documents with a lowercase <!doctype html> is supported, and this is ensured by the following test:

#[cfg(feature = "escape-html")]
#[test]
fn html5() {
    test(
        include_str!("documents/html5.html"),
        include_str!("documents/html5.txt"),
        false,
    );
}
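
To illustrate what accepting a lowercase doctype can look like, here is a rough sketch of classifying markup that starts with `<!`. It is not quick-xml's actual implementation: `BangKind` and `classify_bang` are hypothetical names, and the case-insensitive match is a deliberate leniency, since the XML grammar itself only allows the uppercase `<!DOCTYPE`.

```rust
/// Illustrative classification of markup that starts with `<!`.
/// The names here are hypothetical and do not mirror quick-xml internals.
#[derive(Debug, PartialEq)]
enum BangKind {
    Comment,
    CData,
    DocType,
    Unexpected,
}

fn classify_bang(markup: &[u8]) -> BangKind {
    // Comment and CDATA openers are matched exactly (case-sensitively).
    if markup.starts_with(b"<!--") {
        return BangKind::Comment;
    }
    if markup.starts_with(b"<![CDATA[") {
        return BangKind::CData;
    }
    // Lenient, ASCII case-insensitive match so `<!doctype` is accepted too.
    const DOCTYPE: &[u8] = b"<!DOCTYPE";
    match markup.get(..DOCTYPE.len()) {
        Some(head) if head.eq_ignore_ascii_case(DOCTYPE) => BangKind::DocType,
        // Too short, or some other `<!...` construct.
        _ => BangKind::Unexpected,
    }
}
```

In this sketch both `<!doctype html>` and `<!DOCTYPE html>` classify as `BangKind::DocType`, whereas an unknown construct such as `<!foo>` falls through to `BangKind::Unexpected`, which is the kind of case an error like UnexpectedBang would report.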
