Don't allow [ or ] in XML names.
This is an example of a DOCTYPE that was not being parsed correctly
before:

```
<!DOCTYPE language[
  <!ENTITY nmtoken "[\-\w\d\.:_]+">
  <!ENTITY entref  "(#[0-9]+|#[xX][0-9A-Fa-f]+|&nmtoken;);">
]>
```

xml-conduit was parsing `language[` as the root element name.
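The failure mode can be reproduced in isolation with a simplified stand-in for the identifier predicate. This is only a sketch: `validBefore`, `validAfter`, and `splitIdent` are hypothetical names, only the clauses visible in this commit's diff are included, and the local `isXMLSpace` is an assumed approximation of the library's helper.

```haskell
-- Assumed stand-in for the library's isXMLSpace: the four XML whitespace characters.
isXMLSpace :: Char -> Bool
isXMLSpace c = c `elem` " \t\r\n"

-- Simplified predicate before this commit (only the clauses shown in the diff).
validBefore :: Char -> Bool
validBefore '/' = False
validBefore ';' = False
validBefore '#' = False
validBefore c   = not (isXMLSpace c)

-- The same predicate with the two clauses added by this commit.
validAfter :: Char -> Bool
validAfter '[' = False
validAfter ']' = False
validAfter c   = validBefore c

-- Rough analogue of takeWhile1: split the leading identifier from the rest.
splitIdent :: (Char -> Bool) -> String -> (String, String)
splitIdent = span
```

In GHCi, `splitIdent validBefore "language[\n"` gives `("language[","\n")`, while `splitIdent validAfter "language[\n"` gives `("language","[\n")`, leaving the `[` for the internal-subset parser.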

I have kept this PR to the smallest possible change, because I
don't want to break anything inadvertently. However, the current
parser is still far from correct. As I understand it, the only
ASCII punctuation allowed in element names is `_`, `-`, and `.`
(`:` is also allowed as the namespace separator, but that is
handled separately in this parser). The current parser would still
accept things like `<foo~bar>`.
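For comparison, a stricter check could follow the `NameStartChar` and `NameChar` productions of the XML 1.0 specification (Fifth Edition). The sketch below is not part of this commit or of xml-conduit; the function names are hypothetical, and `:` is deliberately excluded on the assumption that the namespace separator is handled elsewhere.

```haskell
-- True when c lies in any of the given inclusive ranges.
inRanges :: [(Char, Char)] -> Char -> Bool
inRanges rs c = any (\(lo, hi) -> lo <= c && c <= hi) rs

-- NameStartChar from XML 1.0 (Fifth Edition), minus ':'.
isNameStartChar :: Char -> Bool
isNameStartChar c = c == '_' || inRanges startRanges c
  where
    startRanges =
      [ ('A', 'Z'), ('a', 'z')
      , ('\xC0', '\xD6'), ('\xD8', '\xF6'), ('\xF8', '\x2FF')
      , ('\x370', '\x37D'), ('\x37F', '\x1FFF'), ('\x200C', '\x200D')
      , ('\x2070', '\x218F'), ('\x2C00', '\x2FEF'), ('\x3001', '\xD7FF')
      , ('\xF900', '\xFDCF'), ('\xFDF0', '\xFFFD'), ('\x10000', '\xEFFFF')
      ]

-- NameChar: NameStartChar plus '-', '.', digits, and a few extra code points.
isNameChar :: Char -> Bool
isNameChar c =
  isNameStartChar c
    || c `elem` ['-', '.', '\xB7']
    || inRanges [('0', '9'), ('\x300', '\x36F'), ('\x203F', '\x2040')] c

-- A local (colon-free) name: one start character followed by name characters.
isLocalName :: String -> Bool
isLocalName []       = False
isLocalName (c : cs) = isNameStartChar c && all isNameChar cs
```

Under this predicate `isLocalName "foo-bar"` holds, while `"foo~bar"` and `"language["` are both rejected.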
jgm authored and k0ral committed Jun 20, 2023
1 parent d91f6bf commit 70e4200
Showing 2 changed files with 3 additions and 1 deletion.
2 changes: 2 additions & 0 deletions xml-conduit/src/Text/XML/Stream/Parse.hs
@@ -635,6 +635,8 @@ parseIdent = takeWhile1 valid <?> "identifier"
valid '/' = False
valid ';' = False
valid '#' = False
+valid '[' = False
+valid ']' = False
valid c = not $ isXMLSpace c

parseContent :: ParseSettings
2 changes: 1 addition & 1 deletion xml-conduit/test/unit.hs
@@ -798,7 +798,7 @@ testRenderComments =do

resolvedInline :: Assertion
resolvedInline = do
-Res.Document _ root _ <- return $ Res.parseLBS_ Res.def "<!DOCTYPE foo [<!ENTITY bar \"baz\">]><foo>&bar;</foo>"
+Res.Document _ root _ <- return $ Res.parseLBS_ Res.def "<!DOCTYPE foo[<!ENTITY bar \"baz\">]><foo>&bar;</foo>"
root @?= Res.Element "foo" Map.empty [Res.NodeContent "baz"]
Res.Document _ root2 _ <- return $ Res.parseLBS_ Res.def "<!DOCTYPE foo [<!ENTITY bar \"baz\">]><foo bar='&bar;'/>"
root2 @?= Res.Element "foo" (Map.singleton "bar" "baz") []