DTD ENTITY definitions can themselves have entities in them #103
Even just supporting character entities inside entities gets crazy, because they are expanded at the definition site and then again when the entity is substituted: see Appendix D of the XML spec for an example. Considering that apparently no one else has tripped over the broken entity handling yet, I think this is almost certainly not worth supporting properly. Also, entities can be parsed as markup (!), which is also unsupported, and this isn't detected when it happens.
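For comparison, the Appendix D example can be run through a conforming parser. This sketch uses expat via Python's standard `xml.etree.ElementTree` (not this project's parser) to show both rounds of expansion: `&#38;#38;` becomes `&#38;` when the entity is declared, and only becomes `&` when `&example;` is parsed as markup in content.

```python
import xml.etree.ElementTree as ET

# The entity declaration from Appendix D of the XML 1.0 specification,
# written on one line so the resulting text has no embedded newlines.
DOC = """<!DOCTYPE test [
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >
]>
<test>&example;</test>"""

root = ET.fromstring(DOC)
# The entity's replacement text is parsed as markup, producing a <p> child.
p = root.find("p")
print(p.text)
```

Each literal loses exactly one layer of escaping per round, so a parser that performs only one of the two rounds (or neither) ends up emitting text such as `&#65;` instead of `A`.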
I've tripped over this just now... KDE XML syntax definitions have entities in the DOCTYPE that use numerical entities.
Here's an example from scheme.xml:

```xml
<!DOCTYPE language SYSTEM "language.dtd"
[
<!ENTITY xmlattrs "\s+([^"/>]++|"[^"]*+")*+">
<!ENTITY tab "	">
<!ENTITY regex "(?:[^\\(\[/]++|\\.|\[\^?\]?([^\\\[\]]++|\\.|\[(:[^:]+:\])?)++\]|\((?R)\))+">
<!ENTITY initial_ascii_set "a-zA-Z!$%&*/:<=>?~_^">
<!ENTITY initial_unicode_set "\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Mn}\p{Nl}\p{No}\p{Pd}\p{Pc}\p{Po}\p{Sc}\p{Sm}\p{Sk}\p{So}\p{Co}">
<!ENTITY initial_others "\\x[0-9a-fA-F]++;|(?![\x01-\x7f])[&initial_unicode_set;]">
<!ENTITY initial "(?:[&initial_ascii_set;]|&initial_others;)">
<!ENTITY subsequent "(?:[&initial_ascii_set;0-9-@.+\p{Nd}\p{Mc}\p{Me}]|&initial_others;)">
<!ENTITY symbol "(?:&initial;&subsequent;*+)">
]>
```

We have numerical
If you're worried about malicious recursive expansions, you can just put a small finite limit on recursive entity expansion (say, 5).
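A minimal sketch of that limit, in Python. `expand`, `EntityExpansionError`, and the `max_depth`/`max_chars` parameters are hypothetical names, not this library's API. It assumes the entity table maps names to replacement text, substitutes named references recursively, and refuses undefined, self-referencing, too deeply nested, or oversized expansions. The `kde_like` values are simplified stand-ins for the nested scheme.xml entities.

```python
import re

# A named general entity reference such as &initial; (simplified name rule).
# Character references like &#65; are deliberately not matched here.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")

# The five predefined XML entities; their values are inserted literally.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


class EntityExpansionError(ValueError):
    """Raised instead of silently emitting unexpanded or garbled text."""


def expand(text, entities, max_depth=5, max_chars=1_000_000, _stack=()):
    """Recursively substitute &name; references using the `entities` mapping."""

    def replace(match):
        name = match.group(1)
        if name in PREDEFINED:
            # re.sub does not re-scan inserted text, so "&amp;amp;" -> "&amp;".
            return PREDEFINED[name]
        if name not in entities:
            raise EntityExpansionError(f"undefined entity &{name};")
        if name in _stack:
            raise EntityExpansionError(f"entity &{name}; refers to itself")
        if len(_stack) >= max_depth:
            raise EntityExpansionError(f"&{name}; nested more than {max_depth} deep")
        return expand(entities[name], entities, max_depth, max_chars, _stack + (name,))

    result = ENTITY_REF.sub(replace, text)
    if len(result) > max_chars:
        raise EntityExpansionError(f"expansion exceeds {max_chars} characters")
    return result


# Nested definitions in the style of scheme.xml (values simplified).
kde_like = {
    "ascii": "a-z",
    "uni": r"\p{Lu}",
    "others": "[&uni;]",
    "initial": "(?:[&ascii;]|&others;)",
    "symbol": "&initial;+",
}
print(expand("&symbol;", kde_like))  # (?:[a-z]|[\p{Lu}])+

# "Billion laughs": ten references per level, nine levels deep.
lols = {"lol0": "lol"}
for i in range(1, 10):
    lols[f"lol{i}"] = f"&lol{i - 1};" * 10
try:
    expand("&lol9;", lols, max_depth=10, max_chars=10_000)
except EntityExpansionError as err:
    print(err)  # expansion exceeds 10000 characters
```

The character budget matters as much as the depth cap: a depth limit alone still permits fan-out to the power of depth in output size, so bounding the size of the result is what actually bounds memory.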
Note the NodeContent: when this is rendered, it becomes `&#65;` rather than `A`.
Also note that entities can reference other entities, which is the root of the infamous "billion laughs" attack; here be dragons. Character entities are safe, though.
This might not be worth supporting properly, but it should definitely error out explicitly rather than produce garbage.
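Along those lines, a minimal Python sketch of computing an entity's replacement text at the definition site. `replacement_text` and `UnsupportedEntityError` are hypothetical names. Character references are expanded once, so `&#65;` yields `A` and `&#9;` yields a tab; general entity references such as `&amp;` are left for expansion on use; and any value whose replacement text contains `<` (and so would have to be parsed as markup) is rejected with an explicit error instead of being passed through as text.

```python
import re

# Decimal (&#65;) or hexadecimal (&#x41;) character reference.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


class UnsupportedEntityError(ValueError):
    """Raised for entity values this hypothetical parser cannot handle."""


def replacement_text(literal):
    """Compute an entity's replacement text from its quoted literal value.

    Character references are expanded exactly once, here at the definition
    site; general entity references are kept for expansion when used.
    """

    def to_char(match):
        hex_digits, dec_digits = match.groups()
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))

    text = CHAR_REF.sub(to_char, literal)
    if "<" in text:
        # The replacement text would need to be parsed as markup: refuse
        # explicitly rather than emit it as if it were plain text.
        raise UnsupportedEntityError(f"entity value contains markup: {text!r}")
    return text


print(replacement_text("&#65;"))       # A
print(repr(replacement_text("&#9;")))  # '\t'
print(replacement_text("&#38;#38;"))   # &#38;
```

The last call shows why a single pass is not the whole story: the leftover `&#38;` is a new character reference that only gets expanded when the entity is later used.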