Illegal character entity using XMLStreamReader on value encoded by external service #165
Comments
OK, but isn't this invalid XML content, and as such something to be fixed by whatever produced it? I'm assuming the encoder makes the mistake of blindly encoding the 2 Java (?) UTF-16 surrogate characters as separate entities, which produces content that is not well-formed according to the XML specification.
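To illustrate the suspected encoder mistake, here is a minimal sketch. The class and method names are hypothetical and not taken from any real encoder, and escaping of `&` and `<` is left out for brevity. Iterating over UTF-16 code units yields two surrogate references for a supplementary character, while iterating over code points yields one legal reference.

```java
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Woodstox or any real encoder.
class CharRefEncoder {

    // Naive approach: one character reference per UTF-16 code unit.
    // A supplementary character becomes two surrogate references,
    // which XML 1.0 does not permit.
    static String encodePerChar(String text) {
        return text.chars()
                .mapToObj(c -> c < 0x80 ? String.valueOf((char) c) : "&#" + c + ";")
                .collect(Collectors.joining());
    }

    // Code-point-aware approach: one reference per Unicode code point.
    static String encodePerCodePoint(String text) {
        return text.codePoints()
                .mapToObj(cp -> cp < 0x80 ? String.valueOf((char) cp) : "&#" + cp + ";")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // U+1F385 (Santa emoji) built from its code point to avoid source-encoding issues.
        String text = "Merry Christmas " + new String(Character.toChars(0x1F385));
        System.out.println(encodePerChar(text));      // Merry Christmas &#55356;&#57221;
        System.out.println(encodePerCodePoint(text)); // Merry Christmas &#127877;
    }
}
```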
Right. I did some research into problems with XML parsing and surrogate pairs, and I see bugs reported against the external systems/libraries that produce this. EDIT: The next problem with the reader comes from Exchange Web Service, where an email message contains character
@Magmaruss Yes, it's common to have legacy systems that cannot really be fixed. I don't have a good recipe for this: if this were at a lower level, you could implement a wrapping but I don't think anything in there (or in the 2 earlier ones linked from it) would help. It may be worth reading just in case. If you have the time and interest, adding a new configuration property that allows inclusion of surrogates could be acceptable as well: something disabled by default, but that can be enabled. It'd then produce what looks, I think, like a valid pair of Java
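As one possible workaround outside the parser, the text could be repaired before it reaches `XMLStreamReader`. The sketch below is an assumption about how such pre-processing might look, not something Woodstox provides: `SurrogateRefMerger` is a hypothetical name, and the code merges only directly adjacent high/low surrogate references. It works on raw text, so it would also rewrite matching sequences inside CDATA sections or comments, where they are literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; not part of Woodstox.
class SurrogateRefMerger {

    // A single numeric character reference, decimal or hexadecimal.
    private static final Pattern REF =
            Pattern.compile("&#(x[0-9A-Fa-f]{1,6}|[0-9]{1,7});");

    private static int parse(String digits) {
        return digits.startsWith("x")
                ? Integer.parseInt(digits.substring(1), 16)
                : Integer.parseInt(digits, 10);
    }

    private static boolean isHigh(int v) { return v >= 0xD800 && v <= 0xDBFF; }

    private static boolean isLow(int v) { return v >= 0xDC00 && v <= 0xDFFF; }

    // Replaces each adjacent high/low surrogate reference pair with a single
    // decimal reference to the combined code point; everything else is copied as-is.
    static String merge(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        Matcher m = REF.matcher(xml);
        int pos = 0;
        while (m.find(pos)) {
            out.append(xml, pos, m.start());
            String firstRef = m.group();
            int first = parse(m.group(1));
            int afterFirst = m.end();
            // Check whether another reference starts exactly where the first one ends.
            m.region(afterFirst, xml.length());
            if (isHigh(first) && m.lookingAt() && isLow(parse(m.group(1)))) {
                int low = parse(m.group(1));
                int codePoint = Character.toCodePoint((char) first, (char) low);
                out.append("&#").append(codePoint).append(';');
                pos = m.end();
            } else {
                // Not a surrogate pair: keep the reference and rescan after it.
                out.append(firstRef);
                pos = afterFirst;
            }
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(merge("<v>Merry Christmas &#55356;&#57221;</v>"));
        // <v>Merry Christmas &#127877;</v>
    }
}
```

Lone surrogate references are left untouched, so the parser would still reject them; whether that is acceptable depends on the data being processed.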
Merged, to be included in the 6.6.0 release.
Hello.
While reading the response from an external service, I ran into a problematic value. One of the XML elements has a value with an emoji character encoded as two surrogate character entities instead of one code point, which XMLStreamReader rejects by throwing an exception.
Provided value (throws exception):
Merry Christmas &#55356;&#57221;
The same value encoded as a single entity (works fine):
Merry Christmas &#127877;
Merry Christmas &#x1F385;
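For reference, the two encodings denote the same character: in UTF-16 the Santa emoji U+1F385 is the surrogate pair 0xD83C, 0xDF85 (decimal 55356, 57221). A small sketch of the standard UTF-16 arithmetic, with a hypothetical class name, checked against the JDK's own `Character.toCodePoint`:

```java
// Hypothetical demo class; shows the standard UTF-16 surrogate arithmetic.
class SurrogateMath {

    // Combines a high and a low surrogate into the code point they encode.
    static int combine(int high, int low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int viaFormula = combine(55356, 57221);
        int viaJdk = Character.toCodePoint((char) 55356, (char) 57221);
        System.out.println(viaFormula);                      // 127877
        System.out.println(Integer.toHexString(viaFormula)); // 1f385
        System.out.println(viaFormula == viaJdk);            // true
    }
}
```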
Exception:
Reproduction code: