Name predicates: which XML version? #607
I'm sure it's just XML 1.0 and not 1.1. I'm also not surprised it's inconsistent with the spec. The implementation for isNameChar in scala-xml hasn't fundamentally changed in 20 years. There's probably not someone around to explain the rationale for the differences.
I did some homework. tl;dr:
I am willing to update docs or synchronize the predicates with a particular XML standard.

What's defined in scalacheck-xml is fully consistent with the JDK's XMLChar. This is XML 1.0, Fourth Edition. I found an ancient rant about Fifth Edition, which is the status quo in Xerces. The scaladoc on
Furthermore, since scala-xml just delegates to Unicode character types, its predicates are a function of the JVM version. XML 1.0 Fifth Edition's "intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names," but it's still a fixed set.
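To make the "fixed set" point concrete, here is a minimal sketch of the XML 1.0 Fifth Edition NameStartChar and NameChar productions written as plain code point ranges, next to a predicate that delegates to the JVM's Unicode tables. The object name `NamePredicates` and the `isNameStartByType` function are hypothetical; the latter is only a simplified stand-in for "delegating to Unicode character types" and is not scala-xml's actual implementation.

```scala
object NamePredicates {
  // XML 1.0 Fifth Edition, production [4] NameStartChar, as inclusive ranges.
  val nameStartRanges: List[(Int, Int)] = List(
    0x3A -> 0x3A, 0x41 -> 0x5A, 0x5F -> 0x5F, 0x61 -> 0x7A,
    0xC0 -> 0xD6, 0xD8 -> 0xF6, 0xF8 -> 0x2FF,
    0x370 -> 0x37D, 0x37F -> 0x1FFF, 0x200C -> 0x200D,
    0x2070 -> 0x218F, 0x2C00 -> 0x2FEF, 0x3001 -> 0xD7FF,
    0xF900 -> 0xFDCF, 0xFDF0 -> 0xFFFD, 0x10000 -> 0xEFFFF
  )

  // Production [4a] NameChar adds these ranges: "-", ".", digits, U+00B7,
  // combining marks U+0300..U+036F, and U+203F..U+2040.
  val nameCharExtraRanges: List[(Int, Int)] = List(
    0x2D -> 0x2D, 0x2E -> 0x2E, 0x30 -> 0x39, 0xB7 -> 0xB7,
    0x300 -> 0x36F, 0x203F -> 0x2040
  )

  private def inRanges(cp: Int, ranges: List[(Int, Int)]): Boolean =
    ranges.exists { case (lo, hi) => cp >= lo && cp <= hi }

  // Fixed by the spec text: the answer never depends on the JVM version.
  def isNameStartChar5e(cp: Int): Boolean = inRanges(cp, nameStartRanges)
  def isNameChar5e(cp: Int): Boolean =
    isNameStartChar5e(cp) || inRanges(cp, nameCharExtraRanges)

  // Hypothetical stand-in (NOT scala-xml's code): the answer depends on the
  // Unicode tables shipped with the running JVM.
  def isNameStartByType(cp: Int): Boolean =
    Character.isLetter(cp) || cp == 0x5F || cp == 0x3A
}
```

For example, U+00B5 (MICRO SIGN) is a letter according to `Character.isLetter`, so the stand-in accepts it as a name start, while the Fifth Edition ranges do not include it.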
I am trying to implement a ScalaCheck XML generator that round-trips through writing and parsing. I've run into a discrepancy between the character sets in scala-xml and the JVM internals. Is it expected that scala-xml's alphabet targets a specific version of the XML spec? I'm finding that the scala-xml alphabet matches neither the JVM's idea of XML 1.0 nor its idea of XML 1.1.
I tried to make this a scala-cli script, but I can't get it to accept the com.sun.org imports. I have to run this on Java 8 (specifically, I used 1.8.0_292) to avoid trouble with the module system.
scala-xml
I think I can limit my generators to characters that pass both the JVM's and scala-xml's predicates, but I'm curious whether this difference is known and intentional. Thanks!
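A minimal sketch of that workaround, assuming both predicates can be expressed as `Char => Boolean`: enumerate the Basic Multilingual Plane once and keep only the characters both predicates accept. The object name `SharedAlphabet` and the two predicates in the usage note are hypothetical placeholders, not the real JVM or scala-xml predicates.

```scala
object SharedAlphabet {
  // Every UTF-16 code unit accepted by both predicates. Working on Char
  // means supplementary code points (above U+FFFF) are not considered.
  def sharedChars(p: Char => Boolean, q: Char => Boolean): IndexedSeq[Char] =
    (0 to 0xFFFF).map(_.toChar).filter(c => p(c) && q(c))
}
```

Usage, with placeholder predicates standing in for the two libraries:

```scala
val byUnicodeType: Char => Boolean = c => Character.isLetter(c)
val asciiOnly: Char => Boolean = c => c < 128
val shared = SharedAlphabet.sharedChars(byUnicodeType, asciiOnly)
// With ScalaCheck on the classpath, a non-empty `shared` could then be
// handed to Gen.oneOf to build the character generator.
```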