Break UTF-16 high surrogates out into separate files #79

Open
zslayton opened this issue Sep 6, 2023 · 0 comments

According to the spec, supporting UTF-16 high surrogates is allowed but discouraged:

Ion does not specify the behavior of specifying invalid Unicode code points or surrogate code points (used only for UTF-16) using the escape sequences. It is highly recommended that Ion implementations reject such escape sequences as they are not proper Unicode as specified by the standard. To this point, consider the Ion string sequence, "\uD800\uDC00". A compliant parser may throw an exception because surrogate characters are specified outside of the context of UTF-16, accept the string as a technically invalid sequence of two Unicode code points (i.e. U+D800 and U+DC00), or interpret it as the single Unicode code point U+00010000. In this regard, the Ion string data type does not conform to the Unicode specification. A strict Unicode implementation of the Ion text should not accept such sequences.
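To make the three permitted behaviors concrete, here is a minimal sketch in Python. It is not taken from any Ion implementation: `SurrogatePolicy` and `resolve_escapes` are hypothetical names, and the input is assumed to be the list of code points already parsed from the escape sequences in a string.

```python
from enum import Enum


class SurrogatePolicy(Enum):
    """Hypothetical policies matching the three behaviors the spec allows."""
    REJECT = "reject"              # strict: raise on any surrogate escape
    PASS_THROUGH = "pass_through"  # keep each surrogate as its own code point
    COMBINE = "combine"            # merge a well-formed pair into one code point


def is_high_surrogate(cp):
    return 0xD800 <= cp <= 0xDBFF


def is_low_surrogate(cp):
    return 0xDC00 <= cp <= 0xDFFF


def resolve_escapes(code_points, policy):
    """Apply `policy` to code points that were parsed from escape sequences."""
    out = []
    i = 0
    while i < len(code_points):
        cp = code_points[i]
        if is_high_surrogate(cp) or is_low_surrogate(cp):
            if policy is SurrogatePolicy.REJECT:
                raise ValueError(f"surrogate escape U+{cp:04X} rejected")
            if policy is SurrogatePolicy.COMBINE:
                has_next = i + 1 < len(code_points)
                if is_high_surrogate(cp) and has_next and is_low_surrogate(code_points[i + 1]):
                    low = code_points[i + 1]
                    # Standard UTF-16 pair decoding.
                    out.append(0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00))
                    i += 2
                    continue
                # Assumption of this sketch: an unpaired surrogate is an error.
                raise ValueError(f"unpaired surrogate U+{cp:04X}")
        out.append(cp)
        i += 1
    return out
```

For the spec's example, `resolve_escapes([0xD800, 0xDC00], SurrogatePolicy.COMBINE)` yields the single code point U+10000, `PASS_THROUGH` yields the two code points unchanged, and `REJECT` raises. A test that expects the combined result can only pass under the first of these.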

Some tests in equivs include well-formed surrogate pairs. A parser that follows the spec's recommendation and rejects surrogates fails those tests, which makes it hard to tell whether it is otherwise spec-compliant.

See:

  • good/equivs/utf8/stringUtf8.ion
  • iontestdata/good/equivs/utf8/stringU0001D11E.ion

The surrogate-specific tests in those files should be broken out into a file dedicated to surrogate support so it can be easily added to a skip list as desired.

@zslayton zslayton added the bug label Sep 6, 2023