SpanScanner doesn't handle UTF-16 surrogate pairs #4
Comments
I think the real issue here is that |
This behavior runs contrary to the rest of Dart's string handling, and in particular breaks string_scanner. See dart-lang/string_scanner#4.
I agree the issue stems from a choice made within |
This behavior runs contrary to the rest of Dart's string handling, and in particular breaks string_scanner. See dart-lang/string_scanner#4.
The following example demonstrates the issue:
This code throws:
I believe the issue is that `package:string_scanner` operates on code units, whereas `package:source_span` operates on code points (runes), resulting in this index mismatch for surrogate pairs.

I found a workaround by decoding the string first:
Is the distinction between code units and code points here intentional? If so, is there a better way to ensure proper UTF-16 support? The workaround seems reasonable, although having to provide a span rather than the decoded file itself is clunky. Would adding a constructor to `SpanScanner` that accepts a decoded file or a list of code units be a suitable solution? Or could `SpanScanner` optionally operate on code points instead of code units?
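For reference, here is a minimal sketch of the offset mismatch described above. It uses only `dart:core`, not `string_scanner` or `source_span`, and the helper `codeUnitToRuneOffset` is a hypothetical name introduced for illustration. It is not the original reproduction, just a way to see how the two indexing schemes diverge once a surrogate pair appears.

```dart
// Illustrative sketch only: plain dart:core, no package APIs involved.

/// Converts a UTF-16 code-unit offset into a code-point (rune) offset.
/// Hypothetical helper, not part of string_scanner or source_span.
int codeUnitToRuneOffset(String text, int codeUnitOffset) =>
    text.substring(0, codeUnitOffset).runes.length;

void main() {
  // 'a', then U+1F600 (stored as a surrogate pair), then 'b'.
  const text = 'a\u{1F600}b';

  print(text.length); // 4 code units
  print(text.runes.length); // 3 code points

  // String.indexOf reports a code-unit offset.
  final unitOffset = text.indexOf('b');
  print(unitOffset); // 3

  // The same position counted in code points is one less.
  print(codeUnitToRuneOffset(text, unitOffset)); // 2
}
```

Any component that counts in code units will place `b` at offset 3, while one that counts in runes will place it at offset 2, so positions passed between the two drift by one for every surrogate pair that precedes them.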