-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work around CLR unicode bug with classification of U+2028 #9519
Work around CLR unicode bug with classification of U+2028 #9519
Conversation
LGTM |
CC @ellismg |
@gafter future-stabilization is basically an ask mode branch. This doesn't intuitively seem like an ask mode issue. Am I missing a case? |
Why did you assign the issue to 2.0 (Preview)? Isn't that the milestone for this preview? |
@gafter ah okay. That's my mistake. |
// U+2028 "LINE SEPARATOR" should be classified by CharUnicodeInfo.GetUnicodeCategory as | ||
// UnicodeCategory.LineSeparator but due to a bug is incorrectly categorized as | ||
// UnicodeCategory.Format. See https://github.com/dotnet/coreclr/issues/3542 | ||
replaceWith = "\\u2028"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of calls to CharUnicodeInfo.GetUnicodeCategory in our codebase and some of those sites can still suffer from the bug in this API. For example (in src\Compilers\Core\Portable\UnicodeCharacterUtilities.cs):
UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(ch);
return IsLetterChar(cat)
|| IsDecimalDigitChar(cat)
|| IsConnectingChar(cat)
|| IsCombiningChar(cat)
|| IsFormattingChar(cat);
Instead of patching this particular call site in this manner, we should consider creating a wrapper for CharUnicodeInfo.GetUnicodeCategory, which would make necessary adjustments to the value it returns and simply change all call sites to use that wrapper.
Moving to the future branch. |
Fixes #9484
@dotnet/roslyn-compiler Please review