-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OSX] HybridGlobalization implement compare native function #85965
Conversation
Tagging subscribers to this area: @dotnet/area-system-globalization Issue DetailsImplemented following native functions for OSX platforms
Contributes to #80689
|
Tagging subscribers to 'os-ios': @steveisok, @akoeplinger Issue DetailsImplemented following native functions for OSX platforms
Contributes to #80689
|
Tagging subscribers to 'os-tvos': @steveisok, @akoeplinger Issue DetailsImplemented following native functions for OSX platforms
Contributes to #80689
|
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Outdated
Show resolved
Hide resolved
|
||
if (!PlatformDetection.IsHybridGlobalizationOnOSX) | ||
{ | ||
yield return new object[] { s_invariantCompare, "\u20A9", "\uFFE6", CompareOptions.IgnoreCase, -1 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it return? 0? We automatically always ignore width when passing IgnoreCase? Docs on the changes in behavior would be cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the option should be changed to CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth
? It seems like \uFFE6
can represent both the halfwidth and fullwidth won sign
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated some comments over as they weren't showing up on the files changed tab.
NSString *firstString = [NSString stringWithCharacters: (const unichar *)lpStr1 length: cwStr1Length]; | ||
NSString *secondString = [NSString stringWithCharacters: (const unichar *)lpStr2 length: cwStr2Length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrating comment
Re: passing in string lengths, could we use something like initWithUTF8String or is there a chance that lpStr1's length is less than cwStr1Length? Just not sure why we need to pass in string lengths.
Ah looking into it more, it seems like lpStr1
and lpStr2
will not be null terminated so we cannot use initWithUTF8String
, it would've been nice to not have to pass extra parameters.
Given that GlobalizationNative_CompareStringNative
is StringMarshalled
with UTF8
, it feels like there would be problems casting to const unichar *
as that is UTF16
, and a character represented in UTF8 might have a completely different representation in UTF16 that wouldn't align.
Are there any test cases that mightve failed because of a string marshalled into UTF8 is different in its UTF16 representation?
Maybe we should be using initWithBytes:length:encoding
?
[NSString initWithBytes: lpStr1 length: cwStr1Length encoding: NSUTF8StringEncoding]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried with initWithBytes:length:encoding?
[NSString initWithBytes: lpStr1 length: cwStr1Length encoding: NSUTF8StringEncoding]
It can't convert properly from char* C# to objective C string , result is wrong. Thanks for pointing out UTF16 , will match it in C# and C and see how test cases will behave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know what the bytes passed to pString1
in Interop.Globalization.CompareStringNative(m_name, m_name.Length, pString1, string1.Length, pString2, string2.Length, options);
look like and what the bytes in lpStr1
look like here when sticking with StringMarshalling.UTF8
?
Are NSUTF8StringEncoding
and the StringMarshalling.UTF8
differnt? I'm curious as to why it fails to create the correct string.
I'm not sure what the full repercussions of using UTF16 if UTF8 is the default for OSX and for all other interop on OSX, we have been using UTF8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for bytes I will check. But changing from StringMarshaling.UTF8
to StringMarshaling.UTF16
didn't affect test cases.
Currently in Interop.Collation.cs there are functions using UTF8
and some of them UTF16
. The ones that pass char*
are using StringMarshaling.UTF16
but not sure if this is the reason https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Collation.cs.
In OSX
https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Locale.OSX.cs so far only UTF8
is used, but also there is no case of passing char*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked a little bit for the differences and it depends what are we passing from C# to C. In some functions we pass string, in others char*. When we need to pass char*, as char keyword occupies 2 bytes (16 bits) in the memory => UTF16 is the choice. But in my previous implementations I was only passing string (cultureName) so I chose to use StringMarshalling.Utf8 https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Locale.OSX.cs#L10
@steveisok could you please verify if this looks correct?
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Show resolved
Hide resolved
|
||
if (!PlatformDetection.IsHybridGlobalizationOnOSX) | ||
{ | ||
yield return new object[] { s_invariantCompare, "\u20A9", "\uFFE6", CompareOptions.IgnoreCase, -1 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the option should be changed to CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth
? It seems like \uFFE6
can represent both the halfwidth and fullwidth won sign
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs
Outdated
Show resolved
Hide resolved
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
yield return new object[] { s_invariantCompare, "\u6FA4", "\u6CA2", ignoredOptions, 1 }; | ||
|
||
yield return new object[] { s_invariantCompare, "\u3070\u3073\u3076\u3079\u307C", "\u30D0\u30D3\u30D6\u30D9\u30DC", ignoredOptions, 0 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\u3073\u3076\u3079\u307C", "\u30D0\u30D3\u3076\u30D9\u30DC", ignoredOptions, 0 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\u3073\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\uFF8E\uFF9E", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\uFF8E\uFF9E", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u3079\uFF8E\uFF9E", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\u3073\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u30D6", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3071\u3074\u30D7\u307A", "\uFF8B\uFF9F\uFF8C\uFF9F", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u3070\uFF8E\uFF9E\u30D6", ignoredOptions, 1 }; // known exception for hg: should be -1 | ||
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C\u3079\u307C", "\u3079\uFF8E\uFF9E", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3070\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u30D6", ignoredOptions, -1 }; | ||
|
||
yield return new object[] { s_invariantCompare, "ABDDE", "D", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF43D", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "c", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u3060", "\u305F", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "\u3060", "\u30C0", ignoredOptions, 0 }; | ||
|
||
yield return new object[] { s_invariantCompare, "\u30BF", "\uFF80", ignoredOptions, 0 }; // known exception for hg: should be -1 | ||
|
||
yield return new object[] { s_invariantCompare, "\u68EE\u9D0E\u5916", "\u68EE\u9DD7\u5916", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u68EE\u9DD7\u5916", "\u68EE\u9DD7\u5916", ignoredOptions, 0 }; | ||
yield return new object[] { s_invariantCompare, "\u2019\u2019\u2019\u2019", "''''", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "\u2019", "'", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "", "'", ignoredOptions, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u4E00", "\uFF11", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "\u2160", "\uFF11", ignoredOptions, 1 }; | ||
|
||
yield return new object[] { s_invariantCompare, "0", "\uFF10", ignoredOptions, 0 }; // known exception for hg: should be -1 | ||
yield return new object[] { s_invariantCompare, "10", "1\uFF10", ignoredOptions, 0 }; // as above | ||
yield return new object[] { s_invariantCompare, "9999\uFF1910", "1\uFF10", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "9999\uFF191010", "1\uFF10", ignoredOptions, 1 }; | ||
|
||
yield return new object[] { s_invariantCompare, "'\u3000'", "' '", ignoredOptions, 0 }; | ||
yield return new object[] { s_invariantCompare, "\uFF1B", ";", ignoredOptions, 0 }; // known exception for hg: should be 1 | ||
yield return new object[] { s_invariantCompare, "\uFF08", "(", ignoredOptions, 0 }; // known exception for hg: should be 1 | ||
yield return new object[] { s_invariantCompare, "\u30FC", "\uFF70", ignoredOptions, 0 }; | ||
yield return new object[] { s_invariantCompare, "\u30FC", "\uFF0D", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "\u30FC", "\u30FC", ignoredOptions, 0 }; | ||
yield return new object[] { s_invariantCompare, "\u30FC", "\u2015", ignoredOptions, 1 }; | ||
yield return new object[] { s_invariantCompare, "\u30FC", "\u2010", ignoredOptions, 1 }; | ||
|
||
yield return new object[] { s_invariantCompare, "/", "\uFF0F", ignoredOptions, 0 }; // known exception for hg: should be -1 | ||
yield return new object[] { s_invariantCompare, "'", "\uFF07", ignoredOptions, 0 }; // as above | ||
yield return new object[] { s_invariantCompare, "\"", "\uFF02", ignoredOptions, 0 }; // as above | ||
|
||
yield return new object[] { s_invariantCompare, "\u3042", "\u30A1", CompareOptions.None, s_expectedHiraganaToKatakanaCompare }; | ||
if (!PlatformDetection.IsHybridGlobalizationOnBrowser) | ||
{ | ||
yield return new object[] { s_invariantCompare, "a", "\uFF41", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF21\uFF22\uFF23\uFF24\uFF25", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF21\uFF22\uFF23D\uFF25", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "a\uFF22\uFF23D\uFF25", CompareOptions.IgnoreWidth | CompareOptions.IgnoreCase, 0 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF41\uFF42\uFF23D\uFF25", CompareOptions.IgnoreWidth | CompareOptions.IgnoreCase, 0 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF43D", CompareOptions.IgnoreWidth, -1 }; | ||
yield return new object[] { s_invariantCompare, "ABCDE", "c", CompareOptions.IgnoreWidth, -1 }; | ||
yield return new object[] { s_invariantCompare, "\u30BF", "\uFF80", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "0", "\uFF10", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "10", "1\uFF10", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "\uFF1B", ";", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "\uFF08", "(", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "/", "\uFF0F", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "'", "\uFF07", CompareOptions.IgnoreWidth, 0 }; | ||
yield return new object[] { s_invariantCompare, "\"", "\uFF02", CompareOptions.IgnoreWidth, 0 }; | ||
} | ||
yield return new object[] { s_invariantCompare, "\u3042", "\u30A1", CompareOptions.None, PlatformDetection.IsHybridGlobalizationOnOSX ? 1 : s_expectedHiraganaToKatakanaCompare }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ordering native apple platforms with NSLiteralSearch
:
lowercase hiragana < lowercase katakana < uppercase hiragana < uppercase katakana
?
I believe this should be noted in hybrid-globalization.md
as the comments there suggest hiragana and katakana characters are ordered differently compared to ICU
but it seems to depend on the case as well?
I am curious if \u3042
\u30A1
CompareOptions.IgnoreCase
leads to -1
.
If this is the case, I think in addition to noting the behavior in the docs, it would be nice to organize the hiragana/katakana tests to better reflect the different cases (i.e. Same case hiragana vs katakana, lower case hiragana vs upper case katakana, upper case hiragana vs lower case katakana or something to highlight expected behaviors) Test changes can be done in a followup, but I believe documentation change should be added in this PR.
Edit: this is just for where we use s_expectedHiraganaToKatakanaCompare
and PlatformDetection.IsHybridGlobalizationOnOSX ? 1 : s_expectedHiraganaToKatakanaCompare
, sorry somehow its referencing unrelated code blocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering in native apple platforms is not so clear.
I couldn't find in apple documentations exact flow how they are ordered. And there are 3 cases
-
Letters that have lowercase/uppercase
For this case ordering works likehiragana lowercase < katakana lowercase < hiragana uppercase < katakana uppercase
code char name \u3041 ぁ Hiragana letter small A \u3042 あ Hiragana letter A \u30A1 ァ Katakana letter small A \u30A2 ア Katakana letter A -- -- -- -
Letters without lowercase/uppercase
For this case ordering iskatakana letter < hiragana letter
but not sure it comes after lowercase katakana letters or somewhere else.code char name \u30C0 ダ Katakana letter DA \u3060 だ Hiragana letter DA -- -- -- -
Letters only existing in katakana
Not sure where these letters are in ordering.code char name \u30F4 ヴ Katakana letter VU -- -- --
So writing in docs hiragana and katakana characters are ordered differently compared to ICU
is more reasonable for now. Created ticket #86636 so I can go through all cases and figure out how is ordering done.
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.OSX.cs
Outdated
Show resolved
Hide resolved
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides trying to use const unichar*
directly since we StringMarshall to Utf16, looks good to me!
Function: | ||
CompareString | ||
*/ | ||
int32_t GlobalizationNative_CompareStringNative(const char* localeName, int32_t lNameLength, const char* lpStr1, int32_t cwStr1Length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are marshalling strings to UTF16 directly, I think we can use const unichar* lpStr1
and const unichar* lpStr2
in the parameters and remove the casting below. Can probably do the same for localeName
as well?
plus changing the declaration in the header to reflect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't recognise const unichar*
in pal_collation.h
. Not sure how can I compile unichar
type in .h file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Equivalent in C is unsigned short*, will use that in function definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can just use const uint16_t *
in both the header and the .m
?
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-ioslike |
Azure Pipelines successfully started running 1 pipeline(s). |
Failures are not related |
Implemented following native functions for OSX platforms
GlobalizationNative_CompareStringNative
Contributes to #80689
cc @SamMonoRT