Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OSX] HybridGlobalization implement compare native function #85965

Merged
merged 22 commits into from
May 24, 2023

Conversation

mkhamoyan
Copy link
Contributor

@mkhamoyan mkhamoyan commented May 9, 2023

Implemented following native functions for OSX platforms

GlobalizationNative_CompareStringNative

Contributes to #80689

cc @SamMonoRT

@ghost
Copy link

ghost commented May 9, 2023

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

Implemented following native functions for OSX platforms

GlobalizationNative_CompareStringNative

Contributes to #80689

Author: mkhamoyan
Assignees: mkhamoyan
Labels:

area-System.Globalization

Milestone: -

@mkhamoyan mkhamoyan added NO-REVIEW Experimental/testing PR, do NOT review it os-ios Apple iOS labels May 9, 2023
@ghost
Copy link

ghost commented May 9, 2023

Tagging subscribers to 'os-ios': @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

Issue Details

Implemented following native functions for OSX platforms

GlobalizationNative_CompareStringNative

Contributes to #80689

Author: mkhamoyan
Assignees: mkhamoyan
Labels:

NO-REVIEW, area-System.Globalization, os-ios

Milestone: -

@mkhamoyan mkhamoyan added os-maccatalyst MacCatalyst OS os-tvos Apple tvOS labels May 9, 2023
@ghost
Copy link

ghost commented May 9, 2023

Tagging subscribers to 'os-tvos': @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

Issue Details

Implemented following native functions for OSX platforms

GlobalizationNative_CompareStringNative

Contributes to #80689

Author: mkhamoyan
Assignees: mkhamoyan
Labels:

NO-REVIEW, area-System.Globalization, os-ios, os-tvos, os-maccatalyst

Milestone: -


if (!PlatformDetection.IsHybridGlobalizationOnOSX)
{
yield return new object[] { s_invariantCompare, "\u20A9", "\uFFE6", CompareOptions.IgnoreCase, -1 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it return? 0? We automatically always ignore width when passing IgnoreCase? Docs on the changes in behavior would be cool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the option should be changed to CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth? It seems like \uFFE6 can represent both the halfwidth and fullwidth won sign

Copy link
Member

@mdh1418 mdh1418 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrated some comments over as they weren't showing up on the files changed tab.

Comment on lines 30 to 31
NSString *firstString = [NSString stringWithCharacters: (const unichar *)lpStr1 length: cwStr1Length];
NSString *secondString = [NSString stringWithCharacters: (const unichar *)lpStr2 length: cwStr2Length];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migrating comment


Re: passing in string lengths, could we use something like initWithUTF8String or is there a chance that lpStr1's length is less than cwStr1Length? Just not sure why we need to pass in string lengths.


Ah looking into it more, it seems like lpStr1 and lpStr2 will not be null terminated so we cannot use initWithUTF8String, it would've been nice to not have to pass extra parameters.

Given that GlobalizationNative_CompareStringNative is StringMarshalled with UTF8, it feels like there would be problems casting to const unichar * as that is UTF16, and a character represented in UTF8 might have a completely different representation in UTF16 that wouldn't align.

Are there any test cases that mightve failed because of a string marshalled into UTF8 is different in its UTF16 representation?

Maybe we should be using initWithBytes:length:encoding?
[NSString initWithBytes: lpStr1 length: cwStr1Length encoding: NSUTF8StringEncoding]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried with initWithBytes:length:encoding?
[NSString initWithBytes: lpStr1 length: cwStr1Length encoding: NSUTF8StringEncoding]
It can't convert properly from char* C# to objective C string , result is wrong. Thanks for pointing out UTF16 , will match it in C# and C and see how test cases will behave.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you happen to know what the bytes passed to pString1 in Interop.Globalization.CompareStringNative(m_name, m_name.Length, pString1, string1.Length, pString2, string2.Length, options); look like and what the bytes in lpStr1 look like here when sticking with StringMarshalling.UTF8?
Are NSUTF8StringEncoding and the StringMarshalling.UTF8 differnt? I'm curious as to why it fails to create the correct string.

I'm not sure what the full repercussions of using UTF16 if UTF8 is the default for OSX and for all other interop on OSX, we have been using UTF8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for bytes I will check. But changing from StringMarshaling.UTF8 to StringMarshaling.UTF16 didn't affect test cases.

Currently in Interop.Collation.cs there are functions using UTF8 and some of them UTF16 . The ones that pass char* are using StringMarshaling.UTF16 but not sure if this is the reason https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Collation.cs.
In OSX https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Locale.OSX.cs so far only UTF8 is used, but also there is no case of passing char*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked a little bit for the differences and it depends what are we passing from C# to C. In some functions we pass string, in others char*. When we need to pass char*, as char keyword occupies 2 bytes (16 bits) in the memory => UTF16 is the choice. But in my previous implementations I was only passing string (cultureName) so I chose to use StringMarshalling.Utf8 https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Interop.Locale.OSX.cs#L10
@steveisok could you please verify if this looks correct?


if (!PlatformDetection.IsHybridGlobalizationOnOSX)
{
yield return new object[] { s_invariantCompare, "\u20A9", "\uFFE6", CompareOptions.IgnoreCase, -1 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the option should be changed to CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth? It seems like \uFFE6 can represent both the halfwidth and fullwidth won sign

@mkhamoyan
Copy link
Contributor Author

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mkhamoyan
Copy link
Contributor Author

/azp run runtime-ioslike

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines -64 to +112
yield return new object[] { s_invariantCompare, "\u6FA4", "\u6CA2", ignoredOptions, 1 };

yield return new object[] { s_invariantCompare, "\u3070\u3073\u3076\u3079\u307C", "\u30D0\u30D3\u30D6\u30D9\u30DC", ignoredOptions, 0 };
yield return new object[] { s_invariantCompare, "\u3070\u3073\u3076\u3079\u307C", "\u30D0\u30D3\u3076\u30D9\u30DC", ignoredOptions, 0 };
yield return new object[] { s_invariantCompare, "\u3070\u3073\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\uFF8E\uFF9E", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\uFF8E\uFF9E", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u3079\uFF8E\uFF9E", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3070\u3073\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u30D6", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3071\u3074\u30D7\u307A", "\uFF8B\uFF9F\uFF8C\uFF9F", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u3070\uFF8E\uFF9E\u30D6", ignoredOptions, 1 }; // known exception for hg: should be -1
yield return new object[] { s_invariantCompare, "\u3070\u30DC\uFF8C\uFF9E\uFF8D\uFF9E\u307C\u3079\u307C", "\u3079\uFF8E\uFF9E", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3070\uFF8C\uFF9E\uFF8D\uFF9E\u307C", "\u30D6", ignoredOptions, -1 };

yield return new object[] { s_invariantCompare, "ABDDE", "D", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF43D", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "ABCDE", "c", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u3060", "\u305F", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "\u3060", "\u30C0", ignoredOptions, 0 };

yield return new object[] { s_invariantCompare, "\u30BF", "\uFF80", ignoredOptions, 0 }; // known exception for hg: should be -1

yield return new object[] { s_invariantCompare, "\u68EE\u9D0E\u5916", "\u68EE\u9DD7\u5916", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u68EE\u9DD7\u5916", "\u68EE\u9DD7\u5916", ignoredOptions, 0 };
yield return new object[] { s_invariantCompare, "\u2019\u2019\u2019\u2019", "''''", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "\u2019", "'", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "", "'", ignoredOptions, -1 };
yield return new object[] { s_invariantCompare, "\u4E00", "\uFF11", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "\u2160", "\uFF11", ignoredOptions, 1 };

yield return new object[] { s_invariantCompare, "0", "\uFF10", ignoredOptions, 0 }; // known exception for hg: should be -1
yield return new object[] { s_invariantCompare, "10", "1\uFF10", ignoredOptions, 0 }; // as above
yield return new object[] { s_invariantCompare, "9999\uFF1910", "1\uFF10", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "9999\uFF191010", "1\uFF10", ignoredOptions, 1 };

yield return new object[] { s_invariantCompare, "'\u3000'", "' '", ignoredOptions, 0 };
yield return new object[] { s_invariantCompare, "\uFF1B", ";", ignoredOptions, 0 }; // known exception for hg: should be 1
yield return new object[] { s_invariantCompare, "\uFF08", "(", ignoredOptions, 0 }; // known exception for hg: should be 1
yield return new object[] { s_invariantCompare, "\u30FC", "\uFF70", ignoredOptions, 0 };
yield return new object[] { s_invariantCompare, "\u30FC", "\uFF0D", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "\u30FC", "\u30FC", ignoredOptions, 0 };
yield return new object[] { s_invariantCompare, "\u30FC", "\u2015", ignoredOptions, 1 };
yield return new object[] { s_invariantCompare, "\u30FC", "\u2010", ignoredOptions, 1 };

yield return new object[] { s_invariantCompare, "/", "\uFF0F", ignoredOptions, 0 }; // known exception for hg: should be -1
yield return new object[] { s_invariantCompare, "'", "\uFF07", ignoredOptions, 0 }; // as above
yield return new object[] { s_invariantCompare, "\"", "\uFF02", ignoredOptions, 0 }; // as above

yield return new object[] { s_invariantCompare, "\u3042", "\u30A1", CompareOptions.None, s_expectedHiraganaToKatakanaCompare };
if (!PlatformDetection.IsHybridGlobalizationOnBrowser)
{
yield return new object[] { s_invariantCompare, "a", "\uFF41", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF21\uFF22\uFF23\uFF24\uFF25", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF21\uFF22\uFF23D\uFF25", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "ABCDE", "a\uFF22\uFF23D\uFF25", CompareOptions.IgnoreWidth | CompareOptions.IgnoreCase, 0 };
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF41\uFF42\uFF23D\uFF25", CompareOptions.IgnoreWidth | CompareOptions.IgnoreCase, 0 };
yield return new object[] { s_invariantCompare, "ABCDE", "\uFF43D", CompareOptions.IgnoreWidth, -1 };
yield return new object[] { s_invariantCompare, "ABCDE", "c", CompareOptions.IgnoreWidth, -1 };
yield return new object[] { s_invariantCompare, "\u30BF", "\uFF80", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "0", "\uFF10", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "10", "1\uFF10", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "\uFF1B", ";", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "\uFF08", "(", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "/", "\uFF0F", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "'", "\uFF07", CompareOptions.IgnoreWidth, 0 };
yield return new object[] { s_invariantCompare, "\"", "\uFF02", CompareOptions.IgnoreWidth, 0 };
}
yield return new object[] { s_invariantCompare, "\u3042", "\u30A1", CompareOptions.None, PlatformDetection.IsHybridGlobalizationOnOSX ? 1 : s_expectedHiraganaToKatakanaCompare };
Copy link
Member

@mdh1418 mdh1418 May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ordering native apple platforms with NSLiteralSearch:
lowercase hiragana < lowercase katakana < uppercase hiragana < uppercase katakana ?
I believe this should be noted in hybrid-globalization.md as the comments there suggest hiragana and katakana characters are ordered differently compared to ICU but it seems to depend on the case as well?

I am curious if \u3042 \u30A1 CompareOptions.IgnoreCase leads to -1.

If this is the case, I think in addition to noting the behavior in the docs, it would be nice to organize the hiragana/katakana tests to better reflect the different cases (i.e. Same case hiragana vs katakana, lower case hiragana vs upper case katakana, upper case hiragana vs lower case katakana or something to highlight expected behaviors) Test changes can be done in a followup, but I believe documentation change should be added in this PR.

Edit: this is just for where we use s_expectedHiraganaToKatakanaCompare and PlatformDetection.IsHybridGlobalizationOnOSX ? 1 : s_expectedHiraganaToKatakanaCompare, sorry somehow its referencing unrelated code blocks

Copy link
Contributor Author

@mkhamoyan mkhamoyan May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering in native apple platforms is not so clear.
I couldn't find in apple documentations exact flow how they are ordered. And there are 3 cases

  1. Letters that have lowercase/uppercase
    For this case ordering works like hiragana lowercase < katakana lowercase < hiragana uppercase < katakana uppercase

    code char name
    \u3041 Hiragana letter small A
    \u3042 Hiragana letter A
    \u30A1 Katakana letter small A
    \u30A2 Katakana letter A
    -- -- --
  2. Letters without lowercase/uppercase
    For this case ordering is katakana letter < hiragana letter but not sure it comes after lowercase katakana letters or somewhere else.

    code char name
    \u30C0 Katakana letter DA
    \u3060 Hiragana letter DA
    -- -- --
  3. Letters only existing in katakana
    Not sure where these letters are in ordering.

    code char name
    \u30F4 Katakana letter VU
    -- -- --

So writing in docs hiragana and katakana characters are ordered differently compared to ICU is more reasonable for now. Created ticket #86636 so I can go through all cases and figure out how is ordering done.

@mkhamoyan
Copy link
Contributor Author

/azp run runtime-ioslike

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@mdh1418 mdh1418 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides trying to use const unichar* directly since we StringMarshall to Utf16, looks good to me!

Function:
CompareString
*/
int32_t GlobalizationNative_CompareStringNative(const char* localeName, int32_t lNameLength, const char* lpStr1, int32_t cwStr1Length,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are marshalling strings to UTF16 directly, I think we can use const unichar* lpStr1 and const unichar* lpStr2 in the parameters and remove the casting below. Can probably do the same for localeName as well?
plus changing the declaration in the header to reflect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't recognise const unichar* in pal_collation.h. Not sure how can I compile unichar type in .h file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equivalent in C is unsigned short*, will use that in function definition.

Copy link
Member

@mdh1418 mdh1418 May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just use const uint16_t * in both the header and the .m?

@mkhamoyan
Copy link
Contributor Author

/azp run runtime-ioslike

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mkhamoyan
Copy link
Contributor Author

/azp run runtime-ioslike

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mkhamoyan
Copy link
Contributor Author

Failures are not related

@mkhamoyan mkhamoyan merged commit f345a70 into main May 24, 2023
@akoeplinger akoeplinger deleted the hybrid_compare_options branch May 24, 2023 14:00
@ghost ghost locked as resolved and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants