[SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations #46732

Closed
wants to merge 11 commits into from

Conversation

uros-db

@uros-db uros-db commented May 24, 2024

What changes were proposed in this pull request?

String titlecase conversion under UTF8_BINARY_LCASE and ICU collations now uses the appropriate ICU default locale for character mapping, and uses ICU BreakIterator.getWordInstance to locate the boundaries between words.
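
For illustration only, here is a minimal sketch of the word-boundary approach described above. It is not the code from this PR: to stay dependency-free it uses the JDK's `java.text.BreakIterator` and `Character.toTitleCase` as stand-ins for the ICU classes, and the `InitCapSketch` class and `initCap` helper are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class InitCapSketch {
    // Hypothetical helper: titlecases the first letter of each word and
    // lowercases the rest. Uses the JDK BreakIterator as a stand-in for ICU's.
    static String initCap(String input, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(input);
        StringBuilder out = new StringBuilder(input.length());
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = input.substring(start, end);
            int first = segment.codePointAt(0);
            if (Character.isLetter(first)) {
                // Word segment: map the first code point to titlecase,
                // then lowercase the remainder using the given locale.
                out.appendCodePoint(Character.toTitleCase(first));
                out.append(segment.substring(Character.charCount(first))
                        .toLowerCase(locale));
            } else {
                // Whitespace, punctuation, digits: copy through unchanged.
                out.append(segment);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(initCap("hello wORLD", Locale.ROOT)); // prints "Hello World"
    }
}
```

The JDK and ICU break iterators can disagree on where some words begin and end, so this sketch only shows the shape of the technique, not the exact results Spark produces.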

Why are the changes needed?

Similar Spark expressions such as Lower & Upper use the same interface (UCharacter) to perform collation-aware string transformation, and InitCap should offer a consistent way to titlecase strings across the collation space.

Does this PR introduce any user-facing change?

Yes, InitCap should now work properly for all collations other than UTF8_BINARY.

How was this patch tested?

New and existing unit tests, as well as existing e2e sql tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 24, 2024
@uros-db uros-db changed the title [WIP][SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations [SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations May 27, 2024
@uros-db uros-db requested a review from mkaravel May 31, 2024 12:20
@uros-db uros-db requested a review from dbatomic June 5, 2024 08:24

@mkaravel mkaravel left a comment

LGTM. Suggested a few more interesting test cases here as well.

@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3857a9d Jun 10, 2024