[SPARK-48191][SQL] Support UTF-32 for string encode and decode #46469

Closed

vladimirg-db (Contributor)

What changes were proposed in this pull request?

Enable UTF-32 as a supported charset for string `encode` and `decode`.

Why are the changes needed?

The underlying implementation already handles UTF-32, so it only needs to be enabled.

Does this PR introduce any user-facing change?

Yes, `decode(..., 'UTF-32')` and `encode(..., 'UTF-32')` will start working.
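To illustrate what a UTF-32 round trip looks like at the byte level, here is a minimal sketch using only the Python standard library, not Spark itself. The helper names `encode_utf32` and `decode_utf32` are hypothetical. The sketch pins big-endian byte order with no byte-order mark as an assumption; this PR does not state which byte order or BOM behavior Spark's `encode(..., 'UTF-32')` produces.

```python
# Illustration only: plain Python codecs, not the Spark implementation.
# Assumption: big-endian UTF-32 without a byte-order mark ("utf-32-be").


def encode_utf32(text: str) -> bytes:
    """Encode text as UTF-32 (big-endian, no BOM): 4 bytes per code point."""
    return text.encode("utf-32-be")


def decode_utf32(data: bytes) -> str:
    """Decode big-endian UTF-32 bytes back into a string."""
    return data.decode("utf-32-be")


if __name__ == "__main__":
    sample = "Spark ✓"
    encoded = encode_utf32(sample)
    # UTF-32 is fixed width, so the byte length is 4x the code point count.
    assert len(encoded) == 4 * len(sample)
    # Decoding the encoded bytes returns the original string.
    assert decode_utf32(encoded) == sample
```

Because every code point occupies exactly four bytes, the encoded length is a quick sanity check when verifying results by hand.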

How was this patch tested?

Manually checked in the Spark shell.

Was this patch authored or co-authored using generative AI tooling?

No

vladimirg-db marked this pull request as ready for review May 8, 2024
yaooqinn closed this in 003823b May 8, 2024
yaooqinn (Member) commented May 8, 2024

Thank you all. Merged to master.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
Closes apache#46469 from vladimirg-db/vladimirg-db/support-utf-32-for-string-decode.

Authored-by: Vladimir Golubev <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
MaxGekk pushed a commit that referenced this pull request Aug 22, 2024
### What changes were proposed in this pull request?
This PR updates the related docs now that `encode` and `decode` support `UTF-32`, including:
- the `doc` of the sql config `spark.sql.legacy.javaCharsets`
- connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
- sql/core/src/main/scala/org/apache/spark/sql/functions.scala
- python/pyspark/sql/functions/builtin.py

### Why are the changes needed?
After PR #46469, `UTF-32` is supported for string encoding and decoding, but some related documentation was not updated at the same time.
This PR updates it to avoid misleading end users and developers.

https://github.com/apache/spark/blob/e93c5fbe81d21f8bf2ce52867013d06a63c7956e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala#L26
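As a rough illustration of the behavior these docs describe, the sketch below models a charset allowlist with a legacy escape hatch. It is written in plain Python and is not the code in `CharsetProvider.scala`. The function `resolve_charset`, the `legacy_any_charset` flag (standing in for `spark.sql.legacy.javaCharsets`), and the exact contents of `SUPPORTED_CHARSETS` are assumptions made for illustration; only the presence of `UTF-32` in the supported set comes from this PR.

```python
import codecs

# Assumed allowlist for illustration; only the UTF-32 entry is confirmed here.
SUPPORTED_CHARSETS = (
    "US-ASCII", "ISO-8859-1", "UTF-8",
    "UTF-16BE", "UTF-16LE", "UTF-16", "UTF-32",
)


def resolve_charset(name: str, legacy_any_charset: bool = False) -> str:
    """Return the normalized charset name, or raise if it is not allowed.

    legacy_any_charset is a hypothetical stand-in for a legacy config flag:
    when True, any charset known to the runtime is accepted.
    """
    normalized = name.strip().upper()
    if normalized in SUPPORTED_CHARSETS:
        return normalized
    if legacy_any_charset:
        # Python's codec registry stands in for the JDK charset lookup;
        # it raises LookupError when the name is unknown.
        codecs.lookup(name.strip())
        return normalized
    raise ValueError(
        f"Unsupported charset {name!r}; expected one of {SUPPORTED_CHARSETS}"
    )
```

With this shape, enabling a new charset is a one-line change to the allowlist, which is why the documentation listing the supported names has to be updated alongside it.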

### Does this PR introduce _any_ user-facing change?
Yes, documentation fixes only.

### How was this patch tested?
Not tested; the change only touches documentation.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47844 from panbingkun/SPARK-49353.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024