Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Get as much of StructToJson on by default #10371

Open
revans2 opened this issue Feb 2, 2024 · 0 comments
Open

[FEA] Get as much of StructToJson on by default #10371

revans2 opened this issue Feb 2, 2024 · 0 comments
Labels
feature request New feature or request

Comments

@revans2
Copy link
Collaborator

revans2 commented Feb 2, 2024

Is your feature request related to a problem? Please describe.
Currently toJson is disabled by default because

it is experimental and has some known incompatibilities with Spark, and can be enabled by setting spark.rapids.sql.expression.StructsToJson=true.

Known issues are:

    * There can be rounding differences when formatting floating-point numbers as strings. For example, Spark may produce -4.1243574E26 but the GPU may produce -4.124357351E26.
    * Not all JSON options are respected

https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/docs/compatibility.md#to_json-function

The goal of this is to go through all of the JSONOptions Specifically the write options and fall back to the CPU for things that we do not support. For example dropFieldIfAllNull, writeNullIfWithDefaultValue, and pretty.

Next we need to fall back to the CPU if we see any float or double values. Please note that we want a config to be added in that would let the user control if float/double values are allowed or not. This should be separate from spark.rapids.sql.castFloatToString.enabled. Finally StructToJson should be enabled by default.

We should then update the compatibility docs.

At a minimum we should have some tests to verify that we fall back at the right time for what we have added in, but it would be nice to see if we have time to add some more tests to verify that we are doing the right thing in as many cases as possible. Things like '\r', '\n', '\t' characters in strings...

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Feb 2, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants