[FEA] JSON reader parses types compatible with Spark #4609

Open
7 of 9 tasks
GaryShen2008 opened this issue Jan 24, 2022 · 1 comment
Labels
epic Issue that encompasses a significant feature or body of work task Work required that improves the product but is not user facing

Comments

@GaryShen2008
Collaborator

GaryShen2008 commented Jan 24, 2022

The current CUDF code parses types the way it wants to, not the way Spark expects them to be parsed.
We might be able to ask CUDF to read all of the types as strings and then parse them ourselves.
This will not fix everything, because looking at cast, there are still a number of types
that we do not fully support when casting from a string.
But it will be a step in the right direction and should let us avoid most of the "enable this type for JSON" configs.
At a minimum, we could reuse the cast-from-string configs that already exist.
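
A minimal sketch of the strings-first idea, in Python purely for illustration: the reader hands back every value as text, and a separate string-to-double step produces the typed column. `read_column_as_strings`, `read_double_column` and `cast_string_to_double` are hypothetical names, not plugin or CUDF APIs, and the cast is only a rough stand-in for the existing cast-from-string code (invalid input becomes null, roughly like a non-ANSI Spark cast).

```python
import json
from typing import List, Optional


def cast_string_to_double(s: Optional[str]) -> Optional[float]:
    """Hypothetical stand-in for an existing string-to-double cast.
    Invalid input becomes None, which plays the role of null."""
    if s is None:
        return None
    try:
        return float(s.strip())
    except ValueError:
        return None


def read_column_as_strings(lines: List[str], field: str) -> List[Optional[str]]:
    """Step 1: ask the reader for every value as a string.
    parse_int/parse_float=str keep numeric tokens as their raw JSON text."""
    out: List[Optional[str]] = []
    for line in lines:
        value = json.loads(line, parse_int=str, parse_float=str).get(field)
        if isinstance(value, bool):
            # Keep JSON spelling instead of Python's "True"/"False".
            value = "true" if value else "false"
        out.append(None if value is None else str(value))
    return out


def read_double_column(lines: List[str], field: str) -> List[Optional[float]]:
    """Step 2: run the string column through the cast."""
    return [cast_string_to_double(s) for s in read_column_as_strings(lines, field)]


rows = ['{"a": "100.0"}', '{"a": 100.0}', '{"a": "abc"}', '{}']
print(read_double_column(rows, "a"))  # [100.0, 100.0, None, None]
```

The design choice here is that only one conversion routine (the cast) decides how text becomes a double, which is what would let the per-type JSON configs fall back on the existing cast configs.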

This will not work 100% because Spark parses the data line by line and then casts it to the desired result.
For example, take `{"a": "100.0"}` and `{"a": 100.0}`. If we asked for a double to be returned,
Spark would parse the first one as a string and then cast it to a double,
but it would parse the second one directly as a double and do no casting at the end.
In most cases this should not make a difference, but there can be very subtle differences
between Spark's casting and what the JSON parser does to read the data.
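
A toy model of the two paths described above, again in Python and assuming the per-record behaviour exactly as described in this issue (it is not Spark's parser code). `per_line_path`, `strings_first_path` and `cast_from_string` are hypothetical names. The point it shows: once every value arrives as text, both records take the same cast path, so any behaviour that differs between direct parsing and casting can no longer be reproduced.

```python
import json
from typing import Optional, Tuple


def cast_from_string(s: str) -> Optional[float]:
    # Hypothetical string-to-double cast; None plays the role of null.
    try:
        return float(s.strip())
    except ValueError:
        return None


def per_line_path(line: str, field: str) -> Tuple[str, Optional[float]]:
    """Per-record model: a quoted value is read as a string and then cast,
    a bare number is parsed directly."""
    value = json.loads(line).get(field)
    if value is None:
        return ("null", None)
    if isinstance(value, str):
        return ("parse as string, then cast", cast_from_string(value))
    return ("parse directly", float(value))


def strings_first_path(line: str, field: str) -> Tuple[str, Optional[float]]:
    """Strings-first model: numeric tokens are kept as raw text, so every
    value goes through the same cast and the original token type is lost."""
    value = json.loads(line, parse_int=str, parse_float=str).get(field)
    if value is None:
        return ("null", None)
    return ("parse as string, then cast", cast_from_string(str(value)))


for line in ['{"a": "100.0"}', '{"a": 100.0}']:
    print(per_line_path(line, "a"), strings_first_path(line, "a"))
# ('parse as string, then cast', 100.0) ('parse as string, then cast', 100.0)
# ('parse directly', 100.0) ('parse as string, then cast', 100.0)
```

For these two inputs the final values agree; the gap only shows up for inputs where the JSON parser's number handling and the string cast disagree.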

@GaryShen2008 GaryShen2008 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Jan 24, 2022
@andygrove andygrove self-assigned this Jan 24, 2022
@andygrove andygrove added this to the Jan 10 - Jan 28 milestone Jan 24, 2022
@sameerz sameerz added task Work required that improves the product but is not user facing and removed feature request New feature or request ? - Needs Triage Need team to review and classify labels Jan 25, 2022
@andygrove andygrove added the epic Issue that encompasses a significant feature or body of work label Feb 11, 2022
@sameerz sameerz removed this from the Apr 4 - Apr 15 milestone Apr 19, 2022
@sameerz
Collaborator

sameerz commented Jul 7, 2022

Removing from 22.08 until the cudf dependencies are satisfied.

@andygrove andygrove removed their assignment Oct 25, 2023