[FEA] read_json should output all-null columns for schema columns that do not match the input JSON #17341

Open
ttnghia opened this issue Nov 15, 2024 · 1 comment · May be fixed by #17348
Labels
feature request New feature or request

Comments

ttnghia (Contributor) commented Nov 15, 2024

This is similar to #17091, but not the same. Currently, when the input JSON data has a column with the same name as a column in the input schema, that column is output without checking whether its data type matches the schema. For example, given the following input:

JSON data: {"a" : [1]}
Schema: STRUCT<a: LIST<STRUCT<INT>>>

With this input, read_json outputs a LIST<INT8> column. The correct output would be an all-null column instead, because the data does not match the schema type.
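To make the requested behavior concrete, here is a minimal pure-Python model of it. This is not cuDF code: the schema encoding, the `conforms` and `read_column` helpers, and the inner field name `x` (the schema above leaves the `STRUCT<INT>` field unnamed) are all hypothetical. It also assumes a column counts as mismatched if any of its non-null values fails to fit the schema type, in which case the whole column becomes nulls.

```python
import json
from typing import Any, Dict, List, Optional, Tuple, Union

# Hypothetical schema encoding (NOT cuDF's API):
#   "INT"                      -> integer leaf
#   ("LIST", child)            -> list whose elements have type `child`
#   ("STRUCT", {name: child})  -> struct with named fields
SchemaType = Union[str, Tuple[str, Any]]


def conforms(value: Any, dtype: SchemaType) -> bool:
    """Return True if a parsed JSON value is compatible with `dtype`.

    JSON null (None) is treated as compatible with every type.
    """
    if value is None:
        return True
    if isinstance(dtype, str):
        if dtype == "INT":
            # bool is a subclass of int in Python; JSON true/false are not integers.
            return isinstance(value, int) and not isinstance(value, bool)
        raise ValueError(f"unsupported leaf type: {dtype!r}")
    kind, child = dtype
    if kind == "LIST":
        return isinstance(value, list) and all(conforms(v, child) for v in value)
    if kind == "STRUCT":
        # Missing keys read as null; extra keys in the JSON object are ignored.
        return isinstance(value, dict) and all(
            conforms(value.get(name), field_type) for name, field_type in child.items()
        )
    raise ValueError(f"unsupported schema type: {dtype!r}")


def read_column(rows: List[Dict[str, Any]], name: str, dtype: SchemaType) -> List[Optional[Any]]:
    """Extract column `name`; if any row's value mismatches `dtype`, return all nulls."""
    values = [row.get(name) for row in rows]
    if all(conforms(v, dtype) for v in values):
        return values
    return [None] * len(values)


# The example from this issue: {"a" : [1]} against a: LIST<STRUCT<INT>>.
rows = [json.loads('{"a" : [1]}')]
print(read_column(rows, "a", ("LIST", ("STRUCT", {"x": "INT"}))))  # [None]
print(read_column(rows, "a", ("LIST", "INT")))  # [[1]]
```

Here `[1]` is a list, but its element `1` is not a JSON object, so it cannot be a `STRUCT` and the column is nulled out; against a `LIST<INT>` schema the same data is kept.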

ttnghia added the feature request label Nov 15, 2024
ttnghia (Contributor, Author) commented Nov 15, 2024

Addressing this will also be the long-term fix for NVIDIA/spark-rapids#10901.

Projects
Status: Burndown