Describe the bug
I noticed this in some unit tests for the Java APIs when I tried to enable schema pruning in cuDF by default for the Java JSON read APIs that explicitly do column pruning. Three tests in cudf/java/src/test/java/ai/rapids/cudf/TableTest.java (at commit 0b32f55) fail:
- Lines 664 to 714: column d is returned as a LIST<INT8> instead of a LIST<INT64>, which is what it was requested to be and which is what is returned for column d if pruning is disabled.
- Lines 743 to 790: column d is the wrong type.
- Lines 716 to 741: column e was requested to be a LIST<STRUCT>, but it was returned as a LIST<INT8> column.
Steps/Code to reproduce bug
To reproduce this, take #16796 and enable column pruning for the tests that are listed as failing. The third test is the scariest one: it appears to return totally invalid results, where the data column is empty despite there being offsets pointing into it.
If I need to create a C++ repro case, I am happy to do it.
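The invalid result described for the third test can be stated as a broken invariant: in a LIST column, every offset has to point inside the child (data) column. The following is a minimal sketch of that check in plain Java, independent of cuDF. `ListOffsetsCheck` and `isConsistent` are hypothetical names, and the offset values are made up for illustration rather than taken from the failing test.

```java
// Hypothetical helper, not part of cuDF: checks the basic invariant of a
// LIST column stored as an offsets array plus a child (data) column.
final class ListOffsetsCheck {
    // Returns true when the offsets are non-negative, never decrease, and
    // the last offset does not point past the end of the child column.
    static boolean isConsistent(int[] offsets, int childRowCount) {
        if (offsets.length == 0) {
            return true; // no rows, so nothing points into the child
        }
        if (offsets[0] < 0) {
            return false;
        }
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return false; // offsets must be non-decreasing
            }
        }
        return offsets[offsets.length - 1] <= childRowCount;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 2, 3}; // two list rows: 2 elements, then 1 element

        // Child column holds the 3 elements the offsets refer to.
        System.out.println(isConsistent(offsets, 3)); // prints "true"

        // Empty child column, but offsets still point into it (assumed
        // shape of the bad result described above).
        System.out.println(isConsistent(offsets, 0)); // prints "false"
    }
}
```

A check along these lines could be used in a test to distinguish a wrong-type result from a structurally invalid one.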
Expected behavior
I would expect the types in the schema to be honored, at least in the same way that they are for the non-pruning use case.
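As an illustration of what "honored" means here, the sketch below compares a requested nested type against a returned one and reports the first difference. It is plain Java and does not use the cuDF API. `SchemaTypeCheck`, `TypeNode`, and `firstMismatch` are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not a cuDF class: models nested column types as a
// small tree and finds the first place where two trees disagree.
final class SchemaTypeCheck {
    static final class TypeNode {
        final String kind;             // e.g. "LIST", "STRUCT", "INT64"
        final List<TypeNode> children; // empty for non-nested types

        TypeNode(String kind, TypeNode... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Returns a description of the first mismatch between the requested and
    // the returned type, or null when the two trees match exactly.
    static String firstMismatch(String path, TypeNode requested, TypeNode returned) {
        if (!requested.kind.equals(returned.kind)) {
            return path + ": requested " + requested.kind + " but got " + returned.kind;
        }
        if (requested.children.size() != returned.children.size()) {
            return path + ": requested " + requested.children.size()
                + " children but got " + returned.children.size();
        }
        for (int i = 0; i < requested.children.size(); i++) {
            String mismatch = firstMismatch(path + "[" + i + "]",
                requested.children.get(i), returned.children.get(i));
            if (mismatch != null) {
                return mismatch;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Column d from the report: LIST<INT64> requested, LIST<INT8> returned.
        TypeNode requested = new TypeNode("LIST", new TypeNode("INT64"));
        TypeNode returned = new TypeNode("LIST", new TypeNode("INT8"));
        System.out.println(firstMismatch("d", requested, returned));
        // prints "d[0]: requested INT64 but got INT8"
    }
}
```

Under this model, the pruning and non-pruning code paths would both be expected to produce a `null` result for every requested column.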
This adds the options to enable column_pruning when reading JSON using the Java APIs.
This is still in draft because there are test failures if this is turned on for those tests; see #16797.
That said, the performance impact from enabling column pruning on some queries is huge. For one query in particular, the current code takes 161.5 seconds, and with cuDF column pruning it takes just 16.5 seconds. That is a roughly 10x speedup for something that is fairly real world.
Authors:
- Robert (Bobby) Evans (https://github.com/revans2)
Approvers:
- Alessandro Bellina (https://github.com/abellina)
- Nghia Truong (https://github.com/ttnghia)
URL: #16796