[WIP] Add support for DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY encodings to Parquet reader #12948

Closed
wants to merge 164 commits into from
Changes from 103 commits
Commits
ae74d50
test for unsupported page encodings and throw an error if one is dete…
etseidl Feb 9, 2023
04fcd01
add test
etseidl Feb 10, 2023
30c0ecb
use sizeof
etseidl Feb 10, 2023
03a5960
use correct version of clang-format
etseidl Feb 10, 2023
f231b3a
replace thrust::any_of with thrust::count_if
etseidl Feb 10, 2023
5f1ef57
Merge branch 'rapidsai:branch-23.04' into feature/validate_encodings
etseidl Feb 10, 2023
85c2bf6
move test to python
etseidl Feb 10, 2023
21ed189
Merge branch 'feature/validate_encodings' of github.com:etseidl/cudf …
etseidl Feb 10, 2023
0fa7f3f
Merge branch 'rapidsai:branch-23.04' into feature/validate_encodings
etseidl Feb 10, 2023
b5573eb
update comment about deprecation of BIT_PACKED
etseidl Feb 13, 2023
f1af811
Merge branch 'feature/validate_encodings' of github.com:etseidl/cudf …
etseidl Feb 13, 2023
febeb02
add comment per review suggestion
etseidl Feb 13, 2023
0015e08
Merge branch 'rapidsai:branch-23.04' into feature/validate_encodings
etseidl Feb 13, 2023
aec40aa
add comment to update is_supported_encoding
etseidl Feb 13, 2023
e6fef99
Merge branch 'feature/validate_encodings' of github.com:etseidl/cudf …
etseidl Feb 13, 2023
432c9df
switch check back to host code
etseidl Feb 13, 2023
ae4f895
test for unsupported page encodings and throw an error if one is dete…
etseidl Feb 9, 2023
5494030
add test
etseidl Feb 10, 2023
9b3c00b
use correct version of clang-format
etseidl Feb 10, 2023
509c0ec
replace thrust::any_of with thrust::count_if
etseidl Feb 10, 2023
bdc0f3e
move test to python
etseidl Feb 10, 2023
268f939
update comment about deprecation of BIT_PACKED
etseidl Feb 13, 2023
f05490a
add comment per review suggestion
etseidl Feb 13, 2023
3efbe26
add comment to update is_supported_encoding
etseidl Feb 13, 2023
ecc1179
switch check back to host code
etseidl Feb 13, 2023
d8f5512
checkpoint...can decode one page serially
etseidl Feb 15, 2023
9943162
checkpoint...works multithreaded now, no output
etseidl Feb 15, 2023
a6ab85a
checkpoint...move decode to separate function
etseidl Feb 16, 2023
573fffb
clean up some clang-format mess
etseidl Feb 16, 2023
7dafba6
use cub inclusive scan to roll up the deltas
etseidl Feb 17, 2023
8a0333a
checkpoint. works for non-nested and no nulls
etseidl Feb 17, 2023
49ec0a4
refactor some for reuse later
etseidl Feb 17, 2023
73d7d76
seems to work for nested with nulls and skip_rows now
etseidl Feb 18, 2023
aca8753
checkpoint...delta byte array initialization works
etseidl Feb 21, 2023
27ffd38
checkpoint
etseidl Feb 22, 2023
235449c
refactor DecodeDeltaBinary to use 3 warps
etseidl Feb 23, 2023
7e2ff53
checkpoint...works except for properly decoding strings
etseidl Feb 23, 2023
a0ed800
fix check for end of data
etseidl Feb 23, 2023
a041646
fix delta decoding to work on mini-blocks larger than 32.
etseidl Feb 23, 2023
1e35270
get DeltaForThread working up to 64 bits
etseidl Feb 24, 2023
27c3b03
add looping version of bit unpacking
etseidl Feb 24, 2023
b64de4f
fix problem with producers getting ahead of consumers
etseidl Feb 24, 2023
3033cac
simplify decode logic to stop caring about first value since it's in …
etseidl Feb 24, 2023
916fd3e
checkpoint...strings without prefixes working
etseidl Feb 25, 2023
00874e4
strings work now, but no skip_rows yet
etseidl Feb 25, 2023
d7d4864
cleanup
etseidl Feb 27, 2023
c3b2f9e
more cleanup
etseidl Feb 27, 2023
c656770
checkpoint...kinda working again except skip_rows for nested columns
etseidl Feb 27, 2023
cd23b18
fix to not overwrite page info
etseidl Feb 28, 2023
6dd2aaa
skip_rows working for nested columns now
etseidl Feb 28, 2023
c8a33f3
Merge branch 'feature/validate_encodings' into feature/delta_binary
etseidl Feb 28, 2023
6d01a97
rework ComputePageStringSizes to not need an extra vector. instead add
etseidl Feb 28, 2023
deba306
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Feb 28, 2023
97d5a8c
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 1, 2023
b8c72e0
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 1, 2023
d46ab9c
replace test file for unsupported encoding check
etseidl Mar 2, 2023
1ebb3f9
try a more efficient string copy for delta byte array
etseidl Mar 4, 2023
f16e077
Merge remote-tracking branch 'origin/branch-23.04' into feature/delta…
etseidl Mar 4, 2023
69f825f
fix for block ending on last mini-block
etseidl Mar 6, 2023
208d588
skip string computation when not needed because of skip_rows
etseidl Mar 6, 2023
7e12df0
add struct for DELTA_BYTE_ARRAY decoding, add consts
etseidl Mar 6, 2023
63fcf72
fix for nested data with nulls and skipping
etseidl Mar 6, 2023
98df483
don't pass target_pos to string calc functions
etseidl Mar 6, 2023
66f8f4e
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 6, 2023
9c7a9de
clean up some ints
etseidl Mar 6, 2023
b5bf8ea
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 6, 2023
5ca9112
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 6, 2023
132ab25
remove todo
etseidl Mar 7, 2023
e76ad56
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 7, 2023
852eab2
move find_end lambda to function FindEndOfBlock
etseidl Mar 7, 2023
05a36e9
Merge branch 'feature/delta_binary' of github.com:etseidl/cudf into f…
etseidl Mar 7, 2023
e5d393a
add some more comments, get rid of magic 32's
etseidl Mar 7, 2023
6e6b997
get rid of old-school cast
etseidl Mar 7, 2023
49d8df5
remove FIXME
etseidl Mar 7, 2023
492fbab
add v2 indicator to page flags
etseidl Mar 8, 2023
6028074
more consts
etseidl Mar 8, 2023
83533cd
refactor and rename get_vlq64, add documentation
etseidl Mar 8, 2023
55959da
use loop for bit unpacking
etseidl Mar 8, 2023
55fd2d4
when dealing with mini-blocks, then need to decrement value_count
etseidl Mar 9, 2023
b80fe6e
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 9, 2023
9c47002
use num_encoded_values to indicate intent when comparing to value_count
etseidl Mar 9, 2023
a3bb211
do not compute string sizes when delta_byte_array encoding is used
etseidl Mar 9, 2023
fcedc62
fix string offset calc to use long ints
etseidl Mar 9, 2023
967f7e2
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 9, 2023
3b0c0fd
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 10, 2023
8daaee3
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 12, 2023
67b3b56
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 14, 2023
ef9e2eb
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 15, 2023
08749b7
use shift instead of masking to unset a bit
etseidl Mar 15, 2023
0a3ac27
Merge branch 'feature/delta_binary' of github.com:etseidl/cudf into f…
etseidl Mar 15, 2023
ff28cb9
more fix ups for value_count
etseidl Mar 15, 2023
378743f
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 17, 2023
1a7f09d
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 20, 2023
e1d53a2
clean up some comments
etseidl Mar 21, 2023
f8f65d5
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 21, 2023
c4e31cb
implement suggestion from review. only need to perform check of neig…
etseidl Mar 22, 2023
916afd8
missed a brace
etseidl Mar 22, 2023
32a55be
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 22, 2023
338e0ce
remove check for prefix_lens[i] == 0 in loop
etseidl Mar 22, 2023
2b41eca
simplify StringScan further
etseidl Mar 22, 2023
d57b4b6
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 23, 2023
23341df
split delta decoders into separate kernels to save on shared mem
etseidl Mar 23, 2023
de6a0a3
Merge branch 'rapidsai:branch-23.04' into feature/delta_binary
etseidl Mar 24, 2023
04f906a
rename ComputePageStringSizes
etseidl Mar 29, 2023
eaab701
add some description of the encodings
etseidl Mar 29, 2023
35a46b6
add note about the bit packing
etseidl Mar 29, 2023
a6cca65
rework CalcMiniBlockValues to reduce syncwarp calls
etseidl Mar 29, 2023
e42ecd0
more changes from review
etseidl Mar 29, 2023
01354ca
fix WarpReduce bug caught in review
etseidl Mar 29, 2023
f938b61
Merge branch 'feature/delta_binary' of github.com:etseidl/cudf into f…
etseidl Mar 29, 2023
4fe791d
more comments
etseidl Mar 29, 2023
1ed4f1c
Merge branch 'branch-23.04' into feature/delta_binary
etseidl Mar 29, 2023
a564f7c
missed a name change
etseidl Mar 29, 2023
713fbf0
rework string size calc to use single_lane_block_sum_reduce
etseidl Mar 29, 2023
fdfb3a4
move some work done on the host in decode_page_data to device in Comp…
etseidl Mar 29, 2023
adfb4a5
add comment
etseidl Mar 29, 2023
ab33c0d
fix bug in page string size calc
etseidl Mar 29, 2023
0842190
fix bug in last transform in ComputeDeltaPageStringSizes
etseidl Mar 29, 2023
bdebca0
fix stupid bug
etseidl Mar 29, 2023
5ba6b4d
Merge branch 'rapidsai:branch-23.06' into feature/delta_binary
etseidl Mar 31, 2023
55eb17f
silence (bogus) IDE warnings
etseidl Apr 3, 2023
44c87c0
formatting
etseidl Apr 3, 2023
6f5fb7b
Merge branch 'rapidsai:branch-23.06' into feature/delta_binary
etseidl Apr 3, 2023
43526f9
add character parallel version of string calc. helps for long strings.
etseidl Apr 4, 2023
2f51172
Merge branch 'rapidsai:branch-23.06' into feature/delta_binary
etseidl Apr 5, 2023
d3686f4
Merge remote-tracking branch 'origin/branch-23.06' into feature/delta…
etseidl Apr 8, 2023
606ead3
post merge cleanup
etseidl Apr 8, 2023
5a8b268
restore stuff removed in merge
etseidl Apr 11, 2023
a734a14
Merge branch 'rapidsai:branch-23.06' into feature/delta_binary
etseidl Apr 13, 2023
b1a7f27
Merge remote-tracking branch 'origin/branch-23.06' into feature/delta…
etseidl Apr 14, 2023
b5c9bd4
Merge branch 'rapidsai:branch-23.06' into feature/delta_binary
etseidl Apr 24, 2023
3993d69
Merge branch 'rapidsai:branch-23.06' into feature/delta_binary
etseidl Apr 26, 2023
d893b1d
Merge remote-tracking branch 'origin/branch-23.06' into feature/delta…
etseidl May 3, 2023
cd850da
Merge branch 'feature/delta_binary' of github.com:etseidl/cudf into f…
etseidl May 3, 2023
0b566b9
DOC
raydouglass May 19, 2023
c823dd3
Merge pull request #13416 from rapidsai/branch-23.06
GPUtester May 23, 2023
212b1c0
Merge pull request #13420 from rapidsai/branch-23.06
GPUtester May 23, 2023
905c61e
Merge pull request #13421 from rapidsai/branch-23.06
GPUtester May 23, 2023
097b828
Merge pull request #13425 from rapidsai/branch-23.06
GPUtester May 24, 2023
9a0f87c
Merge pull request #13427 from rapidsai/branch-23.06
GPUtester May 24, 2023
063a924
Inline Cython exception handler (#13411)
vyasr May 24, 2023
aaf9362
Merge pull request #13430 from rapidsai/branch-23.06
GPUtester May 24, 2023
e29c691
Merge pull request #13432 from rapidsai/branch-23.06
GPUtester May 24, 2023
fd13c87
Merge pull request #13436 from rapidsai/branch-23.06
GPUtester May 24, 2023
0f0ebfd
Merge pull request #13439 from rapidsai/branch-23.06
GPUtester May 24, 2023
0536a3a
Merge pull request #13441 from rapidsai/branch-23.06
GPUtester May 25, 2023
5d5d367
Merge pull request #13443 from rapidsai/branch-23.06
GPUtester May 25, 2023
a03da13
Init JNI version 23.08.0-SNAPSHOT (#13401)
pxLi May 25, 2023
126fa35
Merge pull request #13445 from rapidsai/branch-23.06
GPUtester May 25, 2023
7f97b27
Merge pull request #13446 from rapidsai/branch-23.06
GPUtester May 25, 2023
2def7f1
Merge pull request #13447 from rapidsai/branch-23.06
GPUtester May 25, 2023
53c685b
Merge pull request #13448 from rapidsai/branch-23.06
GPUtester May 25, 2023
37f76c8
Merge pull request #13451 from rapidsai/branch-23.06
GPUtester May 25, 2023
5b3e3ab
Reject functions without bytecode from `_can_be_jitted` in GroupBy Ap…
brandon-b-miller May 25, 2023
cc317ed
Separate io-text and nvtext pytests into different files (#13435)
davidwendt May 26, 2023
90bb887
Allow newer scikit-build (#13424)
vyasr May 26, 2023
4384c3b
JNI: Remove cleaned objects in memory cleaner (#13378)
res-life May 30, 2023
6707ab6
Merge remote-tracking branch 'origin/branch-23.06' into feature/delta…
etseidl May 30, 2023
27f41cc
finish merge
etseidl May 30, 2023
9ff14ea
Merge branch 'rapidsai:branch-23.08' into feature/delta_binary
etseidl May 30, 2023
5e12c25
Merge pull request #13471 from rapidsai/branch-23.06
GPUtester May 30, 2023
dff6992
Merge branch 'rapidsai:branch-23.08' into feature/delta_binary
etseidl May 30, 2023
87a8ede
Ensure cccl packages don't clash with upstream version (#13235)
robertmaynard May 30, 2023
d6c0020
Merge branch 'rapidsai:branch-23.08' into feature/delta_binary
etseidl May 31, 2023
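
For readers unfamiliar with the encoding this PR targets, here is a minimal host-side sketch of DELTA_BINARY_PACKED decoding as described in the Parquet format specification. It is not the code from this PR: the function names (`read_uleb128`, `read_zigzag`, `unpack_bits`, `decode_delta_binary_packed`) are made up for illustration, the loop is serial where a GPU reader would use a parallel prefix sum (compare the "use cub inclusive scan to roll up the deltas" commit), and it does not emulate fixed-width integer wraparound or validate the header.

```python
def read_uleb128(buf, pos):
    """Read an unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def read_zigzag(buf, pos):
    """Read a zigzag-encoded signed varint; return (value, next position)."""
    raw, pos = read_uleb128(buf, pos)
    return (raw >> 1) ^ -(raw & 1), pos


def unpack_bits(buf, pos, bit_width, count):
    """Unpack `count` values of `bit_width` bits each, packed LSB-first."""
    nbytes = (bit_width * count + 7) // 8
    packed = int.from_bytes(buf[pos:pos + nbytes], "little")
    mask = (1 << bit_width) - 1
    values = [(packed >> (i * bit_width)) & mask for i in range(count)]
    return values, pos + nbytes


def decode_delta_binary_packed(buf):
    """Decode one DELTA_BINARY_PACKED stream into a list of Python ints."""
    pos = 0
    # Header: block size, mini-blocks per block, total count, first value.
    block_size, pos = read_uleb128(buf, pos)
    miniblocks_per_block, pos = read_uleb128(buf, pos)
    total_values, pos = read_uleb128(buf, pos)
    first_value, pos = read_zigzag(buf, pos)
    if total_values == 0:
        return []

    values_per_miniblock = block_size // miniblocks_per_block
    out = [first_value]
    while len(out) < total_values:
        # Each block: min delta, one bit-width byte per mini-block, then data.
        min_delta, pos = read_zigzag(buf, pos)
        bit_widths = buf[pos:pos + miniblocks_per_block]
        pos += miniblocks_per_block
        for width in bit_widths:
            if len(out) >= total_values:
                break  # trailing mini-blocks carry no data
            deltas, pos = unpack_bits(buf, pos, width, values_per_miniblock)
            for delta in deltas:
                if len(out) >= total_values:
                    break  # padding at the end of the last mini-block
                # Running sum: previous value + min delta + packed delta.
                out.append(out[-1] + min_delta + delta)
    return out
```

As a usage example, the values 7, 5, 3, 1, 2, 3, 4, 5 have deltas -2, -2, -2, 1, 1, 1, 1, so the minimum delta is -2 and the packed deltas 0, 0, 0, 3, 3, 3, 3 fit in 2 bits each. With a block size of 128 and 4 mini-blocks per block, the stream is the bytes `80 01 04 08 0e 03 02 00 00 00 c0 3f 00 00 00 00 00 00`, and passing them to `decode_delta_binary_packed` yields the original eight values.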