[FEA] Improve ORC reader performance for decimal types #13251

vuule · 2023-04-29T00:13:39Z

Decoding decimal columns is an order of magnitude slower than decoding integral types.
The cause is that a single thread finds the sizes of the next batch of elements, which are then decoded using the whole block. To improve performance, the kernel needs to use multiple threads to find varint boundaries:

Pass 1: every thread runs is_boundary_byte (highest bit == 0) to find if it's at the last byte of a varint element.

A = 0 0 0 0 1 0 0 1 0 0 0 1

Pass 2: An inclusive scan of A produces B. The last value of B is the number of elements (3 in this case).

B = 0 0 0 0 1 1 1 2 2 2 2 3
    ^         ^     ^
    t0        t5    t8

Pass 3: Threads on the first byte of an element (t = 0, or A[t-1] = 1) decode the element that starts at their index and store it at col[B[t-1]] (col[0] for t = 0).
t=0 writes to [0]
t=5 writes to [1]
t=8 writes to [2]
Alternatively, pass 3 can store the offsets of each element so they can be decoded in parallel.
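
To make the three passes concrete, here is a minimal host-side C++ sketch, not the actual libcudf kernel. Each loop over `t` stands in for work that one GPU thread per byte would do in parallel, and the scan would be a block-wide scan on the device. The function name `decode_varints` is hypothetical. The sketch assumes each element is decoded by the thread sitting on its first byte, with the output index read from the scan value of the preceding byte. It decodes plain unsigned base-128 varints that fit in 64 bits and leaves out sign and scale handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pass 1 predicate: a byte ends a varint when its highest (continuation) bit is 0.
inline bool is_boundary_byte(std::uint8_t b) { return (b & 0x80u) == 0; }

// Hypothetical helper: sequential stand-in for the three parallel passes.
std::vector<std::uint64_t> decode_varints(const std::vector<std::uint8_t>& bytes)
{
  const std::size_t n = bytes.size();

  // Pass 1: A[t] = 1 if byte t is the last byte of an element.
  std::vector<std::uint32_t> A(n);
  for (std::size_t t = 0; t < n; ++t) A[t] = is_boundary_byte(bytes[t]) ? 1u : 0u;

  // Pass 2: inclusive scan of A; the last value is the number of complete elements.
  std::vector<std::uint32_t> B(n);
  std::uint32_t running = 0;
  for (std::size_t t = 0; t < n; ++t) {
    running += A[t];
    B[t] = running;
  }
  const std::size_t num_elements = (n == 0) ? 0 : B[n - 1];

  // Pass 3: the thread on the first byte of an element decodes that element.
  std::vector<std::uint64_t> col(num_elements);
  for (std::size_t t = 0; t < n; ++t) {
    const bool is_start = (t == 0) || (A[t - 1] == 1);
    if (!is_start) continue;
    const std::size_t out = (t == 0) ? 0 : B[t - 1];  // elements completed before t
    if (out >= num_elements) continue;                // trailing incomplete element
    std::uint64_t value = 0;
    int shift = 0;
    for (std::size_t i = t; i < n; ++i) {
      assert(shift < 64 && "sketch only handles values that fit in 64 bits");
      value |= static_cast<std::uint64_t>(bytes[i] & 0x7Fu) << shift;
      shift += 7;
      if (A[i] == 1) break;  // reached the last byte of this element
    }
    col[out] = value;
  }
  return col;
}
```

Reading the output index from `B[t-1]` rather than `B[t]` keeps single-byte elements correct, since such a byte is both the first and the last byte of its element.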

vuule · 2023-04-29T00:14:14Z

Additional optimization:

Divide the threads into two chunks based on the average length of a varint, say 2 bytes per varint (2:1).
A block of, say, 768 threads would then be split into chunks of 512 and 256 threads.
Overlap generation of the next set of offsets (512 threads) with decoding of the previous set (256 threads).
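
The split arithmetic can be sketched as a small C++ helper. This is only an illustration: the names `block_split` and `split_block` are hypothetical, and rounding the split to whole warps of 32 threads is an added assumption that the comment above does not state. The idea is that offset generation needs roughly one thread per byte and decoding needs one thread per element, so the ratio of the two chunks follows the average number of bytes per varint.

```cpp
#include <cassert>

// Hypothetical result type: how many threads of a block go to each role.
struct block_split {
  int offset_threads;  // threads generating the next set of offsets
  int decode_threads;  // threads decoding the previous set
};

// Hypothetical helper: split a block in the ratio avg_bytes_per_varint : 1,
// rounded to whole warps (the warp rounding is an assumption of this sketch).
block_split split_block(int block_size, int avg_bytes_per_varint, int warp_size = 32)
{
  assert(block_size > 0 && avg_bytes_per_varint > 0 && warp_size > 0);
  assert(block_size % warp_size == 0);
  const int warps = block_size / warp_size;
  assert(warps >= 2);  // need at least one warp for each role

  // One share for decoding, avg_bytes_per_varint shares for offset generation.
  int decode_warps = warps / (avg_bytes_per_varint + 1);
  if (decode_warps < 1) decode_warps = 1;

  const int decode_threads = decode_warps * warp_size;
  return {block_size - decode_threads, decode_threads};
}
```

For the example in the comment, `split_block(768, 2)` yields 512 offset threads and 256 decode threads.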

GregoryKimball · 2023-05-01T21:13:41Z

Also see #12677 for profiling examples

vuule added labels: cuIO (cuIO issue), Performance (Performance related issue) Apr 29, 2023

GregoryKimball added this to the ORC continuous improvement milestone Jul 5, 2023

GregoryKimball added labels: libcudf (Affects libcudf (C++/CUDA) code.), 0 - Backlog (In queue waiting for assignment) Jul 5, 2023

GregoryKimball mentioned this issue Jul 5, 2023

[FEA] Address performance issue with decimal types in ORC reader #12677

Closed

GregoryKimball added the feature request (New feature or request) label Jul 10, 2023

GregoryKimball changed the title ~~Improve ORC reader performance for decimal types~~ [FEA] Improve ORC reader performance for decimal types Jul 10, 2023

vuule assigned vyasr and vuule Aug 21, 2023

GregoryKimball mentioned this issue Sep 10, 2023

[FEA] Improve ORC reader filtering and performance #13882

Open

vuule mentioned this issue Oct 25, 2023

Optimize ORC writer for decimal columns #14190

Merged

3 tasks
