
Fix for skip_rows with page-spanning rows (#14557)
Fixes an issue, detected in Spark, in which string data read from Parquet was corrupted by an incorrect page size calculation.
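
A small host-side model shows one way an error in the starting value index skews a page size calculation. It assumes, for illustration only, that the string bytes for a page are the sum of the lengths of the values in a half-open range `[first_value, end_value)`; `string_bytes` and `page_lengths` are hypothetical names, not cuDF APIs.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified model (not cuDF code). A page's string values are
// represented only by their byte lengths. The bytes to reserve for the page
// are the sum of the lengths of the values in [first_value, end_value).
template <std::size_t N>
constexpr std::size_t string_bytes(std::array<std::size_t, N> const& lengths,
                                   std::size_t first_value,
                                   std::size_t end_value)
{
  std::size_t total = 0;
  // Stop at end_value or at the end of the page, whichever comes first.
  for (std::size_t i = first_value; i < end_value && i < N; ++i) {
    total += lengths[i];
  }
  return total;
}

// Example page holding four strings of 1, 2, 3 and 4 bytes.
constexpr std::array<std::size_t, 4> page_lengths{1, 2, 3, 4};
```

If `skip_rows` causes the first two values to be skipped, the correct count is `string_bytes(page_lengths, 2, 4)`, i.e. 3 + 4 = 7 bytes. Starting from value 0 instead yields 10, overcounting by the lengths of the skipped values (1 + 2 = 3 bytes).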

Closes #14560 

Authors:
   - Ed Seidl (https://github.com/etseidl)

Approvers:
   - Alessandro Bellina (https://github.com/abellina)
   - Yunsong Wang (https://github.com/PointKernel)
   - Vukasin Milovanovic (https://github.com/vuule)
   - Nghia Truong (https://github.com/ttnghia)
   - Mike Wilson (https://github.com/hyperbolic2346)
etseidl authored Dec 5, 2023
1 parent 0a56305 commit 31aedf2
Showing 1 changed file with 19 additions and 5 deletions.
cpp/src/io/parquet/page_string_decode.cu: 24 changes (19 additions, 5 deletions)
@@ -141,6 +141,25 @@ __device__ thrust::pair<int, int> page_bounds(page_state_s* const s,
   bool skipped_values_set = false;
   bool end_value_set = false;
 
+  // If page_start_row >= min_row, then skipped_values is 0 and we don't have to search for
+  // start_value. If there's repetition then we've already calculated
+  // skipped_values/skipped_leaf_values.
+  // TODO(ets): If we hit this condition, and end_row > last row in page, then we can skip
+  // more of the processing below.
+  if (has_repetition or page_start_row >= min_row) {
+    if (t == 0) {
+      if (has_repetition) {
+        skipped_values = pp->skipped_values;
+        skipped_leaf_values = pp->skipped_leaf_values;
+      } else {
+        skipped_values = 0;
+        skipped_leaf_values = 0;
+      }
+    }
+    skipped_values_set = true;
+    __syncthreads();
+  }
+
   while (processed < s->page.num_input_values) {
     thread_index_type start_val = processed;
 
@@ -150,11 +169,6 @@ __device__ thrust::pair<int, int> page_bounds(page_state_s* const s,
 
     // special case where page does not begin at a row boundary
     if (processed == 0 && rep_decode[0] != 0) {
-      if (t == 0) {
-        skipped_values = 0;
-        skipped_leaf_values = 0;
-      }
-      skipped_values_set = true;
       end_row++; // need to finish off the previous row
       row_fudge = 0;
     }
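
The patched selection of the starting skip counts, and what the removed special case did instead, can be modeled on the host. This is a sketch under stated assumptions, not cuDF code: `skip_counts`, `initial_skip_counts` and `spanning_case_before_fix` are hypothetical names, the parameters mirror variables in the diff (`has_repetition`, `page_start_row`, `min_row`, `pp->skipped_values`, `pp->skipped_leaf_values`), integer types are simplified, and the CUDA thread synchronization is left out.

```cpp
// Hypothetical host-side model (not cuDF code) of how the patched page_bounds()
// seeds its skip counts before scanning a page. Thread cooperation (thread 0
// writing shared state, then __syncthreads()) is deliberately omitted.
struct skip_counts {
  int skipped_values;
  int skipped_leaf_values;
  bool skipped_values_set;  // true: no search for the start value is needed
};

// After the fix: with repetition, reuse the counts computed earlier for the
// page (pp->skipped_values / pp->skipped_leaf_values in the diff). Without
// repetition, a page starting at or after min_row skips nothing.
constexpr skip_counts initial_skip_counts(bool has_repetition,
                                          long page_start_row,
                                          long min_row,
                                          int precomputed_values,
                                          int precomputed_leaf_values)
{
  if (has_repetition) { return {precomputed_values, precomputed_leaf_values, true}; }
  if (page_start_row >= min_row) { return {0, 0, true}; }
  return {0, 0, false};  // the scan must still locate the start value
}

// Before the fix (simplified): the removed special case for a page that does
// not begin at a row boundary set both counts to 0 and marked them as set,
// without consulting min_row or any precomputed counts.
constexpr skip_counts spanning_case_before_fix() { return {0, 0, true}; }
```

With made-up precomputed counts of 7 values and 5 leaf values for a repeated column whose page starts mid-row and whose `min_row` lies inside the page, `initial_skip_counts(true, 0, 10, 7, 5)` keeps 7 and 5, whereas the removed special case reset both to 0. In the patched code that special case still increments `end_row` and clears `row_fudge`.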
