use row encoding for SortExec #5292

Closed
wants to merge 44 commits into from
44 commits
ec44910
modify sort_batch to use arrow row format for multi-column sorts
jaylmiller Feb 10, 2023
c7c43e4
use row encoding for in memory partial sorting within SortExec
jaylmiller Feb 11, 2023
2e1c143
Revert preserving row encoding changes
jaylmiller Feb 12, 2023
2ab03ee
add bench for SortExec
jaylmiller Feb 12, 2023
11be061
fix preserve partitioning case to run every partition instead of just…
jaylmiller Feb 13, 2023
730a89c
rough draft: sorting works.
jaylmiller Feb 12, 2023
4b1ea08
Merge branch 'apache:master' into master
jaylmiller Feb 14, 2023
a64705f
Merge branch 'sort-preserve-row-encoding'
jaylmiller Feb 14, 2023
7aaddb5
remove some todos
jaylmiller Feb 14, 2023
96a2e15
fix clippy warnings
jaylmiller Feb 14, 2023
a3c632c
checkpointing
jaylmiller Feb 14, 2023
c51b23c
SortPreservingMergeStream emits row encodings when used from SortExec
jaylmiller Feb 14, 2023
0f7bfc3
spill logic working (w/ a temporary serialization format solution)
jaylmiller Feb 15, 2023
cdf72d8
Merge branch 'sort-exec'
jaylmiller Feb 15, 2023
e313134
add row encoding sizes to spill calculations.
jaylmiller Feb 15, 2023
8b2450b
clean comments and small todos
jaylmiller Feb 15, 2023
085a871
cleanup SortedStream types
jaylmiller Feb 16, 2023
4196a25
add SortExec input case to each merge bench case
jaylmiller Feb 16, 2023
d4f5c10
row serialization format
jaylmiller Feb 16, 2023
33c611c
RowBatch construction re-use row ref offsets instead of just appending
jaylmiller Feb 17, 2023
331d205
Merge branch 'apache:main' into master
jaylmiller Feb 17, 2023
cba6d30
Merge branch 'apache:main' into master
jaylmiller Feb 18, 2023
1513f9a
dont need to keep array refs if we use rows
jaylmiller Feb 18, 2023
2ebcbc7
dont use channel for SortedSizedStream in sort (emit tuple)
jaylmiller Feb 18, 2023
7354952
add unit test for edge case where we skip spilling the row data
jaylmiller Feb 18, 2023
e6fe175
fix sort bench to actually use full data set in non-preserve partitio…
jaylmiller Feb 18, 2023
353815b
Merge branch 'apache:main' into master
jaylmiller Feb 20, 2023
ec49492
clippy
jaylmiller Feb 20, 2023
60b8e6f
Merge branch 'apache:main' into master
jaylmiller Feb 20, 2023
e430470
add data skewed to first partition case for the tuple sorts in bench
jaylmiller Feb 21, 2023
a55d34e
Merge branch 'apache:main' into master
jaylmiller Feb 21, 2023
279c6f5
clippy err
jaylmiller Feb 21, 2023
d42f380
Merge branch 'apache:main' into master
jaylmiller Feb 23, 2023
eeb1e9c
Merge branch 'apache:main' into master
jaylmiller Mar 2, 2023
ba08237
Merge branch 'apache:main' into master
jaylmiller Mar 3, 2023
e80578b
Merge branch 'apache:main' into master
jaylmiller Mar 4, 2023
b82545e
use mergesort in the merge step of sort exec
jaylmiller Mar 4, 2023
c685a0d
remove experimental bench cases
jaylmiller Mar 4, 2023
08b3fe5
move gating logic outside of sort_batch
jaylmiller Mar 4, 2023
790546f
dont use row encoding on single batch code path
jaylmiller Mar 4, 2023
d13912b
batch insertion order fix
jaylmiller Mar 6, 2023
fe79275
Merge branch 'apache:main' into master
jaylmiller Mar 11, 2023
0a37892
Merge branch 'apache:main' into master
jaylmiller Mar 14, 2023
d202638
Merge branch 'apache:main' into master
jaylmiller Mar 17, 2023
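
The commits above move multi-column sorting onto a row encoding. As a rough illustration of why that helps, the sketch below packs each row's sort key into one byte string whose lexicographic order matches the tuple order, so a multi-column comparison becomes a single byte comparison. This is a standard-library-only sketch, not the `arrow::row` implementation and not code from this PR: `encode_row` and `sort_indices` are hypothetical names, and the fixed `(u32, i64)` key with ascending order and no nulls is an assumption made to keep it short.

```rust
/// Hypothetical helper: encode a (u32, i64) sort key as bytes whose
/// lexicographic order matches ascending tuple order.
fn encode_row(a: u32, b: i64) -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    // Unsigned integers: big-endian bytes already compare correctly.
    out.extend_from_slice(&a.to_be_bytes());
    // Signed integers: flip the sign bit so negatives order before positives.
    let flipped = (b as u64) ^ (1u64 << 63);
    out.extend_from_slice(&flipped.to_be_bytes());
    out
}

/// Hypothetical helper: return the row indices that sort both columns
/// together, comparing only the encoded bytes.
fn sort_indices(col_a: &[u32], col_b: &[i64]) -> Vec<usize> {
    assert_eq!(col_a.len(), col_b.len());
    // Encode every row once, up front.
    let rows: Vec<Vec<u8>> = col_a
        .iter()
        .zip(col_b)
        .map(|(&a, &b)| encode_row(a, b))
        .collect();
    let mut idx: Vec<usize> = (0..rows.len()).collect();
    // One byte-wise comparison per pair replaces a per-column comparator chain.
    idx.sort_by(|&i, &j| rows[i].cmp(&rows[j]));
    idx
}
```

For columns `[2, 1, 2, 1]` and `[-5, 7, -9, 3]`, `sort_indices` returns `[3, 1, 2, 0]`. The encoding cost is paid once per row, which is why the later commit "dont use row encoding on single batch code path" is plausible: with little to compare, the encoding step may not pay for itself.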
8 changes: 5 additions & 3 deletions datafusion/core/src/physical_plan/sorts/cursor.rs
@@ -15,9 +15,11 @@
 // specific language governing permissions and limitations
 // under the License.

-use arrow::row::{Row, Rows};
+use arrow::row::Row;
 use std::cmp::Ordering;

+use super::RowBatch;
+
 /// A `SortKeyCursor` is created from a `RecordBatch`, and a set of
 /// `PhysicalExpr` that when evaluated on the `RecordBatch` yield the sort keys.
 ///
@@ -35,7 +37,7 @@ pub struct SortKeyCursor {
     // An id uniquely identifying the record batch scanned by this cursor.
     batch_id: usize,

-    rows: Rows,
+    rows: RowBatch,
 }

 impl std::fmt::Debug for SortKeyCursor {
@@ -50,7 +52,7 @@ impl std::fmt::Debug for SortKeyCursor {

 impl SortKeyCursor {
     /// Create a new SortKeyCursor
-    pub fn new(stream_idx: usize, batch_id: usize, rows: Rows) -> Self {
+    pub fn new(stream_idx: usize, batch_id: usize, rows: RowBatch) -> Self {
         Self {
             stream_idx,
             cur_row: 0,