-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MINOR]: Refactor to increase readability #5874
Conversation
# Conflicts: # datafusion/core/src/physical_optimizer/sort_enforcement.rs
datafusion/common/src/utils.rs
Outdated
/// This function finds the partition points according to `partition_columns`. | ||
/// If there are no sort columns, then the result will be a single element | ||
/// vector containing one partition range spanning all data. | ||
pub fn evaluate_partition_points( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks. Some nit: maybe we can invent better naming for evaluation_partition_points
Partition point I found definitions in Informatica system, not sure if its the same we calculate here.
Also method returns lexicographical_partition_ranges
so maybe we can name method similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked up to the Informatica system definition for partition point (I guess what you referred is this definition). These are different. If it is misleading, it is better to rename this function. What do you think about renaming it with evaluate_partition_ranges
?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partition ranges sounds more consistent to me, as in fact in method if
branches we work to identify exactly ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree evaluate_partition_ranges
sounds more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed function evaluate_partition_points
to evaluate_partition_ranges
. Thanks for the suggestions @comphead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice cleanup to me -- thank you @mustafasrepo and @comphead
@@ -281,7 +261,7 @@ pub fn reverse_order_bys(order_bys: &[PhysicalSortExpr]) -> Vec<PhysicalSortExpr | |||
.iter() | |||
.map(|e| PhysicalSortExpr { | |||
expr: e.expr.clone(), | |||
options: reverse_sort_options(e.options), | |||
options: !e.options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -628,23 +625,6 @@ impl SortedPartitionByBoundedWindowStream { | |||
.map(|e| e.evaluate_to_sort_column(batch)) | |||
.collect::<Result<Vec<_>>>() | |||
} | |||
|
|||
/// evaluate the partition points given the sort columns; if the sort columns are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is nice to avoid this repetition 👍
datafusion/common/src/utils.rs
Outdated
/// This function finds the partition points according to `partition_columns`. | ||
/// If there are no sort columns, then the result will be a single element | ||
/// vector containing one partition range spanning all data. | ||
pub fn evaluate_partition_points( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree evaluate_partition_ranges
sounds more specific.
Thanks @mustafasrepo and @comphead |
Which issue does this PR close?
N.A
Rationale for this change
This is a refactoring PR to increase code readability.
What changes are included in this PR?
can_skip_sort
andcheck_alignment
now returnResult<Option<bool>>
instead ofResult<(bool, bool)>
. Also since arrow now supports!
onSortOptions
we removereverse_sort_options
method.Are these changes tested?
Existing tests should work.
Are there any user-facing changes?
No.