Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR]: Refactor to increase readability #5874

Merged
merged 7 commits into from
Apr 6, 2023
Merged

[MINOR]: Refactor to increase readability #5874

merged 7 commits into from
Apr 6, 2023

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

N.A

Rationale for this change

This is a refactoring PR to increase code readability.

What changes are included in this PR?

can_skip_sort and check_alignment now return Result<Option<bool>> instead of Result<(bool, bool)>. Also since arrow now supports ! on SortOptions we remove reverse_sort_options method.

Are these changes tested?

Existing tests should work.

Are there any user-facing changes?

No.

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions labels Apr 5, 2023
@mustafasrepo mustafasrepo changed the title Refactor to increase readability (MINOR changes) [MINOR]: Refactor to increase readability Apr 5, 2023
/// This function finds the partition points according to `partition_columns`.
/// If there are no sort columns, then the result will be a single element
/// vector containing one partition range spanning all data.
pub fn evaluate_partition_points(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. Some nit: maybe we can invent better naming for evaluation_partition_points Partition point I found definitions in Informatica system, not sure if its the same we calculate here.

Also method returns lexicographical_partition_ranges so maybe we can name method similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked up to the Informatica system definition for partition point (I guess what you referred is this definition). These are different. If it is misleading, it is better to rename this function. What do you think about renaming it with evaluate_partition_ranges?.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition ranges sounds more consistent to me, as in fact in method if branches we work to identify exactly ranges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree evaluate_partition_ranges sounds more specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed function evaluate_partition_points to evaluate_partition_ranges. Thanks for the suggestions @comphead.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice cleanup to me -- thank you @mustafasrepo and @comphead

@@ -281,7 +261,7 @@ pub fn reverse_order_bys(order_bys: &[PhysicalSortExpr]) -> Vec<PhysicalSortExpr
.iter()
.map(|e| PhysicalSortExpr {
expr: e.expr.clone(),
options: reverse_sort_options(e.options),
options: !e.options,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -628,23 +625,6 @@ impl SortedPartitionByBoundedWindowStream {
.map(|e| e.evaluate_to_sort_column(batch))
.collect::<Result<Vec<_>>>()
}

/// evaluate the partition points given the sort columns; if the sort columns are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is nice to avoid this repetition 👍

/// This function finds the partition points according to `partition_columns`.
/// If there are no sort columns, then the result will be a single element
/// vector containing one partition range spanning all data.
pub fn evaluate_partition_points(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree evaluate_partition_ranges sounds more specific.

@alamb alamb merged commit e84ef37 into apache:main Apr 6, 2023
@alamb
Copy link
Contributor

alamb commented Apr 6, 2023

Thanks @mustafasrepo and @comphead

@mustafasrepo mustafasrepo deleted the feature/refactor_se branch April 6, 2023 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants