-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: propagate EmptyRelation for more join types #10963
feat: propagate EmptyRelation for more join types #10963
Conversation
dda1845
to
5a82986
Compare
@@ -400,6 +433,182 @@ mod tests { | |||
assert_together_optimized_plan_eq(plan, expected) | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm realizing these could/should be slt tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having some end to end slt would be good too, but for these rules I think having good unit tests are key as well
01)LeftSemi Join: t1.t1_id = __correlated_sq_1.t2_id | ||
02)--TableScan: t1 projection=[t1_id, t1_name] | ||
03)--EmptyRelation | ||
logical_plan EmptyRelation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the EmptyRelation in the original test is causing the entire join to be optimized out with the additional join types now propagating emptyrelation.
Is there a way I can preserve this test's testing of just the subquery rewrite logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
causing the entire join to be optimized out with the additional join types now propagating emptyrelation.
I agree with your assesment. I think the limit 0
means this plan is better
In terms of preserving the test of the subquery rewrite I think it does (otherwise it wouldn't be rewritten to a SEMI JOIN and thus the join couldn't be removed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tshauck -- I think this looks like a nice improvement to me
I agree it would be nice to add some slt tests for this feature as well as maybe reduce repetition in the unit tests, but I don't think they are required to merge the PR
I'll plan to leave this one open for another day or two to give others a chance to review it as well if they would like
@@ -400,6 +433,182 @@ mod tests { | |||
assert_together_optimized_plan_eq(plan, expected) | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having some end to end slt would be good too, but for these rules I think having good unit tests are key as well
01)LeftSemi Join: t1.t1_id = __correlated_sq_1.t2_id | ||
02)--TableScan: t1 projection=[t1_id, t1_name] | ||
03)--EmptyRelation | ||
logical_plan EmptyRelation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
causing the entire join to be optimized out with the additional join types now propagating emptyrelation.
I agree with your assesment. I think the limit 0
means this plan is better
In terms of preserving the test of the subquery rewrite I think it does (otherwise it wouldn't be rewritten to a SEMI JOIN and thus the join couldn't be removed)
if left_empty || right_empty { | ||
return Ok(Transformed::yes(LogicalPlan::EmptyRelation( | ||
EmptyRelation { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked that this code faithfully implements the semantics in the comments, and I did some mental review and verification to convince myself that these semantics are correct
i wonder if @jackwener has a moment to double check too?
} | ||
|
||
#[test] | ||
fn test_right_anti_join_empty_right_table() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add the negative cases ? Like test right antijoin_empty_left_table and assert it wasn't rewritten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see
datafusion/datafusion/optimizer/src/propagate_empty_relation.rs
Lines 504 to 557 in d8bd11a
fn test_join_empty_propagation_rules() -> Result<()> { | |
// test left join with empty left | |
empty_left_and_right_lp(true, false, JoinType::Left, true)?; | |
// test right join with empty right | |
empty_left_and_right_lp(false, true, JoinType::Right, true)?; | |
// test left semi join with empty left | |
empty_left_and_right_lp(true, false, JoinType::LeftSemi, true)?; | |
// test left semi join with empty right | |
empty_left_and_right_lp(false, true, JoinType::LeftSemi, true)?; | |
// test right semi join with empty left | |
empty_left_and_right_lp(true, false, JoinType::RightSemi, true)?; | |
// test right semi join with empty right | |
empty_left_and_right_lp(false, true, JoinType::RightSemi, true)?; | |
// test left anti join empty left | |
empty_left_and_right_lp(true, false, JoinType::LeftAnti, true)?; | |
// test right anti join empty right | |
empty_left_and_right_lp(false, true, JoinType::RightAnti, true) | |
} | |
#[test] | |
fn test_join_empty_propagation_rules_noop() -> Result<()> { | |
// these cases should not result in an empty relation | |
// test left join with empty right | |
empty_left_and_right_lp(false, true, JoinType::Left, false)?; | |
// test right join with empty left | |
empty_left_and_right_lp(true, false, JoinType::Right, false)?; | |
// test left semi with non-empty left and right | |
empty_left_and_right_lp(false, false, JoinType::LeftSemi, false)?; | |
// test right semi with non-empty left and right | |
empty_left_and_right_lp(false, false, JoinType::RightSemi, false)?; | |
// test left anti join with non-empty left and right | |
empty_left_and_right_lp(false, false, JoinType::LeftAnti, false)?; | |
// test left anti with non-empty left and empty right | |
empty_left_and_right_lp(false, true, JoinType::LeftAnti, false)?; | |
// test right anti join with non-empty left and right | |
empty_left_and_right_lp(false, false, JoinType::RightAnti, false)?; | |
// test right anti with empty left and non-empty right | |
empty_left_and_right_lp(true, false, JoinType::RightAnti, false) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these rewrites are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for some small details, I have no problem with this PR.
In addition, I hope to see more slt
tests.
Thanks @tshauck
datafusion/optimizer/src/test/mod.rs
Outdated
pub fn assert_optimized_plan_eq_with_rules( | ||
rules: Vec<Arc<dyn OptimizerRule + Send + Sync>>, | ||
plan: LogicalPlan, | ||
expected: &str, | ||
) -> Result<()> { | ||
let optimized_plan = generate_optimized_plan_with_rules(rules, plan); | ||
let formatted_plan = format!("{optimized_plan:?}"); | ||
assert_eq!(formatted_plan, expected); | ||
Ok(()) | ||
} | ||
|
||
pub fn assert_optimized_plan_ne_with_rules( | ||
rules: Vec<Arc<dyn OptimizerRule + Send + Sync>>, | ||
plan: LogicalPlan, | ||
expected: &str, | ||
) -> Result<()> { | ||
let optimized_plan = generate_optimized_plan_with_rules(rules, plan); | ||
let formatted_plan = format!("{optimized_plan:?}"); | ||
assert_ne!(formatted_plan, expected); | ||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can merge them into one function and use a param to determine assert_ne
or assert_eq
Like
pub fn assert_optimized_plan(
rules: Vec<Arc<dyn OptimizerRule + Send + Sync>>,
plan: LogicalPlan,
expected: &str,
eq: bool,
) -> Result<()> {
let optimized_plan = generate_optimized_plan_with_rules(rules, plan);
let formatted_plan = format!("{optimized_plan:?}");
if eq {
assert_eq!(formatted_plan, expected);
} else {
assert_ne!(formatted_plan, expected);
}
Ok(())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 401f2e2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @tshauck @jonahgao and @jackwener
* feat: propagate empty for more join types * feat: update subquery de-correlation test * tests: simplify tests * refactor: better name * style: clippy * refactor: update tests * refactor: rename * refactor: fix spellings * add slt tests
* feat: propagate empty for more join types * feat: update subquery de-correlation test * tests: simplify tests * refactor: better name * style: clippy * refactor: update tests * refactor: rename * refactor: fix spellings * add slt tests
* feat: propagate empty for more join types * feat: update subquery de-correlation test * tests: simplify tests * refactor: better name * style: clippy * refactor: update tests * refactor: rename * refactor: fix spellings * add slt tests
Which issue does this PR close?
Doesn't fully close, but related to #10967
Rationale for this change
Currently the
PropagateEmptyRelation
relation only can optimize inner joins (probably the most important).This PR updates the join handling to include simple cases for left, right, left-semi, right-semi, left-anti, and right-anti joins.
What changes are included in this PR?
Slight refactoring of the join match statement
PropagateEmptyRelation::rewrite
then add the aforementioned joins.Are these changes tested?
Add tests for the added join types to confirm they're rewritten to an empty relation.
Are there any user-facing changes?
No