Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect when filters on unique constraints make subqueries scalar #8312

Merged
merged 2 commits into from
Dec 6, 2023

Conversation

Jesse-Bakker
Copy link
Contributor

Which issue does this PR close?

Part of #3725

Rationale for this change

In some cases, it is possible to prove that a Filter only ever produces one
row. In those cases, such a filter may be used in a scalar subquery to ensure
it is in fact scalar, allowing for more flexible use of scalar subqueries.

What changes are included in this PR?

This adds an is_scalar() method to Filter, which will check if there is a
unique functional dependence which is covered by the Filter's predicate.
This is used in LogicalPlan::max_rows() to provide a tighter bound on the
maximum number of rows returned in the presence of Filters.

Are these changes tested?

This is directly tested with a unit test and with a sqllogictest that exercises
the more flexible use of scalar subqueries.

Are there any user-facing changes?

Scalar subqueries are more flexible. Previous constraints were not documented
as far as I can tell

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Nov 23, 2023
@Jesse-Bakker Jesse-Bakker force-pushed the scalar-filter branch 3 times, most recently from 2f4cb79 to 44383c2 Compare November 23, 2023 09:28
@Jesse-Bakker Jesse-Bakker changed the title Detect when filters make subqueries scalar Detect when filters on unique constraints make subqueries scalar Nov 23, 2023
@alamb
Copy link
Contributor

alamb commented Nov 27, 2023

Thank you @Jesse-Bakker -- I plan to review this carefully tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jesse-Bakker -- this looks great. I had a few suggestions, mostly about comments, and code reuse. But otherwise I think this PR is pretty much ready

cc @liukun4515 and @jackwener who may have some more thoughts about subquery rewrites

datafusion/expr/src/logical_plan/plan.rs Outdated Show resolved Hide resolved
datafusion/expr/src/logical_plan/plan.rs Show resolved Hide resolved
datafusion/expr/src/logical_plan/plan.rs Outdated Show resolved Hide resolved
@alamb
Copy link
Contributor

alamb commented Nov 28, 2023

The only thing I think is strictly needed to approve this PR is docstrings for "is_scalar" -- the rest is "nice to have" from my perspective / could be done as a follow on PR

@jackwener
Copy link
Member

jackwener commented Nov 30, 2023

unique constraints and function dependences is complex feature (Worthy of being carefully designed), I prepare to think more about it.

I will review this PR and think more about it. If this PR isn't urgent, please wait for me.

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jesse-Bakker

/// `Filter(b = 2).is_scalar() == false`
/// and
/// `Filter(a = 2 OR b = 2).is_scalar() == false`
fn is_scalar(&self) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest use Uniform slot and Unique slot to describe FDs instead of scalar.

Commonly used Functional dependencies : including

uniform slot, which means ndv <= 1
unique slot, which means ndv = row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the suggestion is to rename this method to Filter::is_uniform()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and add these definition into doc/

datafusion/expr/src/logical_plan/plan.rs Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jesse-Bakker - let's see if @jackwener has a chance to clarify. If we haven't heard by tomorrow I'll merge this PR

datafusion/expr/src/logical_plan/plan.rs Show resolved Hide resolved
Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jesse-Bakker @alamb

I have no other question.

@alamb
Copy link
Contributor

alamb commented Dec 6, 2023

I took the liberty of merging up from main to resolve a conflict in this PR

@alamb alamb merged commit c8e1c84 into apache:main Dec 6, 2023
22 checks passed
@alamb
Copy link
Contributor

alamb commented Dec 6, 2023

Thanks again @Jesse-Bakker and @jackwener

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants