Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add assert on hash children partition count #5768

Merged
merged 9 commits into from
Apr 6, 2023

Conversation

duongcongtoai
Copy link
Contributor

@duongcongtoai duongcongtoai commented Mar 28, 2023

Which issue does this PR close?

Closes #5738

Rationale for this change

Adding a small validation for join executors to prevent user from providing children inputs having different partition count, causing unexpected behavior

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Mar 28, 2023
@duongcongtoai duongcongtoai changed the title add assert on hash children partition count Add assert on hash children partition count Mar 28, 2023
@github-actions github-actions bot removed the optimizer Optimizer rules label Mar 29, 2023
@duongcongtoai duongcongtoai marked this pull request as ready for review April 3, 2023 19:01
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @duongcongtoai -- this is a great idea. It looks like there is a conflict that needs to be resolved, but otherwise this looks great

if self.mode == PartitionMode::Partitioned
&& self.left.output_partitioning() != self.right.output_partitioning()
{
return Err(DataFusionError::Plan(format!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely caused by some sort of code error, so an internal error may make more sense

Suggested change
return Err(DataFusionError::Plan(format!(
return Err(DataFusionError::Internal(format!(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment applies to the code below as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @duongcongtoai

@duongcongtoai
Copy link
Contributor Author

Test is failing, i'll check

@alamb
Copy link
Contributor

alamb commented Apr 6, 2023

I took the liberty to push a fix for the clippy CI failure: https://github.com/apache/arrow-datafusion/actions/runs/4622652579/jobs/8194716556 to your branch in a29658b

I also merged up from main

@alamb alamb merged commit c5e935a into apache:main Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in HashJoin output_partition cause the input from left input not fully executed
2 participants