-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32383][SQL] Preserve hash join (BHJ and SHJ) stream side ordering #29181
Conversation
cc @cloud-fan and @sameeragarwal if you guys can help take a look. Thanks! |
Test build #126288 has finished for PR 29181 at commit
|
Can you double check that the ordering is correct if there are NULLs involved, or outer join conditions? The tricky cases I see:
(1, 'a') But the correct ordering for NULLS LAST is (1, 'a') |
@bart-samwel - just to bring us in the same page. Current spark scala/java implementation for hash join (broadcast hash join and shuffled hash join) has following restriction:
Both of cases you mentioned are to do right outer join, with left stream side. This will not happen. A separate topic: I think it would be interesting to explore support full outer join in shuffled hash join and broadcast hash join where I discussed with @cloud-fan in another PR. I created a JIRA for this now - https://issues.apache.org/jira/browse/SPARK-32399. This should help save shuffle and sort as currently for full outer join, we always do a sort merge join no matter of table size. BTW, does delta engine support full outer join in hash join? Would like to understand more here. Thanks. |
@@ -54,6 +54,8 @@ trait HashJoin extends BaseJoinExec { | |||
|
|||
override def outputPartitioning: Partitioning = streamedPlan.outputPartitioning | |||
|
|||
override def outputOrdering: Seq[SortOrder] = streamedPlan.outputOrdering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be future-proof, we should only do it if the join type allows us to do so. It's fragile to rely on what the join type can be in hash join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - agree. Updated both outputPartitioning
and outputOrdering
to be based on join type and build side.
Doing full outer for SHJ is not that hard so we should have that. BHJ is harder because you have to merge the "probedness" of all tasks before figuring out which rows you need to emit. (Delta engine will indeed support full outer join in SHJ.) Let's be future proof for these cases! |
@bart-samwel - sounds good. I will work on to support full outer join in SHJ at its current java stack then in https://issues.apache.org/jira/browse/SPARK-32399.
For BHJ, every task gets a copy of whole build side. So I am thinking for each task, iterating all rows for build side, after exhausting stream side, and only emitting rows for its own part (we can rely on hash, e.g. task |
Test build #126427 has finished for PR 29181 at commit
|
@cloud-fan gentle ping, could you help take another look? Thanks. |
thanks, merging to master! |
Thanks @cloud-fan and @bart-samwel for review and discussion! |
What changes were proposed in this pull request?
Currently
BroadcastHashJoinExec
andShuffledHashJoinExec
do not preserve children output ordering information (inherit fromSparkPlan.outputOrdering
, which is Nil). This can add unnecessary sort in complex queries involved multiple joins.Example:
Current physical plan (extra sort on
k1
before top sort merge join):Ideal physical plan (no extra sort on
k1
before top sort merge join):Why are the changes needed?
To avoid unnecessary sort in query, and it has most impact when users read sorted bucketed table.
Though the unnecessary sort is operating on already sorted data, it would have obvious negative impact on IO and query run time if the data is large and external sorting happens.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit test in
JoinSuite
.