Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA][Follow On] Support non AST rewrite as join condition #9661

Open
6 tasks
winningsix opened this issue Nov 8, 2023 · 1 comment
Open
6 tasks

[FEA][Follow On] Support non AST rewrite as join condition #9661

winningsix opened this issue Nov 8, 2023 · 1 comment
Labels
feature request New feature or request

Comments

@winningsix
Copy link
Collaborator

winningsix commented Nov 8, 2023

Is your feature request related to a problem? Please describe.
There're some improvements we can make for AST rewrite.

          !Expression <Cast> cast((length(leftstr#301) + length(rightstr#296)) as bigint) cannot run on GPU because AST is required and this expression does not support AST
              @Expression <Add> (length(leftstr#301) + length(rightstr#296)) could run on GPU
                !Expression <Length> length(leftstr#301) cannot run on GPU because AST is required and this expression does not support AST
                  !Expression <AttributeReference> leftstr#301 cannot run on GPU because AST is required and expression AttributeReference leftstr#301 produces an unsupported type StringType
                !Expression <Length> length(rightstr#296) cannot run on GPU because AST is required and this expression does not support AST
                  !Expression <AttributeReference> rightstr#296 cannot run on GPU because AST is required and expression AttributeReference rightstr#296 produces an unsupported type StringType

It's more straightforward to AST failure reason results from 1) AST not supported by cast; 2) Failed to rewrite AST condition since contains cross childern attributes. Other message can share the same reason (e.g., cannot run on GPU because cast in this case).

Describe the solution you'd like
Extract non AST-able condition and push down to child nodes for conditional equal equal.

Describe alternatives you've considered
More feature coverage in Cudf. But it needs more time and some cases are not likely to be implemented.

Additional context
This is a tracking issue for some follow-ups of #9635 which introduced the AST basic framework and related support on BroadcastNestedLoopJoin.

@winningsix
Copy link
Collaborator Author

Regards to "Support conditional equi-joins with non-AST rewrite" above, it seems not applicable to have non-ast-split there. It seems we should close it.

Given this left.join(broadcast(right), f.round(left.a).cast('integer') == f.round(f.log(right.r_a).cast('integer')), join_type) as an example, join condition will be split into left and right key expressions in GpuBroadcastHashJoinExec.scala

And it will be evaluated in separated project nodes for build and stream sides accordingly.

Build side:

withResource(GpuProjectExec.project(built.getBatch, boundBuiltKeys)) { builtKeys =>
// ensure that the build data can be spilled
built.allowSpilling()
joinGatherer(builtKeys, built, streamBatch)

Stream side:

withResource(GpuProjectExec.project(streamCb.getBatch, boundStreamKeys)) { streamKeys =>
// ensure we make the stream side spillable again
streamCb.allowSpilling()
joinGatherer(buildKeys, LazySpillableColumnarBatch.spillOnly(buildData), streamKeys, streamCb)
}

The remain case I came up is that a join condition involves columns across both sides. In that case, it's also not a split-able case.

Explained Plan

GpuBroadcastHashJoin [gpuround(a#22, 0, IntegerType)], [gpuround(cast(LOG(cast(r_a#30 as double)) as int), 0, IntegerType)], Cross, GpuBuildRight
:- GpuRowToColumnar targetsize(104857600)
:  +- *(1) Scan ExistingRDD[a#22,b#23]
+- GpuBroadcastExchange HashedRelationBroadcastMode(List(cast(round(cast(ln(cast(input[0, int, true] as double)) as int), 0) as bigint)),false), [plan_id=107]
   +- GpuProject [a#26 AS r_a#30, b#27 AS r_b#33]
      +- GpuRowToColumnar targetsize(104857600)
         +- *(2) Scan ExistingRDD[a#26,b#27]

@revans2 @jlowe any thoughts here?

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants