Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add string comparison to AST expressions #8157

Closed
jlowe opened this issue Apr 20, 2023 · 3 comments · Fixed by #9672
Closed

[FEA] Add string comparison to AST expressions #8157

jlowe opened this issue Apr 20, 2023 · 3 comments · Fixed by #9672
Assignees
Labels
feature request New feature or request performance A performance related task/issue

Comments

@jlowe
Copy link
Member

jlowe commented Apr 20, 2023

rapidsai/cudf#13072 adds Java bindings for string literals and string comparisons to AST expressions. We should extend the RAPIDS Accelerator functionality to leverage this in AST expressions which may appear in join conditions.

@jlowe jlowe added feature request New feature or request ? - Needs Triage Need team to review and classify labels Apr 20, 2023
@mattahrens mattahrens added performance A performance related task/issue feature request New feature or request and removed feature request New feature or request ? - Needs Triage Need team to review and classify labels Apr 25, 2023
@nvliyuan
Copy link
Collaborator

Since one customer hit the same fallback, just posted a reproduction here:

spark.sql("select t1.col4 as tt, t2.name as tt2 from datavalid2 t1 LEFT OUTER JOIN student t2 on lower(trim(t1.col4)) < lower(trim(name))").show

fallback info:

!Exec <BroadcastNestedLoopJoinExec> cannot run on GPU because not all expressions can be replaced
      !Expression <LessThan> (lower(trim(col4#96, None)) < lower(trim(name#102, None))) cannot run on GPU because AST is required and lhs expression Lower lower(trim(col4#96, None)) (StringType is not supported); AST is required and rhs expression Lower lower(trim(name#102, None)) (StringType is not supported)
        !Expression <Lower> lower(trim(col4#96, None)) cannot run on GPU because AST is required and this expression does not support AST
          !Expression <StringTrim> trim(col4#96, None) cannot run on GPU because AST is required and this expression does not support AST
            !Expression <AttributeReference> col4#96 cannot run on GPU because AST is required and expression AttributeReference col4#96 produces an unsupported type StringType
        !Expression <Lower> lower(trim(name#102, None)) cannot run on GPU because AST is required and this expression does not support AST
          !Expression <StringTrim> trim(name#102, None) cannot run on GPU because AST is required and this expression does not support AST
            !Expression <AttributeReference> name#102 cannot run on GPU because AST is required and expression AttributeReference name#102 produces an unsupported type StringType

CC @viadea

@revans2
Copy link
Collaborator

revans2 commented Jul 27, 2023

This is not super simple to do with just AST. CUDF AST supports string comparisons, but it does not support trim or lower, and because those would need to allocate GPU memory to support the temporary output it is unlikely that we would ever be able to support it. So to make this work we would have to rewrite the query to be something like

val tmpt1 = t1.selectExpr("*", "lower(trim(col4)) as my_join_key_t1")
val tmpt2 = t2.selectExpr("*", lower(trim(name)) as my_join_key_t2")
tmpt1.join(tmpt2, tmpt1("my_join_key_t1") < tmpt2("my_join_key_t2"), joinType="left outer")...

So this one feature is not going to be enough to support the example query. I'll file an issue for this.

@revans2
Copy link
Collaborator

revans2 commented Jul 27, 2023

@viadea and @nvliyuan I filed #8832 to hopefully support the query rewrite needed to get the above example to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request performance A performance related task/issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants