
[WIP][SPARK-31114][SQL] Constraints inferred from equality constraints with cast #27874

Closed. Wants to merge 1 commit.

Conversation

wangyum (Member) commented Mar 11, 2020

What changes were proposed in this pull request?

This PR adds support for inferring more constraints from equality constraints that involve a cast. For example:

spark.sql("CREATE TABLE SPARK_31114_1(a BIGINT)")
spark.sql("CREATE TABLE SPARK_31114_2(b DECIMAL(18, 0))")
spark.sql("SELECT t1.* FROM SPARK_31114_1 t1 JOIN SPARK_31114_2 t2 ON t1.a = t2.b AND t1.a = 1L").explain

Before this PR:

== Physical Plan ==
*(2) Project [a#0L]
+- *(2) BroadcastHashJoin [cast(a#0L as decimal(20,0))], [cast(b#1 as decimal(20,0))], Inner, BuildRight
   :- *(2) Project [a#0L]
   :  +- *(2) Filter (isnotnull(a#0L) AND (a#0L = 1))
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.spark_31114_1[a#0L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, decimal(18,0), true] as decimal(20,0)))), [id=#50]
      +- *(1) Project [b#1]
         +- *(1) Filter isnotnull(b#1)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.spark_31114_2[b#1]

After this PR, with the inferred filter cast(b#219 as bigint) = 1 pushed to the build side:

*(2) Project [a#218L]
+- *(2) BroadcastHashJoin [cast(a#218L as decimal(20,0))], [cast(b#219 as decimal(20,0))], Inner, BuildRight
   :- *(2) Project [a#218L]
   :  +- *(2) Filter (isnotnull(a#218L) AND (a#218L = 1))
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.spark_31114_1[a#218L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, decimal(18,0), true] as decimal(20,0)))), [id=#119]
      +- *(1) Project [b#219]
         +- *(1) Filter ((cast(b#219 as bigint) = 1) AND isnotnull(b#219))
            +- *(1) ColumnarToRow
               +- FileScan parquet default.spark_31114_2[b#219]
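Conceptually, the rule combines the coerced join constraint cast(a) = cast(b) with the literal constraint a = 1 to derive a new predicate on the other join side, cast(b as bigint) = 1, as seen in the plan above. The following standalone Scala sketch illustrates the idea with a toy expression model; it is not Spark's Catalyst API, and all names here are illustrative:

```scala
// Toy expression model for illustrating constraint inference through a cast.
// This is a sketch, NOT the actual Catalyst rule or its data structures.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Long) extends Expr
case class Cast(child: Expr, to: String) extends Expr
case class Eq(left: Expr, right: Expr) extends Expr

// If we know cast(a, t) = cast(b, t) and a = <literal>, infer
// cast(b, sourceTypeOf(a)) = <literal> for the other side.
def inferThroughCast(constraints: Set[Eq], typeOf: Map[String, String]): Set[Eq] =
  constraints.flatMap {
    case Eq(Cast(Attr(a), t1), Cast(Attr(b), t2)) if t1 == t2 =>
      constraints.collect { case Eq(Attr(`a`), lit: Lit) =>
        Eq(Cast(Attr(b), typeOf(a)), lit)
      }
    case _ => Set.empty[Eq]
  }

val constraints = Set(
  Eq(Cast(Attr("a"), "decimal(20,0)"), Cast(Attr("b"), "decimal(20,0)")), // t1.a = t2.b after coercion
  Eq(Attr("a"), Lit(1L))                                                  // t1.a = 1L
)
// Derives the extra build-side predicate: cast(b, bigint) = 1
val inferred = inferThroughCast(constraints, Map("a" -> "bigint"))
```

In the PR's example this corresponds to deriving cast(b#219 as bigint) = 1 from a#218L = 1 and the coerced join keys.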

Why are the changes needed?

Improve query performance: the inferred predicate filters the build side before the broadcast join instead of joining all rows.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

apache deleted a comment from SparkQA Mar 11, 2020
SparkQA commented Mar 11, 2020

Test build #119669 has finished for PR 27874 at commit 1e92854.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum (Member, Author) commented Mar 14, 2020

I closed it because this change cannot handle this case:

spark.sql("create table T1(a string)")
spark.sql("create table T2(b string)")
spark.sql("create table T3(c bigint)")
spark.sql("create table T4(d bigint)")

spark.sql(
  """
    |SELECT t1.a, t2.b, t4.d
    |FROM T1 t1 JOIN T2 t2
    |       ON (t1.a = t2.b)
    |     JOIN T3 t3
    |       ON (t1.a = t3.c)
    |     JOIN T4 t4
    |       ON (t3.c = t4.d)
    |""".stripMargin).explain()
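The comment does not spell out the failure mode, but one hazard when chaining inferred constraints through casts between string and numeric columns is that cast-equality does not round-trip: two distinct strings can cast to the same bigint, so a predicate inferred through a lossy cast can change which rows qualify. A minimal Scala illustration (castToBigint is a hypothetical stand-in, not Spark's cast semantics):

```scala
// Illustrative only: equality after a lossy cast does not imply equality
// in the source type. castToBigint is a stand-in, not Spark's cast.
def castToBigint(s: String): Option[Long] =
  scala.util.Try(s.toDouble.toLong).toOption

val a = "1"    // plays the role of t1.a
val b = "1.0"  // plays the role of t2.b
assert(a != b)                             // not equal as strings
assert(castToBigint(a) == castToBigint(b)) // but equal after the cast
```

With join chains like t1.a = t2.b (string = string) and t1.a = t3.c (string = bigint, via cast), constraints inferred through the cast on one edge cannot safely be propagated along the string-equality edge.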

wangyum closed this Mar 14, 2020
wangyum deleted the SPARK-31114 branch Mar 14, 2020, 11:44