
[WIP][SPARK-31114][SQL] Constraints inferred from equality constraints with cast #27874

Closed. Wants to merge 1 commit.

Conversation

wangyum (Member) commented Mar 11, 2020

What changes were proposed in this pull request?

This PR adds support for inferring more constraints from equality constraints that involve a cast. For example:

spark.sql("CREATE TABLE SPARK_31114_1(a BIGINT)")
spark.sql("CREATE TABLE SPARK_31114_2(b DECIMAL(18, 0))")
spark.sql("SELECT t1.* FROM SPARK_31114_1 t1 JOIN SPARK_31114_2 t2 ON t1.a = t2.b AND t1.a = 1L").explain

Before this PR:

== Physical Plan ==
*(2) Project [a#0L]
+- *(2) BroadcastHashJoin [cast(a#0L as decimal(20,0))], [cast(b#1 as decimal(20,0))], Inner, BuildRight
   :- *(2) Project [a#0L]
   :  +- *(2) Filter (isnotnull(a#0L) AND (a#0L = 1))
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.spark_31114_1[a#0L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, decimal(18,0), true] as decimal(20,0)))), [id=#50]
      +- *(1) Project [b#1]
         +- *(1) Filter isnotnull(b#1)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.spark_31114_2[b#1]

After this PR, with the inferred filter cast(b#219 as bigint) = 1 pushed to the build side:

*(2) Project [a#218L]
+- *(2) BroadcastHashJoin [cast(a#218L as decimal(20,0))], [cast(b#219 as decimal(20,0))], Inner, BuildRight
   :- *(2) Project [a#218L]
   :  +- *(2) Filter (isnotnull(a#218L) AND (a#218L = 1))
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.spark_31114_1[a#218L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, decimal(18,0), true] as decimal(20,0)))), [id=#119]
      +- *(1) Project [b#219]
         +- *(1) Filter ((cast(b#219 as bigint) = 1) AND isnotnull(b#219))
            +- *(1) ColumnarToRow
               +- FileScan parquet default.spark_31114_2[b#219]
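Conceptually, the rule combines the coerced join constraint cast(a) = cast(b) with the literal constraint a = 1 to derive a new predicate on the other join side, cast(b as bigint) = 1, as seen in the plan above. The following standalone Scala sketch illustrates the idea with a toy expression model; it is not Spark's Catalyst API, and all names here are illustrative:

```scala
// Toy expression model for illustrating constraint inference through a cast.
// This is a sketch, NOT the actual Catalyst rule or its data structures.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Long) extends Expr
case class Cast(child: Expr, to: String) extends Expr
case class Eq(left: Expr, right: Expr) extends Expr

// If we know cast(a, t) = cast(b, t) and a = <literal>, infer
// cast(b, sourceTypeOf(a)) = <literal> for the other side.
def inferThroughCast(constraints: Set[Eq], typeOf: Map[String, String]): Set[Eq] =
  constraints.flatMap {
    case Eq(Cast(Attr(a), t1), Cast(Attr(b), t2)) if t1 == t2 =>
      constraints.collect { case Eq(Attr(`a`), lit: Lit) =>
        Eq(Cast(Attr(b), typeOf(a)), lit)
      }
    case _ => Set.empty[Eq]
  }

val constraints = Set(
  Eq(Cast(Attr("a"), "decimal(20,0)"), Cast(Attr("b"), "decimal(20,0)")), // t1.a = t2.b after coercion
  Eq(Attr("a"), Lit(1L))                                                  // t1.a = 1L
)
// Derives the extra build-side predicate: cast(b, bigint) = 1
val inferred = inferThroughCast(constraints, Map("a" -> "bigint"))
```

In the PR's example this corresponds to deriving cast(b#219 as bigint) = 1 from a#218L = 1 and the coerced join keys.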

Why are the changes needed?

Improve query performance: the inferred predicate filters the build side before the broadcast join instead of joining all rows.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

apache deleted a comment from SparkQA Mar 11, 2020
SparkQA commented Mar 11, 2020

Test build #119669 has finished for PR 27874 at commit 1e92854.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

wangyum (Member, Author) commented Mar 14, 2020

I closed it because this change cannot handle this case:

spark.sql("create table T1(a string)")
spark.sql("create table T2(b string)")
spark.sql("create table T3(c bigint)")
spark.sql("create table T4(d bigint)")

spark.sql(
  """
    |SELECT t1.a, t2.b, t4.d
    |FROM T1 t1 JOIN T2 t2
    |       ON (t1.a = t2.b)
    |     JOIN T3 t3
    |       ON (t1.a = t3.c)
    |     JOIN T4 t4
    |       ON (t3.c = t4.d)
    |""".stripMargin).explain()
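The comment does not spell out the failure mode, but one hazard when chaining inferred constraints through casts between string and numeric columns is that cast-equality does not round-trip: two distinct strings can cast to the same bigint, so a predicate inferred through a lossy cast can change which rows qualify. A minimal Scala illustration (castToBigint is a hypothetical stand-in, not Spark's cast semantics):

```scala
// Illustrative only: equality after a lossy cast does not imply equality
// in the source type. castToBigint is a stand-in, not Spark's cast.
def castToBigint(s: String): Option[Long] =
  scala.util.Try(s.toDouble.toLong).toOption

val a = "1"    // plays the role of t1.a
val b = "1.0"  // plays the role of t2.b
assert(a != b)                             // not equal as strings
assert(castToBigint(a) == castToBigint(b)) // but equal after the cast
```

With join chains like t1.a = t2.b (string = string) and t1.a = t3.c (string = bigint, via cast), constraints inferred through the cast on one edge cannot safely be propagated along the string-equality edge.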

wangyum closed this Mar 14, 2020
wangyum deleted the SPARK-31114 branch Mar 14, 2020, 11:44