Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28481][SQL] More expressions should extend NullIntolerant #28626

Closed
wants to merge 8 commits into from
Closed

[SPARK-28481][SQL] More expressions should extend NullIntolerant #28626

wants to merge 8 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented May 24, 2020

What changes were proposed in this pull request?

  1. Make more expressions extend NullIntolerant.
  2. Add a checker(in ExpressionInfoSuite) to identify whether the expression is NullIntolerant.

Why are the changes needed?

Avoid skew join if the join column has many null values and can improve query performance. For examples:

CREATE TABLE t1(c1 string, c2 string) USING parquet;
CREATE TABLE t2(c1 string, c2 string) USING parquet;
EXPLAIN SELECT t1.* FROM t1 JOIN t2 ON upper(t1.c1) = upper(t2.c1);

Before and after this PR:

== Physical Plan ==
*(2) Project [c1#5, c2#6]
+- *(2) BroadcastHashJoin [upper(c1#5)], [upper(c1#7)], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(upper(input[0, string, true]))), [id=#41]
   :  +- *(1) ColumnarToRow
   :     +- FileScan parquet default.t1[c1#5,c2#6]
   +- *(2) ColumnarToRow
      +- FileScan parquet default.t2[c1#7] 


== Physical Plan ==
*(2) Project [c1#5, c2#6]
+- *(2) BroadcastHashJoin [upper(c1#5)], [upper(c1#7)], Inner, BuildRight
   :- *(2) Project [c1#5, c2#6]
   :  +- *(2) Filter isnotnull(c1#5)
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.t1[c1#5,c2#6]
   +- BroadcastExchange HashedRelationBroadcastMode(List(upper(input[0, string, true]))), [id=#59]
      +- *(1) Project [c1#7]
         +- *(1) Filter isnotnull(c1#7)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.t2[c1#7]

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123041 has finished for PR 28626 at commit c3625ca.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented May 24, 2020

retest this please.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123053 has finished for PR 28626 at commit c3625ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 24, 2020

Test build #123059 has finished for PR 28626 at commit 7fe3490.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented May 25, 2020

also cc: @viirya

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123105 has finished for PR 28626 at commit b127c41.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented May 26, 2020

retest this please

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123113 has finished for PR 28626 at commit b127c41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 26, 2020

Test build #123124 has finished for PR 28626 at commit 82e97e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented May 28, 2020

@gatorsmile @cloud-fan

@@ -309,17 +309,11 @@ trait DivModLike extends BinaryArithmetic {

override def nullable: Boolean = true

final override def eval(input: InternalRow): Any = {
val input2 = right.eval(input)
if (input2 == null || input2 == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see the difference now. Previously we can skip evaluating input1 if input2 is 0. Can we change it back and add comment to explain it? sorry for the back and forth!

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. Thanks for the updates.

@SparkQA
Copy link

SparkQA commented May 28, 2020

Test build #123233 has finished for PR 28626 at commit 1bdff95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 28, 2020

Test build #123237 has finished for PR 28626 at commit 265abd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 91148f4 May 29, 2020
@wangyum wangyum deleted the SPARK-28481 branch May 29, 2020 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants