[SPARK-35078][SQL] Add tree traversal pruning in expression rules #32280

-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case q: LogicalPlan => q transformExpressionsUp {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsAnyPattern(NULL_CHECK, NULL_LITERAL, COUNT, CAST), ruleId) {


Why do we need CAST here?

I tried to capture the first case:
case e @ WindowExpression(Cast(Literal(0L, _), _, _), _)

I'm not sure why this pattern is related to NullPropagation, but that seems to be a separate issue.
I updated the condition to be more restrictive since CAST may be common in certain queries:

t.containsAnyPattern(NULL_CHECK, NULL_LITERAL, COUNT)
|| t.containsAllPatterns(WINDOW_EXPRESSION, CAST, LITERAL)
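To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. It is a toy model, not Spark's code: `ToyPattern`, `ToyNode`, and `PruningSketch.shouldVisit` are hypothetical names, and the bit set is recomputed eagerly rather than managed the way Catalyst manages its pattern bits. It only illustrates why a tree that merely contains a CAST is skipped under the updated condition, while a window expression over a cast literal is still visited.

```scala
import scala.collection.immutable.BitSet

// Toy stand-in for a tree-pattern enumeration; only the patterns
// mentioned in this thread are listed (assumption, not Spark's full set).
object ToyPattern extends Enumeration {
  val NULL_CHECK, NULL_LITERAL, COUNT, CAST, LITERAL, WINDOW_EXPRESSION = Value
}

// Toy tree node: `bits` is the union of the node's own patterns and the
// patterns of all of its descendants.
final case class ToyNode(own: Set[ToyPattern.Value], children: Seq[ToyNode] = Nil) {
  val bits: BitSet =
    children.foldLeft(BitSet(own.toSeq.map(_.id): _*))((acc, c) => acc | c.bits)

  // True if at least one of the given patterns occurs in this subtree.
  def containsAnyPattern(ps: ToyPattern.Value*): Boolean = ps.exists(p => bits(p.id))

  // True only if every one of the given patterns occurs in this subtree.
  def containsAllPatterns(ps: ToyPattern.Value*): Boolean = ps.forall(p => bits(p.id))
}

object PruningSketch {
  import ToyPattern._

  // The more restrictive condition from the comment above: CAST alone no
  // longer triggers a traversal; it must co-occur with WINDOW_EXPRESSION
  // and LITERAL somewhere in the same subtree.
  def shouldVisit(t: ToyNode): Boolean =
    t.containsAnyPattern(NULL_CHECK, NULL_LITERAL, COUNT) ||
      t.containsAllPatterns(WINDOW_EXPRESSION, CAST, LITERAL)
}
```

Note that `containsAllPatterns` only says the three patterns occur somewhere in the subtree, not that they are nested in the shape the case branch expects, so the condition is a conservative filter rather than an exact match.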

OK. I think now developers need to spend extra effort on choosing the Bits here. And there seems no good solution for avoiding regressions except for asking developers to add more unit tests.
Do you have any idea to improve the framework for this?

seems no good solution for avoiding regressions except for asking developers to add more unit tests

For an existing rule to which we're adding this pruning, yes, we need to be careful, particularly when the rule has complex case patterns, although many rules are simple and intuitive.

For a new rule or a new case branch in an existing rule, it seems that the regression risk doesn't change much. If a case pattern branch is not covered by unit tests, there's a regression risk even without bits pruning, e.g., the case pattern itself could be wrong.
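One way a unit test can guard a pruning condition is to run the same rule with and without pruning and check that the results agree. The sketch below is a toy illustration of that idea under stated assumptions: `Expr`, `PruningCheck`, `mayMatch`, and the rule itself are hypothetical and are not Catalyst classes or Spark's test utilities. As noted above, such a check only helps for case branches that some test input actually exercises.

```scala
// Toy expression tree (assumption: not Catalyst's Expression hierarchy).
sealed trait Expr { def children: Seq[Expr] }
case object NullLit extends Expr { val children: Seq[Expr] = Nil }
final case class Lit(v: Int) extends Expr { val children: Seq[Expr] = Nil }
final case class Bool(v: Boolean) extends Expr { val children: Seq[Expr] = Nil }
final case class IsNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
final case class Add(l: Expr, r: Expr) extends Expr { def children: Seq[Expr] = Seq(l, r) }

object PruningCheck {
  // Hypothetical pruning condition: the subtree contains a null literal or
  // a null check. Recomputed on every call here for simplicity.
  def mayMatch(e: Expr): Boolean = e match {
    case NullLit | _: IsNull => true
    case other => other.children.exists(mayMatch)
  }

  // Toy null-propagation rule applied to a single node.
  def rule(e: Expr): Expr = e match {
    case Add(NullLit, _) | Add(_, NullLit) => NullLit
    case IsNull(NullLit) => Bool(true)
    case IsNull(_: Lit) => Bool(false)
    case other => other
  }

  // Rebuild a node with `f` applied to each child; leaves are returned as-is.
  def mapChildren(e: Expr, f: Expr => Expr): Expr = e match {
    case IsNull(c) => IsNull(f(c))
    case Add(l, r) => Add(f(l), f(r))
    case leaf => leaf
  }

  // Bottom-up traversal that visits every node.
  def transformUp(e: Expr): Expr = rule(mapChildren(e, transformUp))

  // Bottom-up traversal that skips subtrees the condition rules out.
  def transformUpWithPruning(e: Expr): Expr =
    if (!mayMatch(e)) e else rule(mapChildren(e, transformUpWithPruning))
}
```

If the pruning condition were too narrow for some branch of `rule`, the two traversals would return different trees for an input that hits that branch.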

gengliangwang · 2021-04-23T04:10:35Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

@@ -29,7 +29,7 @@ import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning, RangePartitioning, RoundRobinPartitioning, SinglePartition}
 import org.apache.spark.sql.catalyst.trees.TreeNodeTag
 import org.apache.spark.sql.catalyst.trees.TreePattern.{
-  INNER_LIKE_JOIN, JOIN, LEFT_SEMI_OR_ANTI_JOIN, NATURAL_LIKE_JOIN, OUTER_JOIN, TreePattern
+  FILTER, INNER_LIKE_JOIN, JOIN, LEFT_SEMI_OR_ANTI_JOIN, NATURAL_LIKE_JOIN, OUTER_JOIN, TreePattern


nit:
import org.apache.spark.sql.catalyst.trees.TreePattern._

gengliangwang

LGTM except for two comments.

gengliangwang · 2021-04-23T05:03:44Z

@sigmod could you resolve the conflicts?

SparkQA · 2021-04-23T05:16:06Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42374/

SparkQA · 2021-04-23T05:16:08Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42374/

sigmod

@sigmod could you resolve the conflicts?

Done.

sigmod · 2021-04-23T05:19:52Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-    case q: LogicalPlan => q transformExpressionsUp {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsAnyPattern(NULL_CHECK, NULL_LITERAL, COUNT, CAST), ruleId) {


SparkQA · 2021-04-23T07:01:49Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42376/

SparkQA · 2021-04-23T07:06:45Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42376/

gengliangwang · 2021-04-23T08:33:30Z

Thanks, merging to master

SparkQA · 2021-04-23T08:43:08Z

Test build #137844 has finished for PR 32280 at commit bc6c103.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-04-23T10:45:58Z

Test build #137846 has finished for PR 32280 at commit cd1d9ef.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

snapshot

5a09050

github-actions bot added the SQL label Apr 22, 2021

sigmod changed the title ~~[WIP][SPARK-35078] Support tree traversal pruning in expression rules~~ [WIP][SPARK-35078] Add tree traversal pruning in expression rules Apr 22, 2021

sigmod added 2 commits April 21, 2021 20:53

snapshot

d6cf209

merge master

3083338

sigmod added 2 commits April 21, 2021 22:39

support SimplifyCasts

89f13f8

update

09300c4

sigmod changed the title ~~[WIP][SPARK-35078] Add tree traversal pruning in expression rules~~ [WIP][SPARK-35078][SQL] Add tree traversal pruning in expression rules Apr 22, 2021

sigmod added 2 commits April 22, 2021 00:53

update

8b2096f

update

549d976

sigmod added 3 commits April 22, 2021 11:28

update

2c98d2e

merge master

e9fec51

mark a few nodePatterns as finals

63df6ca

sigmod changed the title ~~[WIP][SPARK-35078][SQL] Add tree traversal pruning in expression rules~~ [SPARK-35078][SQL] Add tree traversal pruning in expression rules Apr 22, 2021

gengliangwang reviewed Apr 23, 2021

gengliangwang approved these changes Apr 23, 2021

address Gengliang's comment

bc6c103

sigmod mentioned this pull request Apr 23, 2021

[SPARK-35075][SQL] Add traversal pruning for subquery related rules #32247

Closed

merge master

cd1d9ef

sigmod commented Apr 23, 2021

gengliangwang closed this in 9af338c Apr 23, 2021

sigmod commented Apr 22, 2021

What changes were proposed in this pull request?

Why are the changes needed?

How was this patch tested?
