Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7142][SQL]: Minor enhancement to BooleanSimplification Optimizer rule #5700

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,11 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case (_, Literal(false, BooleanType)) => Literal(false)
// a && a => a
case (l, r) if l fastEquals r => l
// a && (not(a) || b) => a && b
case (l, Or(l1, r)) if (Not(l) fastEquals l1) => And(l, r)
case (l, Or(r, l1)) if (Not(l) fastEquals l1) => And(l, r)
case (Or(l, l1), r) if (l1 fastEquals Not(r)) => And(l, r)
case (Or(l1, l), r) if (l1 fastEquals Not(r)) => And(l, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need fastEquals here? Not(l) and l1 will never be a same reference and we always fallback to normal equality check. I think just here == here is more reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

// (a || b) && (a || c) => a || (b && c)
case _ =>
// 1. Split left and right to get the disjunctive predicates,
Expand Down Expand Up @@ -512,6 +517,10 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
case LessThan(l, r) => GreaterThanOrEqual(l, r)
// not(l <= r) => l > r
case LessThanOrEqual(l, r) => GreaterThan(l, r)
// not(l || r) => not(l) && not(r)
case Or(l, r) => And(Not(l), Not(r))
// not(l && r) => not(l) or not(r)
case And(l, r) => Or(Not(l), Not(r))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How these 2 rules optimize execution?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for example the filter is not(Or(left, r)) , where r might be some filter on a partitioned column like part>=12 , in the present case this filter cannot be pushed down, since while evaluating we will encounter reference of partitioned column, whereas if this rule is applied we get And(not(l), part<12) and then not(l) might be pushed down since now splitting into conjunctive predicates is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case Not(Or(l, r))? Seems you miss the Not...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inside a case match :

 case not @ Not(exp) => exp match {
  ....
  ....
  case Or(l, r) => And(Not(l), Not(r))

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing Not(And(a, b)) to And(Not(a), Not(b)) may not always be an optimization. It do helps to push down filters for datasource, but I think #8200 is more reasonable since it only do this optimization in datasource part.

cc @marmbrus

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan could you please explain a bit more when and how converting to "And" may not be an optimization ? I was wondering would it actually result in any kind of performance hit ? Also could you tell how #8200 is more reasonable ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought this optimization should be only used in datasource as it may cause performance regression for normal cases(see these comments). However after an offline discussion with @marmbrus , this optimization is pretty standard and generally good, so let's keep it in Optimizer :)

// not(not(e)) => e
case Not(e) => e
case _ => not
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,22 @@ class BooleanSimplificationSuite extends PlanTest with PredicateHelper {
('a === 'b || 'b > 3 && 'a > 3 && 'a < 5))
}

test("a && (!a || b)") {
checkCondition(('a && (!('a) || 'b )), ('a && 'b))

checkCondition(('a && ('b || !('a) )), ('a && 'b))

checkCondition(((!('a) || 'b ) && 'a), ('b && 'a))

checkCondition((('b || !('a) ) && 'a), ('b && 'a))
}

test("!(a && b) , !(a || b)") {
checkCondition((!('a && 'b)), (!('a) || !('b)))

checkCondition(!('a || 'b), (!('a) && !('b)))
}

private val caseInsensitiveAnalyzer =
new Analyzer(EmptyCatalog, EmptyFunctionRegistry, new SimpleCatalystConf(false))

Expand Down