[SPARK-7142][SQL]: Minor enhancement to BooleanSimplification Optimizer rule #5700
Test build #30953 has finished for PR 5700 at commit
// not(l || r) => not(l) && not(r)
case Or(l, r) => And(Not(l), Not(r))
// not(l && r) => not(l) or not(r)
case And(l, r) => Or(Not(l), Not(r))
How do these two rules optimize execution?
For example, suppose the filter is Not(Or(l, r)), where r is a filter on a partition column such as part >= 12. As it stands, this filter cannot be pushed down, because evaluating it runs into a reference to the partition column. If this rule is applied, we get And(Not(l), part < 12), and Not(l) can then be pushed down, since the filter can now be split into conjunctive predicates.
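A minimal sketch of the argument above, using a toy expression type rather than Catalyst's own classes. The names `Expr`, `Pred`, `PushdownSketch`, `deMorgan`, and `splitConjuncts` are hypothetical and only illustrate why the rewritten filter can be split into separate conjuncts:

```scala
// Toy expression tree for illustration; these are NOT Spark's Catalyst classes.
sealed trait Expr
case class Pred(name: String) extends Expr // opaque leaf predicate, e.g. "part >= 12"
case class Not(child: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object PushdownSketch {
  // Apply De Morgan's laws once at the root of a negated expression.
  def deMorgan(e: Expr): Expr = e match {
    case Not(Or(l, r))  => And(Not(l), Not(r))
    case Not(And(l, r)) => Or(Not(l), Not(r))
    case other          => other
  }

  // Split a filter into its top-level conjuncts; only And nodes can be split.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // The filter from the comment: Not(Or(l, part >= 12)).
  val filter: Expr = Not(Or(Pred("l"), Pred("part >= 12")))

  // Before the rewrite the whole filter is a single conjunct.
  val before: Seq[Expr] = splitConjuncts(filter)

  // After the rewrite, Not(l) is its own conjunct and could be handled separately.
  val after: Seq[Expr] = splitConjuncts(deMorgan(filter))
}
```

In this sketch `before` holds one element and `after` holds two, so the conjunct that does not touch the partition column can be considered for pushdown on its own.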
Should this be case Not(Or(l, r))? It seems you are missing the Not.
...
This is inside a case match:
case not @ Not(exp) => exp match {
  ....
  ....
  case Or(l, r) => And(Not(l), Not(r))
}
@cloud-fan could you please explain a bit more about when and how converting to "And" may not be an optimization? I was wondering whether it would actually result in any kind of performance hit. Also, could you explain how #8200 is more reasonable?
Thanks for working on this! Can you please add test cases for these optimizations?
…, using these rules:
A and (not(A) or B) => A and B
not(A and B) => not(A) or not(B)
not(A or B) => not(A) and not(B)
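As a quick sanity check of the three rules, the sketch below compares both sides of each rewrite over every combination of plain Booleans. It covers two-valued logic only and does not model SQL NULL; `RuleCheck` and `forAll` are hypothetical names introduced for the illustration.

```scala
object RuleCheck {
  private val bools = Seq(true, false)

  // True when f holds for every combination of two Boolean inputs.
  private def forAll(f: (Boolean, Boolean) => Boolean): Boolean =
    bools.forall(a => bools.forall(b => f(a, b)))

  // A and (not(A) or B) => A and B
  val dropNegatedOperand: Boolean =
    forAll((a, b) => (a && (!a || b)) == (a && b))

  // not(A and B) => not(A) or not(B)
  val deMorganAnd: Boolean =
    forAll((a, b) => !(a && b) == (!a || !b))

  // not(A or B) => not(A) and not(B)
  val deMorganOr: Boolean =
    forAll((a, b) => !(a || b) == (!a && !b))
}
```

All three values evaluate to true for two-valued inputs. Behaviour in the presence of NULL would need a separate check against the optimizer's own semantics.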
Added test cases.
Test build #42249 has finished for PR 5700 at commit
Test build #42243 has finished for PR 5700 at commit
Test build #42257 has finished for PR 5700 at commit
Thanks, merging to master.
Thanks @marmbrus :)
case (l, Or(l1, r)) if (Not(l) fastEquals l1) => And(l, r)
case (l, Or(r, l1)) if (Not(l) fastEquals l1) => And(l, r)
case (Or(l, l1), r) if (l1 fastEquals Not(r)) => And(l, r)
case (Or(l1, l), r) if (l1 fastEquals Not(r)) => And(l, r)
Do we need fastEquals here? Not(l) and l1 will never be the same reference, so we always fall back to the normal equality check. I think just using == here is more reasonable.
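To illustrate the point, here is a sketch with a toy tree type. It assumes fastEquals is a reference check that falls back to ==, as the comment above describes; `Node`, `Leaf`, `Not`, and `FastEqualsSketch` are hypothetical stand-ins, not Catalyst classes.

```scala
sealed trait Node {
  // Assumed shape of fastEquals: cheap reference check first, then structural equality.
  def fastEquals(other: Node): Boolean = this.eq(other) || this == other
}
case class Leaf(name: String) extends Node
case class Not(child: Node) extends Node

object FastEqualsSketch {
  val l: Node  = Leaf("a")
  val l1: Node = Not(Leaf("a")) // the operand already present inside the Or

  // Not(l) in the guard allocates a brand-new node on every evaluation.
  val fresh: Node = Not(l)

  // The reference check can never succeed for a freshly allocated node.
  val sameReference: Boolean = fresh eq l1

  // fastEquals still matches, but only through its == fallback.
  val viaFastEquals: Boolean = fresh fastEquals l1

  // Plain == gives the same answer without the extra reference check.
  val viaEquals: Boolean = fresh == l1
}
```

Here `sameReference` is false while `viaFastEquals` and `viaEquals` are both true, which is the situation the reviewer describes.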
+1
…er rule. Incorporate review comments
Adding changes suggested by cloud-fan in #5700
cc marmbrus
Author: Yash Datta <[email protected]>
Closes #8716 from saucam/bool_simp.
Use these in the optimizer as well: