[SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression #11809
Yeah, I also hit this issue when fixing this PR: #11765. You are so fast! Actually, they are related. My original plan was to merge the previous one and then revisit this issue. If this is merged before the above PR, I will need to redo the work. Anyway, thank you for fixing this issue!
@gatorsmile Thanks for providing the info! I found this issue earlier when dealing with another PR, but I had no time to submit it as a separate PR until today.
Test build #53495 has finished for PR 11809 at commit
private def collectCasts(e: Expression): Option[Attribute] = {
  if (e.isInstanceOf[Cast]) {
    collectCasts(e.children(0))
e.child for better readability?
Thanks for fixing this! Just a few non-critical suggestions.
This fix seems okay, but I feel like we are just adding one-offs instead of taking a step back and thinking about how to generally infer null intolerance from an expression. For example, after this PR we still aren't doing great in this case:
scala> val df = Seq((1,2,3)).toDF("a", "b", "c")
scala> df.where("a + b = c").queryExecution.analyzed.constraints
res2: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set(((a#4 + b#5) = c#6), isnotnull((a#4 + b#5)), isnotnull(c#6))
Given that it seems most useful to infer We could even consider making
+1 completely agree with @marmbrus
Yeah, also agree with @marmbrus. This is a general issue. When fixing
@marmbrus Great suggestion. Thanks! I will update this according to your suggestion.
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
Test build #53766 has finished for PR 11809 at commit
Test build #53765 has finished for PR 11809 at commit
private def scanNullIntolerantExpr(expr: Expression): Set[Expression] = expr match {
  case a: Attribute => Set(IsNotNull(a))
  case IsNotNull(e) =>
Make it more general. Here, you cover only a single case.
Test build #53771 has finished for PR 11809 at commit
Test build #53903 has finished for PR 11809 at commit
retest this please.
Test build #53927 has finished for PR 11809 at commit
retest this please.
Test build #53917 has finished for PR 11809 at commit
Test build #53929 has finished for PR 11809 at commit
An unrelated flaky test...
retest this please.
Test build #53999 has finished for PR 11809 at commit
Test build #54012 has finished for PR 11809 at commit
retest this please.
Test build #54022 has finished for PR 11809 at commit
retest this please.
Any hint about why? The tests pass locally.
Test build #54026 has finished for PR 11809 at commit
retest this please.
Test build #54046 has finished for PR 11809 at commit
@marmbrus @sameeragarwal @gatorsmile This is ready for review now. Thanks!
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
@marmbrus Can you take a look at this? Thanks.
Test build #54635 has finished for PR 11809 at commit
LGTM, this approach is pretty neat. Thanks!
@sameeragarwal Thanks for reviewing. Waiting for @marmbrus to check this.
Thanks, merging to master!
What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-13995
We currently infer the relevant IsNotNull constraints from a logical plan's expressions in constructIsNotNullConstraints. However, we don't consider the case of (nested) Cast.
For example:
Then, the plan's constraints will have IsNotNull(Cast(resolveColumn(tr, "a"), LongType)), instead of IsNotNull(resolveColumn(tr, "a")). This PR fixes it.
Besides, as IsNotNull constraints are most useful for Attribute, we should recurse through any Expression that is null intolerant and construct IsNotNull constraints for all Attributes under these Expressions.
For example, consider the following constraints:
The inferred isnotnull constraints should be isnotnull(a), isnotnull(b), isnotnull(c), instead of isnotnull(a + c) and isnotnull(c).
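The recursion described above can be sketched on a toy expression tree. This is only an illustration of the idea, not the code merged in this PR: Expr, Attr, Add, EqualTo, Coalesce, requiredNonNull and inferIsNotNull are all hypothetical names, and the Cast and IsNotNull case classes here are simplified stand-ins for the Catalyst classes of the same name.

```scala
// Toy expression tree. These types are hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Cast(child: Expr, to: String) extends Expr    // null in => null out
final case class Add(left: Expr, right: Expr) extends Expr     // null in => null out
final case class EqualTo(left: Expr, right: Expr) extends Expr // null in => null out
final case class Coalesce(children: Seq[Expr]) extends Expr    // null tolerant
final case class IsNotNull(child: Expr) extends Expr

object NullIntolerance {
  // Attributes that must be non-null whenever `e` evaluates to a non-null value.
  // We only descend through expressions assumed to be null intolerant.
  def requiredNonNull(e: Expr): Set[Attr] = e match {
    case a: Attr       => Set(a)
    case Cast(c, _)    => requiredNonNull(c)
    case Add(l, r)     => requiredNonNull(l) ++ requiredNonNull(r)
    case EqualTo(l, r) => requiredNonNull(l) ++ requiredNonNull(r)
    case _             => Set.empty // stop at anything that may tolerate null input
  }

  // For every constraint, emit IsNotNull(attr) for each attribute found above.
  def inferIsNotNull(constraints: Set[Expr]): Set[Expr] =
    constraints.flatMap {
      case IsNotNull(child) => requiredNonNull(child) // IsNotNull(a + b) yields a and b
      case other            => requiredNonNull(other)
    }.map(a => IsNotNull(a): Expr)
}
```

Under these assumptions, the constraint EqualTo(Add(a, b), c) yields IsNotNull on a, b and c, and EqualTo(Cast(a, "long"), b) yields IsNotNull on a and b rather than on the cast itself. Stopping at unknown expressions keeps the inference conservative: nothing is inferred for a null-tolerant expression such as Coalesce.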
How was this patch tested?
A test is added to ConstraintPropagationSuite.