Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression #11809

Closed
wants to merge 13 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Mar 18, 2016

What changes were proposed in this pull request?

JIRA: https://issues.apache.org/jira/browse/SPARK-13995

We infer relative IsNotNull constraints from logical plan's expressions in constructIsNotNullConstraints now. However, we don't consider the case of (nested) Cast.

For example:

val tr = LocalRelation('a.int, 'b.long)
val plan = tr.where('a.attr === 'b.attr).analyze

Then, the plan's constraints will have IsNotNull(Cast(resolveColumn(tr, "a"), LongType)), instead of IsNotNull(resolveColumn(tr, "a")). This PR fixes it.

Besides, as IsNotNull constraints are most useful for Attribute, we should do recursing through any Expression that is null intolerant and construct IsNotNull constraints for all Attributes under these Expressions.

For example, consider the following constraints:

val df = Seq((1,2,3)).toDF("a", "b", "c")
df.where("a + b = c").queryExecution.analyzed.constraints

The inferred isnotnull constraints should be isnotnull(a), isnotnull(b), isnotnull(c), instead of isnotnull(a + c) and isnotnull(c).

How was this patch tested?

Test is added into ConstraintPropagationSuite.

@gatorsmile
Copy link
Member

Yeah, I also hit this issue when fixing this PR: #11765

You are so fast! Actually, they are related. My original plan is to merge the previous one and then revisit this issue. If this is merged before the above PR, I need to redo the work. Anyway, thank you for fixing this issue!

@viirya
Copy link
Member Author

viirya commented Mar 18, 2016

@gatorsmile Thanks for providing the info! I found this issue before when dealing with another PR. But I has no time to submit it separately as new PR until today.

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53495 has finished for PR 11809 at commit b4e6033.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Mar 19, 2016

cc @marmbrus @sameeragarwal


private def collectCasts(e: Expression): Option[Attribute] = {
if (e.isInstanceOf[Cast]) {
collectCasts(e.children(0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.child for better readability?

@sameeragarwal
Copy link
Member

Thanks for fixing this! Just few non-critical suggestions.

@marmbrus
Copy link
Contributor

This fix seems okay, but I feel like we are just adding one-offs instead of taking a step back and thinking about how to generally infer null-intollerance from an expression. For example, after this PR we still aren't doing great in this case:

scala> val df = Seq((1,2,3)).toDF("a", "b", "c")
scala> df.where("a + b = c").queryExecution.analyzed.constraints
res2: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set(((a#4 + b#5) = c#6), isnotnull((a#4 + b#5)), isnotnull(c#6))

Given that it seems most useful to infer IsNotNull for Attributes, it seems that we just fix constructIsNotNullConstraints to recurse through any Expression that is null intolerant (i.e. any null input results in null output). Then it can just return a Seq[Attribute].

We could even consider making NullIntolerant into a trait so we could easily mark all the valid expressions as such. Which would make the logic in QueryPlan really simple to reason about.

@sameeragarwal
Copy link
Member

+1 completely agree with @marmbrus

@gatorsmile
Copy link
Member

Yeah, also agree with @marmbrus .

This is a general issue. When fixing isNotNull Constraints, we also have to fix the related reasoning parts for using these Constraints. If possible, we really need to have a fundamental way to do it easily. Otherwise, we might fix many small bugs again and again.

@viirya
Copy link
Member Author

viirya commented Mar 22, 2016

@marmbrus Great suggestion. Thanks! I will update this according to your suggestion.

@viirya viirya changed the title [SPARK-13995][SQL] Constraints should take care of Cast [SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression Mar 22, 2016
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
@SparkQA
Copy link

SparkQA commented Mar 22, 2016

Test build #53766 has finished for PR 11809 at commit c819572.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 22, 2016

Test build #53765 has finished for PR 11809 at commit 5fa1760.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.


private def scanNullIntolerantExpr(expr: Expression): Set[Expression] = expr match {
case a: Attribute => Set(IsNotNull(a))
case IsNotNull(e) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it more general. Here, you just cover a single case

@SparkQA
Copy link

SparkQA commented Mar 22, 2016

Test build #53771 has finished for PR 11809 at commit 3373396.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53903 has finished for PR 11809 at commit c8fb736.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Mar 23, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53927 has finished for PR 11809 at commit 81c46c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Mar 23, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53917 has finished for PR 11809 at commit 81c46c7.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53929 has finished for PR 11809 at commit 81c46c7.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

A not related flaky test...

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #53999 has finished for PR 11809 at commit 81c46c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54012 has finished for PR 11809 at commit 8e8dd72.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Attribute extends LeafExpression with NamedExpression with NullIntolerant

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54022 has finished for PR 11809 at commit 8e8dd72.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Attribute extends LeafExpression with NamedExpression with NullIntolerant

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

retest this please.

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

Any hint about why org.apache.spark.sql.hive.execution.SQLQuerySuite continues to be failed across different PRs. Error message: org.apache.spark.sql.AnalysisException: Table not found: src.

Tests are passed locally.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54026 has finished for PR 11809 at commit 8e8dd72.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Attribute extends LeafExpression with NamedExpression with NullIntolerant

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54046 has finished for PR 11809 at commit 8e8dd72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Attribute extends LeafExpression with NamedExpression with NullIntolerant

@viirya
Copy link
Member Author

viirya commented Mar 24, 2016

@marmbrus @sameeragarwal @gatorsmile This is ready for review now. Thanks!

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala
@viirya
Copy link
Member Author

viirya commented Mar 31, 2016

@marmbrus Can you take a look this? Thanks.

@SparkQA
Copy link

SparkQA commented Mar 31, 2016

Test build #54635 has finished for PR 11809 at commit ec84c5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Copy link
Member

LGTM, this approach is pretty neat. Thanks!

@viirya
Copy link
Member Author

viirya commented Apr 1, 2016

@sameeragarwal Thanks for reviewing. Waiting for @marmbrus checking this.

@marmbrus
Copy link
Contributor

marmbrus commented Apr 1, 2016

Thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants