Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18467][SQL] Extracts method for preparing arguments from StaticInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when needNullCheck == true. #15901

Closed
wants to merge 29 commits into from

Conversation

ueshin
Copy link
Member

@ueshin ueshin commented Nov 16, 2016

What changes were proposed in this pull request?

This pr extracts method for preparing arguments from StaticInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when propageteNull == true.

The steps are as follows:

  1. Introduce InvokeLike to extract common logic from StaticInvoke, Invoke and NewInstance to prepare arguments.
    StaticInvoke and Invoke had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well.

  2. Remove unneeded null checking and fix nullability of NewInstance.
    Avoid some of nullabilty checking which are not needed because the expression is not nullable.

  3. Modify to short circuit if arguments have null when needNullCheck == true.
    If needNullCheck == true, preparing arguments can be skipped if we found one of them is null, so modified to short circuit in the case.

How was this patch tested?

Existing tests.

@rxin
Copy link
Contributor

rxin commented Nov 16, 2016

So the refactoring for simplification adds more code than it deletes? Did you introduce new features as well?

@ueshin
Copy link
Member Author

ueshin commented Nov 16, 2016

@rxin I introduced features as follows:

  1. StaticInvoke and Invoke had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well.
  2. Avoid some of nullabilty checking which are not needed because the expression is not nullable.
  3. If progagateNull == true, preparing arguments can be skipped if we found one of them is null, so modified to short circuit in the case.

@rxin
Copy link
Contributor

rxin commented Nov 16, 2016

cc @cloud-fan


def propagateNull: Boolean

def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment for this method? It is not simple enough to skip the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {

val argsHaveNull =
if (propagateNull && arguments.exists(_.nullable)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

propagateNull && arguments.exists(_.nullable) is used many times, define a variable for it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@SparkQA
Copy link

SparkQA commented Nov 16, 2016

Test build #68700 has finished for PR 15901 at commit 2f30f53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Nov 16, 2016

This looks good overall. The more codes added are mostly used to deal with short circuit cases of nullability.

@@ -109,7 +171,7 @@ case class Invoke(
functionName: String,
dataType: DataType,
arguments: Seq[Expression] = Nil,
propagateNull: Boolean = true) extends Expression with NonSQLExpression {
propagateNull: Boolean = true) extends Expression with InvokeLike with NonSQLExpression {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment of Invoke doesn't have explanation for the param propagateNull. Can you also add it too?

@viirya
Copy link
Member

viirya commented Nov 16, 2016

The title is not well explained this pr and can be updated to reflect its purpose.

The explanation of the features you added can be moved to the pr description.


def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {

val argsHaveNull =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this variable name looks like a boolean flag, can you think of a better name?

@@ -33,6 +33,73 @@ import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
import org.apache.spark.sql.types._

/**
* Common base class for [[StaticInvoke]], [[Invoke]], and [[NewInstance]].
*/
trait InvokeLike extends Expression {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to extend Expression?

@ueshin ueshin changed the title [SPARK-18467][SQL] Refactor StaticInvoke, Invoke and NewInstance. [SPARK-18467][SQL] Extracts method for preparing arguments from StaticInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when propageteNull == true. Nov 17, 2016
@ueshin
Copy link
Member Author

ueshin commented Nov 17, 2016

@viirya @cloud-fan I addressed your comments and updated pr title and description.

* Prepares codes for arguments.
*
* - generate codes for argument.
* - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not to -> to not

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

* - use ctx.splitExpressions() not to exceed 64kb JVM limit while preparing arguments.
* - avoid some of nullabilty checking which are not needed because the expression is not
* nullable.
* - when progagateNull == true, short circuit if we found one of arguments is null because
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

progagateNull -> propagatingNull? Actually can we think of better name than propagatingNull? how about needNullCheck ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I used needNullCheck for now.

val reset = s"$containsNullInArguments = false;"
val argCodes = arguments.zipWithIndex.map { case (e, i) =>
val expr = e.genCode(ctx)
s"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

val updateContainsNull = if (e.nullable) {
  "$containsNull = ${expr.isNull};"
} else {
  ""
}
s"""
  if (!$containsNull) {
      ${expr.code}
      $updateContainsNull
      ${argValues(i)} = ${expr.value};
  }
"""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll use it.

@@ -50,7 +132,7 @@ case class StaticInvoke(
dataType: DataType,
functionName: String,
arguments: Seq[Expression] = Nil,
propagateNull: Boolean = true) extends Expression with NonSQLExpression {
propagateNull: Boolean = true) extends Expression with InvokeLike with NonSQLExpression {
Copy link
Contributor

@cloud-fan cloud-fan Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make InvokeLike extends Expression and NonSQLExpression, then we can just extends InvokeLike here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll modify them.

if ($argIsNulls[idx]) { $isNull = true; break; }
}
"""
val prepareIsNull = if (propagateNull && arguments.exists(_.nullable)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use propogatingNull?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I missed it, fixed.

@SparkQA
Copy link

SparkQA commented Nov 17, 2016

Test build #68777 has finished for PR 15901 at commit 2c52d91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2016

Test build #68778 has finished for PR 15901 at commit 43d2693.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Nov 18, 2016

LGTM cc @cloud-fan for another check.

* @param ev an [[ExprCode]] with unique terms.
* @return (code to prepare arguments, argument string, result of argument null check)
*/
def prepareArguments(ctx: CodegenContext, ev: ExprCode): (String, String, String) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ev is not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed.

if (!${obj.isNull}) {
$argCode
}
boolean ${ev.isNull} = ${obj.isNull} || $resultIsNull;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes resultIsNull is a member variable not local variable, can we avoid doing this? We can make the code clearer and more robust like:

javaType ${ev.value} = defaultValue;
boolean ${ev.isNull} = true;
if (!${obj.isNull}) {
  $argCode
  ${ev.isNull} = $resultIsNull
  ...
  $postNullCheck
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid not because if evaluating arguments is split to some methods, resultIsNull can't be referred from the split methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know resultIsNull must be a member variable now, but in the future we may make it a local variable if the argument code is not so long to exceed the 64kb limit. Here why not we assume resultIsNull can be local variable and make us more robust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code as you mentioned.

But I'm sorry, I didn't understand what you want to do.

  • You are assuming that the argument code is not so long, so use local variable without splitting evaluations? (could also make the argument variable as local variable)
    • I think the argument code would easily become large enough to exceed the 64kb limit.
  • You want to modify only resultIsNull to local variable and pass it and return the value to the split methods?
  • You want to switch local variable or member variable by whether split or not?
    • We need to know whether split or not but we can't now. Modify ctx.splitExpressions() to return split or not?

Could you let me know if you have other thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are assuming that the argument code is not so long, so use local variable without splitting evaluations? (could also make the argument variable as local variable)

This is what I mean. We can generate code first and see if it exceeds the 64kb limitation, so logically we do have a chance to use local variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Let's see if it exceeds the limit or not, using local variables.

$prepareIsNull
""" +
(if (needNullCheck) {
s"final $javaType ${ev.value} = $isNull ? ${ctx.defaultValue(javaType)} : $constructorCall;"
Copy link
Contributor

@cloud-fan cloud-fan Nov 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary? I'm pretty sure javac can optimize ... = false ? a : b to ... = b for us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use janino instead of javac?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I mean "java compiler"... I think janino is smart enough about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, updated. Thanks.

@cloud-fan
Copy link
Contributor

LGTM except some minor comments, thanks for working on it.

@SparkQA
Copy link

SparkQA commented Nov 18, 2016

Test build #68833 has finished for PR 15901 at commit 1baac55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

$javaType ${ev.value} = ${ctx.defaultValue(dataType)};
if (!${ev.isNull}) {
$evaluate
if (!${obj.isNull}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I think this new code looks clearer than before, what do you think? @ueshin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you. Thank you for your suggestion.

}
"""
val prepareIsNull = if (needNullCheck) {
s"boolean $isNull = $resultIsNull;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more optimization: we can just write ev.isNull = resultIsNull, then if resultIsNull is false, the final evaluation will become ... = false ? aaa : bbb and get optimized by janino.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean, modified. Thanks.

@ueshin
Copy link
Member Author

ueshin commented Nov 18, 2016

f8acda6 is a test to see if it exceeds the 64kb JVM limit or not, using only local variables.
I'll revert it if some test failed.

@SparkQA
Copy link

SparkQA commented Nov 18, 2016

Test build #68858 has finished for PR 15901 at commit f8acda6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 19, 2016

Test build #68877 has finished for PR 15901 at commit fe2871c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

${argGen.map(_.code).mkString("\n")}
$setIsNull
$argCode
boolean ${ev.isNull} = $resultIsNull;
final $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : $callFunc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ${ev.isNull} at line 153 can be $resultIsNull. So the optimization for false ? ${ctx.defaultValue(dataType)} : $callFunc; can work here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. I modified. Thanks.

@SparkQA
Copy link

SparkQA commented Nov 19, 2016

Test build #68881 has finished for PR 15901 at commit c88a1ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Nov 21, 2016

LGTM cc @cloud-fan for another look.

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.1!

asfgit pushed a commit that referenced this pull request Nov 21, 2016
…cInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when `needNullCheck == true`.

## What changes were proposed in this pull request?

This pr extracts method for preparing arguments from `StaticInvoke`, `Invoke` and `NewInstance` and modify to short circuit if arguments have `null` when `propageteNull == true`.

The steps are as follows:

1. Introduce `InvokeLike` to extract common logic from `StaticInvoke`, `Invoke` and `NewInstance` to prepare arguments.
`StaticInvoke` and `Invoke` had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well.

2. Remove unneeded null checking and fix nullability of `NewInstance`.
Avoid some of nullabilty checking which are not needed because the expression is not nullable.

3. Modify to short circuit if arguments have `null` when `needNullCheck == true`.
If `needNullCheck == true`, preparing arguments can be skipped if we found one of them is `null`, so modified to short circuit in the case.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes #15901 from ueshin/issues/SPARK-18467.

(cherry picked from commit 6585479)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 6585479 Nov 21, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…cInvoke, Invoke and NewInstance and modify to short circuit if arguments have null when `needNullCheck == true`.

## What changes were proposed in this pull request?

This pr extracts method for preparing arguments from `StaticInvoke`, `Invoke` and `NewInstance` and modify to short circuit if arguments have `null` when `propageteNull == true`.

The steps are as follows:

1. Introduce `InvokeLike` to extract common logic from `StaticInvoke`, `Invoke` and `NewInstance` to prepare arguments.
`StaticInvoke` and `Invoke` had a risk to exceed 64kb JVM limit to prepare arguments but after this patch they can handle them because they share the preparing code of NewInstance, which handles the limit well.

2. Remove unneeded null checking and fix nullability of `NewInstance`.
Avoid some of nullabilty checking which are not needed because the expression is not nullable.

3. Modify to short circuit if arguments have `null` when `needNullCheck == true`.
If `needNullCheck == true`, preparing arguments can be skipped if we found one of them is `null`, so modified to short circuit in the case.

## How was this patch tested?

Existing tests.

Author: Takuya UESHIN <[email protected]>

Closes apache#15901 from ueshin/issues/SPARK-18467.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants