[SPARK-8117] [SQL] Move codegen implementation into Expression #6660

Closed
wants to merge 16 commits into from

davies commented Jun 4, 2015

This PR moves the codegen implementation of expressions into the Expression class itself, which makes it easier to manage.

It introduces two APIs in Expression:

def gen(ctx: CodeGenContext): GeneratedExpressionCode
def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code

gen(ctx) will call genCode(ctx, ev) to generate Java source code for the current expression. An expression needs to override genCode().
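To make the split between the two methods concrete, here is a minimal, self-contained sketch of how it might look. It is not the code from this PR: `freshName`, `IntLiteral`, `Add`, the three-field result class, and the emitted Java are all assumptions made for illustration.

```scala
object GenSketch {
  type Term = String
  type Code = String

  // Simplified stand-in for the result class described in this PR.
  case class GeneratedExpressionCode(var code: Code, nullTerm: Term, primitiveTerm: Term)

  class CodeGenContext {
    private var counter = 0
    // Hypothetical helper: returns a unique Java identifier with the given prefix.
    def freshName(prefix: String): Term = { counter += 1; s"$prefix$counter" }
  }

  trait Expression {
    // Allocates the result terms, then delegates to genCode for the Java body.
    def gen(ctx: CodeGenContext): GeneratedExpressionCode = {
      val ev = GeneratedExpressionCode("", ctx.freshName("isNull"), ctx.freshName("value"))
      ev.code = genCode(ctx, ev)
      ev
    }

    // Each expression overrides this to emit Java that assigns ev's terms.
    def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code
  }

  // Hypothetical leaf expression: an int constant that is never null.
  case class IntLiteral(value: Int) extends Expression {
    def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code =
      s"boolean ${ev.nullTerm} = false;\nint ${ev.primitiveTerm} = $value;\n"
  }

  // Hypothetical binary expression: emits the children's code, then its own.
  case class Add(left: Expression, right: Expression) extends Expression {
    def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): Code = {
      val l = left.gen(ctx)
      val r = right.gen(ctx)
      l.code + r.code +
        s"boolean ${ev.nullTerm} = ${l.nullTerm} || ${r.nullTerm};\n" +
        s"int ${ev.primitiveTerm} = 0;\n" +
        s"if (!${ev.nullTerm}) { ${ev.primitiveTerm} = ${l.primitiveTerm} + ${r.primitiveTerm}; }\n"
    }
  }
}
```

Calling `GenSketch.Add(GenSketch.IntLiteral(1), GenSketch.IntLiteral(2)).gen(new GenSketch.CodeGenContext)` returns the Java source for the whole tree in `code`, with the result variables named by `nullTerm` and `primitiveTerm`.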

Here are the types:

type Term = String
type Code = String

/**
 * Java source for evaluating an [[Expression]] given a [[Row]] of input.
 */
case class GeneratedExpressionCode(var code: Code,
                               nullTerm: Term,
                               primitiveTerm: Term,
                               objectTerm: Term)
/**
 * A context for codegen, used to keep track of the expressions that are not supported by
 * codegen and are therefore evaluated directly. Each unsupported expression is appended to the
 * end of `references`; its position is kept in the generated code and used to access and
 * evaluate it.
 */
class CodeGenContext {
  /**
   * Holds all the expressions that do not support codegen; they are evaluated directly.
   */
  val references: mutable.ArrayBuffer[Expression] = new mutable.ArrayBuffer[Expression]()
}
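The fallback path described in the `CodeGenContext` comment could be sketched roughly as follows. This is an illustration only: `Opaque`, `fallbackCode`, and the Java names `expressions` and `i` (the array of referenced expressions and the input row) are assumptions, not names confirmed by this PR.

```scala
import scala.collection.mutable

object ReferencesSketch {
  type Code = String

  trait Expression { def eval(input: Seq[Any]): Any }

  // Hypothetical expression with no codegen support; it can only be interpreted.
  case class Opaque(f: Seq[Any] => Any) extends Expression {
    def eval(input: Seq[Any]): Any = f(input)
  }

  class CodeGenContext {
    val references: mutable.ArrayBuffer[Expression] = mutable.ArrayBuffer.empty[Expression]
  }

  // Appends the expression to `references` and emits Java that looks it up by
  // position and evaluates it directly against the input row.
  def fallbackCode(ctx: CodeGenContext, e: Expression, resultTerm: String): Code = {
    ctx.references += e
    val idx = ctx.references.length - 1
    s"Object $resultTerm = expressions[$idx].eval(i);"
  }
}
```

Because the generated source only stores an index, the same `references` buffer has to be handed to the compiled class when it is instantiated, in the same order in which the expressions were appended.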

davies changed the title from "[SPARK-8117] [SQL] Move codegen implementation into Expression" to "[WIP] [SPARK-8117] [SQL] Move codegen implementation into Expression" Jun 4, 2015

SparkQA commented Jun 4, 2015

Test build #34224 has finished for PR 6660 at commit 3ff25f8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 5, 2015

Test build #34227 has finished for PR 6660 at commit e57959d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class EvaluatedExpression(var code: Code,
    • class CodeGenContext
    • abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Logging

protected val genericMutableRowType = classOf[GenericMutableRow].getName

private val curId = new java.util.concurrent.atomic.AtomicInteger()
case class EvaluatedExpression(var code: Code,
Contributor

GeneratedExpressionCode?

SparkQA commented Jun 5, 2015

Test build #34229 has finished for PR 6660 at commit 8c6d82d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 5, 2015

Test build #34231 has finished for PR 6660 at commit c5fb514.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code,
    • class CodeGenContext
    • abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Logging

SparkQA commented Jun 5, 2015

Test build #34232 has finished for PR 6660 at commit 12ff88a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code,
    • class CodeGenContext
    • abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Logging

*/
def equalFunc(dataType: DataType): ((Term, Term) => Code) = dataType match {
case BinaryType => { case (eval1, eval2) => s"java.util.Arrays.equals($eval1, $eval2)" }
case dt if isNativeType(dt) => { case (eval1, eval2) => s"$eval1 == $eval2" }
Contributor

Does this work? Being a native type doesn't mean it works in Java.

Contributor Author

Only primitive types are native types here, from boolean to long and double.

Contributor

Yes, but the definition of native type (4 lines down) is: "List of data types that have special accessors and setters in [[Row]]." That means we might break this in the future.

I'd make it explicit by using primitive types here.
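One way to follow this suggestion is to guard the `==` case with an explicit list of primitive types rather than the broader native-type list. The sketch below uses a toy `DataType` hierarchy; `isPrimitiveType` and the `.equals` fallback are assumptions for illustration and are not taken from this PR.

```scala
object EqualitySketch {
  type Term = String
  type Code = String

  // Toy stand-ins for a few SQL data types.
  sealed trait DataType
  case object BooleanType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object BinaryType extends DataType
  case object StringType extends DataType

  // Explicit list of types that map to Java primitives, where `==` compares values.
  def isPrimitiveType(dt: DataType): Boolean = dt match {
    case BooleanType | IntegerType | LongType | DoubleType => true
    case _ => false
  }

  // Picks the Java equality expression for two terms of the given type.
  def equalFunc(dt: DataType): (Term, Term) => Code = dt match {
    case BinaryType => (a, b) => s"java.util.Arrays.equals($a, $b)"
    case t if isPrimitiveType(t) => (a, b) => s"$a == $b"
    // Assumed fallback: object types are compared with equals().
    case _ => (a, b) => s"$a.equals($b)"
  }
}
```

With this shape, adding a new type with special `Row` accessors does not silently change which types are compared with `==`.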

SparkQA commented Jun 5, 2015

Test build #34247 has finished for PR 6660 at commit 2344bc0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code,
    • class CodeGenContext
    • abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Logging

davies changed the title from "[WIP] [SPARK-8117] [SQL] Move codegen implementation into Expression" to "[SPARK-8117] [SQL] Move codegen implementation into Expression" Jun 5, 2015
@@ -86,6 +87,8 @@ case class Abs(child: Expression) extends UnaryArithmetic {
abstract class BinaryArithmetic extends BinaryExpression {
self: Product =>

def decimalMethod: String = ""
Contributor

Can you add an inline comment explaining what this is for?

rxin added a commit to rxin/spark that referenced this pull request Jun 5, 2015
[SPARK-8117] [SQL] Move codegen implementation into Expression

rxin commented Jun 5, 2015

One idea I have is to use a numeric type for the null term. In that case, a lot of expressions would no longer need conditional branches.

To compute the null term: && becomes &, || becomes |, and xor becomes ^.
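A rough sketch of what this idea might look like is shown below. It is one possible reading of the comment, not code from this PR: the 0/1 encoding of the null flag (0 = non-null, 1 = null) and all function names are assumptions.

```scala
object NullTermSketch {
  type Term = String
  type Code = String

  // Current style: boolean null flags combined with short-circuit `||`.
  def branchingNull(out: Term, left: Term, right: Term): Code =
    s"boolean $out = $left || $right;"

  // Proposed style: int null flags (0 = non-null, 1 = null) combined with bitwise `|`.
  def bitwiseNull(out: Term, left: Term, right: Term): Code =
    s"int $out = $left | $right;"

  // Reference semantics of the flag encoding, evaluated directly in Scala.
  def orFlags(a: Int, b: Int): Int = a | b
  def andFlags(a: Int, b: Int): Int = a & b
  def xorFlags(a: Int, b: Int): Int = a ^ b
}
```

For flags restricted to 0 and 1, the bitwise operators give the same truth tables as their logical counterparts, so the generated Java can combine null flags without short-circuit evaluation.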

rxin commented Jun 5, 2015

BTW - I sent a pull request against your branch with some review comments.

Davies Liu added 2 commits June 5, 2015 11:16

SparkQA commented Jun 5, 2015

Test build #34294 has finished for PR 6660 at commit b5d3617.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code,
    • class CodeGenContext
    • abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Logging

SparkQA commented Jun 5, 2015

Test build #34306 has finished for PR 6660 at commit 02262c9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var nullTerm: Term, primitiveTerm: Term)
    • class CodeGenContext

SparkQA commented Jun 5, 2015

Test build #34308 has finished for PR 6660 at commit 86fac2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var nullTerm: Term, primitiveTerm: Term)
    • class CodeGenContext

* @param ctx a [[CodeGenContext]]
* @return [[GeneratedExpressionCode]]
*/
def gen(ctx: CodeGenContext): GeneratedExpressionCode = {
Contributor

I think that this can be final.

Contributor Author

There are some cases where we need to override it.

@JoshRosen

High-level comment: we should have a checklist / guide at the top of codegen/package.scala which gives some general guidelines on how to write codegen code (e.g. code formatting, uniqueness of variable / term names, etc.)
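As an example of the kind of guideline meant here, unique term names could be produced by a small helper built on an atomic counter, similar to the `curId` field visible in the diff above. The helper names `freshName` and `declareInt` are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicInteger

object FreshNameSketch {
  private val curId = new AtomicInteger()

  // Returns a Java identifier made of the prefix plus a counter value that is
  // never reused within this JVM, so two calls never produce the same name.
  def freshName(prefix: String): String = s"$prefix${curId.getAndIncrement()}"

  // Hypothetical convenience: declares an int variable under a fresh name and
  // returns both the name and the Java declaration.
  def declareInt(prefix: String, init: Int): (String, String) = {
    val name = freshName(prefix)
    (name, s"int $name = $init;")
  }
}
```

Generating every local variable through such a helper lets code from sibling expressions be concatenated into one Java method without name clashes.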

* by codegen, then they are evaluated directly. The unsupported expression is appended at the
* end of `references`, the position of it is kept in the code, used to access and evaluate it.
*/
class CodeGenContext {
Contributor

Minor nit, but can this class go into its own file?

Contributor Author

We can do this later (it reduces the number of changed lines, which is better for review).

SparkQA commented Jun 5, 2015

Test build #34313 has finished for PR 6660 at commit e03edaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var nullTerm: Term, var primitiveTerm: Term)
    • class CodeGenContext

SparkQA commented Jun 5, 2015

Test build #34320 has finished for PR 6660 at commit bad6828.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var nullTerm: Term, var primitiveTerm: Term)
    • class CodeGenContext

SparkQA commented Jun 6, 2015

Test build #34357 has finished for PR 6660 at commit f42c732.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var isNull: Term, var primitive: Term)
    • class CodeGenContext
    • case class Pow(left: Expression, right: Expression)
    • case class Rint(child: Expression) extends UnaryMathExpression(math.rint, "ROUND")
    • case class ToDegrees(child: Expression) extends UnaryMathExpression(math.toDegrees, "DEGREES")
    • case class ToRadians(child: Expression) extends UnaryMathExpression(math.toRadians, "RADIANS")

SparkQA commented Jun 6, 2015

Test build #34358 has finished for PR 6660 at commit 9adaeaf.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class GeneratedExpressionCode(var code: Code, var isNull: Term, var primitive: Term)
    • class CodeGenContext
    • case class Pow(left: Expression, right: Expression)
    • case class Rint(child: Expression) extends UnaryMathExpression(math.rint, "ROUND")
    • case class ToDegrees(child: Expression) extends UnaryMathExpression(math.toDegrees, "DEGREES")
    • case class ToRadians(child: Expression) extends UnaryMathExpression(math.toRadians, "RADIANS")

rxin pushed a commit to rxin/spark that referenced this pull request Jun 7, 2015
This is basically apache#6660, but with the style violation and compilation failure fixed.

Author: Davies Liu <[email protected]>
Author: Reynold Xin <[email protected]>

Closes apache#6690 from rxin/codegen and squashes the following commits:

e1368c2 [Reynold Xin] Fixed tests.
73db80e [Reynold Xin] Fixed compilation failure.
19d6435 [Reynold Xin] Fixed style violation.
9adaeaf [Davies Liu] address comments
f42c732 [Davies Liu] improve coverage and tests
bad6828 [Davies Liu] address comments
e03edaa [Davies Liu] consts fold
86fac2c [Davies Liu] fix style
02262c9 [Davies Liu] address comments
b5d3617 [Davies Liu] Merge pull request #5 from rxin/codegen
48c454f [Reynold Xin] Some code gen update.
2344bc0 [Davies Liu] fix test
12ff88a [Davies Liu] fix build
c5fb514 [Davies Liu] rename
8c6d82d [Davies Liu] update docs
b145047 [Davies Liu] fix style
e57959d [Davies Liu] add type alias
3ff25f8 [Davies Liu] refactor
593d617 [Davies Liu] pushing codegen into Expression

srowen commented Jun 8, 2015

Now that 5e7b6b6 has merged, can this be closed?

davies closed this Jun 8, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015