Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16839][SQL] Simplify Struct creation code path #15718

Closed
wants to merge 47 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
313d584
SPARK-16839_redundant_aliases_after_cleanupAliases: failing test.
Aug 1, 2016
0ac4a4d
SPARK-16839_redundant_aliases_after_cleanupAliases: trnsform CreateSt…
Aug 1, 2016
8920c12
SPARK-16839_redundant_aliases_after_cleanupAliases: fix tests, all te…
Aug 1, 2016
ae4ec34
SPARK-16839_redundant_aliases_after_cleanupAliases: styling fixes.
Aug 3, 2016
0bcbab7
SPARK-16839_redundant_aliases_after_cleanupAliases: moved the struct=…
Aug 3, 2016
c0f601c
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_Creat…
Aug 4, 2016
2656547
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_Creat…
Aug 4, 2016
a0f491b
(SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_Crea…
Aug 4, 2016
c701c01
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_Creat…
Aug 4, 2016
a1d240d
SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_Creat…
Aug 4, 2016
2d377f9
(SPARK-16839_redundant_aliases_after_cleanupAliases__unevaluable_Crea…
Aug 4, 2016
3fc24b7
fix a mintority that prevented decent debugging on windows (default r…
Aug 21, 2016
62d060c
SPARK-16839_redundant_aliases_after_cleanupAliases: make sure not to …
Aug 21, 2016
ffc521e
Merge remote-tracking branch 'origin/master' into SPARK-16839_redunda…
Aug 21, 2016
27c8157
SPARK-16839_redundant_aliases_after_cleanupAliases: fix after merge, …
Aug 21, 2016
7f1be34
SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStruct is n…
Aug 29, 2016
7ed5144
SPARK-16839_redundant_aliases_after_cleanupAliases: address some revi…
Aug 29, 2016
48ed3a9
SPARK-16839_redundant_aliases_after_cleanupAliases: restore two ignor…
Aug 29, 2016
fa1ae24
SPARK-16839_redundant_aliases_after_cleanupAliases: fix formatting.
Aug 29, 2016
76dd6a4
SPARK-16839_redundant_aliases_after_cleanupAliases: simplify CreateSt…
Sep 5, 2016
8a34fb2
SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStructUnsaf…
Sep 5, 2016
3f7b63f
SPARK-16839_redundant_aliases_after_cleanupAliases: remove redundant …
Sep 5, 2016
d6fbe2a
SPARK-16839_redundant_aliases_after_cleanupAliases: fix reviewer's co…
Sep 7, 2016
5012a47
SPARK-16839_redundant_aliases_after_cleanupAliases: remove redundant …
Sep 8, 2016
a77ec20
SPARK-16839_redundant_aliases_after_cleanupAliases: remove duplicate …
Sep 12, 2016
53308ea
SPARK-16839_redundant_aliases_after_cleanupAliases: updated hive gold…
Sep 12, 2016
4cbb9a5
Merge branch 'master' into SPARK-16839_redundant_aliases_after_cleanu…
Sep 13, 2016
c1d5b20
Fix R tests to pass by using alias for struct functions (#1)
HyukjinKwon Sep 14, 2016
f4f3de4
SPARK-16839_redundant_aliases_after_cleanupAliases: fix quotes type i…
Sep 14, 2016
60a49c3
Merge branch 'master' into SPARK-16839_redundant_aliases_after_cleanu…
Sep 15, 2016
3360e8c
SPARK-16839_redundant_aliases_after_cleanupAliases: CreateStruct.appl…
eyalfa Sep 18, 2016
abf7712
SPARK-16839_redundant_aliases_after_cleanupAliases: fix failing test.…
eyalfa Sep 19, 2016
fe40c60
SPARK-16839_redundant_aliases_after_cleanupAliases: some attribute na…
eyalfa Sep 20, 2016
0d93d80
SPARK-16839_redundant_aliases_after_cleanupAliases: yet another small…
eyalfa Sep 20, 2016
32bfd77
Use unresolved CreateStruct
hvanhovell Oct 6, 2016
ec6aca3
Reintroduce NamePlaceholder.
hvanhovell Oct 30, 2016
2fbed29
Update NamePlaceHolder.
hvanhovell Oct 30, 2016
2529211
Use NamePlaceholder object.
hvanhovell Oct 30, 2016
29f5962
Fix a bug
hvanhovell Oct 31, 2016
c69707d
Merge pull request #2 from hvanhovell/SPARK-16839
eyalfa Oct 31, 2016
35b260a
Merge remote-tracking branch 'origin/master' into SPARK-16839_redunda…
eyalfa Oct 31, 2016
b397d04
SPARK-16839_redundant_aliases_after_cleanupAliases: fix minor compila…
eyalfa Oct 31, 2016
9b89e31
SPARK-16839_redundant_aliases_after_cleanupAliases: fix failing test …
eyalfa Nov 1, 2016
18bfdcf
Merge remote-tracking branch 'apache-github/master' into SPARK-16839-2
hvanhovell Nov 1, 2016
af277ee
Update after conflicting merge
hvanhovell Nov 1, 2016
29ccf4e
Fix UT
hvanhovell Nov 1, 2016
c0263d7
Merge remote-tracking branch 'apache-github/master' into SPARK-16839-2
hvanhovell Nov 1, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions R/pkg/inst/tests/testthat/test_sparkSQL.R
Original file line number Diff line number Diff line change
Expand Up @@ -1222,16 +1222,16 @@ test_that("column functions", {
# Test struct()
df <- createDataFrame(list(list(1L, 2L, 3L), list(4L, 5L, 6L)),
schema = c("a", "b", "c"))
result <- collect(select(df, struct("a", "c")))
result <- collect(select(df, alias(struct("a", "c"), "d")))
expected <- data.frame(row.names = 1:2)
expected$"struct(a, c)" <- list(listToStruct(list(a = 1L, c = 3L)),
listToStruct(list(a = 4L, c = 6L)))
expected$"d" <- list(listToStruct(list(a = 1L, c = 3L)),
listToStruct(list(a = 4L, c = 6L)))
expect_equal(result, expected)

result <- collect(select(df, struct(df$a, df$b)))
result <- collect(select(df, alias(struct(df$a, df$b), "d")))
expected <- data.frame(row.names = 1:2)
expected$"struct(a, b)" <- list(listToStruct(list(a = 1L, b = 2L)),
listToStruct(list(a = 4L, b = 5L)))
expected$"d" <- list(listToStruct(list(a = 1L, b = 2L)),
listToStruct(list(a = 4L, b = 5L)))
expect_equal(result, expected)

# Test encode(), decode()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ import org.apache.spark.sql.catalyst.optimizer.BooleanSimplification
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, _}
import org.apache.spark.sql.catalyst.rules._
import org.apache.spark.sql.catalyst.trees.{TreeNodeRef}
import org.apache.spark.sql.catalyst.trees.TreeNodeRef
import org.apache.spark.sql.catalyst.util.toPrettySQL
import org.apache.spark.sql.types._

Expand Down Expand Up @@ -83,6 +83,7 @@ class Analyzer(
ResolveTableValuedFunctions ::
ResolveRelations ::
ResolveReferences ::
ResolveCreateNamedStruct ::
ResolveDeserializer ::
ResolveNewInstance ::
ResolveUpCast ::
Expand Down Expand Up @@ -653,11 +654,12 @@ class Analyzer(
case s: Star => s.expand(child, resolver)
case o => o :: Nil
})
case c: CreateStruct if containsStar(c.children) =>
c.copy(children = c.children.flatMap {
case s: Star => s.expand(child, resolver)
case o => o :: Nil
})
case c: CreateNamedStruct if containsStar(c.valExprs) =>
val newChildren = c.children.grouped(2).flatMap {
case Seq(k, s : Star) => CreateStruct(s.expand(child, resolver)).children
case kv => kv
}
c.copy(children = newChildren.toList )
case c: CreateArray if containsStar(c.children) =>
c.copy(children = c.children.flatMap {
case s: Star => s.expand(child, resolver)
Expand Down Expand Up @@ -1141,7 +1143,7 @@ class Analyzer(
case In(e, Seq(l @ ListQuery(_, exprId))) if e.resolved =>
// Get the left hand side expressions.
val expressions = e match {
case CreateStruct(exprs) => exprs
case cns : CreateNamedStruct => cns.valExprs
case expr => Seq(expr)
}
resolveSubQuery(l, plans, expressions.size) { (rewrite, conditions) =>
Expand Down Expand Up @@ -2072,18 +2074,8 @@ object EliminateUnions extends Rule[LogicalPlan] {
*/
object CleanupAliases extends Rule[LogicalPlan] {
private def trimAliases(e: Expression): Expression = {
var stop = false
e.transformDown {
// CreateStruct is a special case, we need to retain its top level Aliases as they decide the
// name of StructField. We also need to stop transform down this expression, or the Aliases
// under CreateStruct will be mistakenly trimmed.
case c: CreateStruct if !stop =>
stop = true
c.copy(children = c.children.map(trimNonTopLevelAliases))
case c: CreateStructUnsafe if !stop =>
stop = true
c.copy(children = c.children.map(trimNonTopLevelAliases))
case Alias(child, _) if !stop => child
case Alias(child, _) => child
}
}

Expand Down Expand Up @@ -2116,15 +2108,8 @@ object CleanupAliases extends Rule[LogicalPlan] {
case a: AppendColumns => a

case other =>
var stop = false
other transformExpressionsDown {
case c: CreateStruct if !stop =>
stop = true
c.copy(children = c.children.map(trimNonTopLevelAliases))
case c: CreateStructUnsafe if !stop =>
stop = true
c.copy(children = c.children.map(trimNonTopLevelAliases))
case Alias(child, _) if !stop => child
case Alias(child, _) => child
}
}
}
Expand Down Expand Up @@ -2217,3 +2202,19 @@ object TimeWindowing extends Rule[LogicalPlan] {
}
}
}

/**
* Resolve a [[CreateNamedStruct]] if it contains [[NamePlaceholder]]s.
*/
object ResolveCreateNamedStruct extends Rule[LogicalPlan] {
override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
case e: CreateNamedStruct if !e.resolved =>
val children = e.children.grouped(2).flatMap {
case Seq(NamePlaceholder, e: NamedExpression) if e.resolved =>
Seq(Literal(e.name), e)
case kv =>
kv
}
CreateNamedStruct(children.toList)
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -357,7 +357,7 @@ object FunctionRegistry {
expression[MapValues]("map_values"),
expression[Size]("size"),
expression[SortArray]("sort_array"),
expression[CreateStruct]("struct"),
CreateStruct.registryEntry,

// misc functions
expression[AssertTrue]("assert_true"),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,6 @@ object UnsafeProjection {
*/
def create(exprs: Seq[Expression]): UnsafeProjection = {
val unsafeExprs = exprs.map(_ transform {
case CreateStruct(children) => CreateStructUnsafe(children)
case CreateNamedStruct(children) => CreateNamedStructUnsafe(children)
})
GenerateUnsafeProjection.generate(unsafeExprs)
Expand All @@ -145,7 +144,6 @@ object UnsafeProjection {
subexpressionEliminationEnabled: Boolean): UnsafeProjection = {
val e = exprs.map(BindReferences.bindReference(_, inputSchema))
.map(_ transform {
case CreateStruct(children) => CreateStructUnsafe(children)
case CreateNamedStruct(children) => CreateNamedStructUnsafe(children)
})
GenerateUnsafeProjection.generate(e, subexpressionEliminationEnabled)
Expand Down
Loading