[SPARK-16839][SQL] Simplify Struct creation code path #15718
…ruct into CreateNamedStruct, this maintains field names and simplifies Alias removal.
…sts in analysis package are now passing.
…>namedStruct logic into its own Analyzer rule.
…eStruct: CreateStruct is now unevaluable, direct test is marked as ignored.
…eStruct: reorg analyzer rules according to cloud-fan's comment.
…teStruct: ExpressionEncoder used CreateStruct directly, fixed.
…eStruct: refactored ResolveCreateStruct, moved the conversion logic to CreateStruct and CreateStructUnsafe, introduced a helper mixin.
…eStruct: a few test fixes; a few tests are still failing.
…teStruct: it turns out we're calling toCreateNamedStruct during AST construction (and in some tests); in these cases the tree is not resolved yet, which causes the dataType method to explode. Instead of relying on the dataType method, we examine the children's names directly; for code reuse, dataType is now implemented in terms of toCreateNamedStruct.
…esulted with an invalid URI, URI considered file:c:/... as a relative path, modified to file:c:/...).
…miss any expressions 'hiding' in projections or other corners of the logical plan.
…nt_aliases_after_cleanupAliases
…ast-builder no longer 'resolves' createStruct into createNamedStruct.
…ow a 'constructor' of CreateNamedStruct.
…ruct and CreateStructUnsafe. restructure test to meet project's standards.
…en file for a test affected by CreateStruct becoming a CreateNamedStruct.
…pAliases Conflicts: sql/hive/src/test/resources/sqlgen/subquery_in_having_2.sql
fix R tests that relied on the generated alias for CreateStruct expressions (thanks to @HyukjinKwon)
…n expected output.
…pAliases Conflicts: sql/hive/src/test/resources/sqlgen/subquery_in_having_2.sql
…y avoids calling name() on unresolved named expressions.
… contribute minimal fix to get (some of) hive tests running on windows.
…mes are not available when CreateStruct.apply executes; we now identify this during construction and plant a placeholder for the attribute name (leaving the resulting CreateNamedStruct unresolved). Once the Analyzer resolves the relevant child expressions, we deduce the attribute names from the values.
… step to get (some of) hive tests running on Windows.
[SPARK-16839][SQL] Use unresolved CreateStruct instead
…nt_aliases_after_cleanupAliases
…cases for org.apache.spark.sql.SQLQueryTestSuite: add an explicit alias over an aggregated struct in the projection. This is best practice anyway, as there's no guarantee that 'generated' aliases are stable.
cc @eyalfa
Test build #67906 has finished for PR 15718 at commit
# Conflicts: # sql/core/src/test/resources/sql-tests/inputs/group-by.sql # sql/core/src/test/resources/sql-tests/results/group-by.sql.out
Test build #67918 has finished for PR 15718 at commit
Test build #67925 has finished for PR 15718 at commit
Merging to master/2.1.
## What changes were proposed in this pull request?

Simplify struct creation, especially the part of `CleanupAliases` that missed some aliases when handling trees created by `CreateStruct`.

This PR includes:

1. A failing test (create a struct with nested aliases; some of the aliases survive `CleanupAliases`).
2. A fix that transforms `CreateStruct` into a `CreateNamedStruct` constructor, effectively eliminating `CreateStruct` from all expression trees.
3. A `NamePlaceholder` used by `CreateStruct` when column names cannot be extracted from an unresolved `NamedExpression`.
4. A new Analyzer rule that resolves `NamePlaceholder` into a string literal once the `NamedExpression` is resolved.
5. Simplified `CleanupAliases` code, which no longer has to deal with `CreateStruct`'s top-level columns.

## How was this patch tested?

Ran all test suites in package org.apache.spark.sql, in particular the analysis suite. Confirmed that the added test initially fails, then reran the entire analysis package successfully after applying the fix. Modified a few tests that expected `CreateStruct`, which is now transformed into `CreateNamedStruct`.

Author: eyal farago <eyal farago>
Author: Herman van Hovell <[email protected]>
Author: eyal farago <[email protected]>
Author: Eyal Farago <[email protected]>
Author: Hyukjin Kwon <[email protected]>
Author: eyalfa <[email protected]>

Closes #15718 from hvanhovell/SPARK-16839-2.

(cherry picked from commit f151bd1)
Signed-off-by: Herman van Hovell <[email protected]>
StructType(fields)
}

case object NamePlaceholder extends LeafExpression with Unevaluable {
@hvanhovell Although this is merged, I have a question about this `NamePlaceholder`. Is it necessary? It has no function of its own and will be replaced with the field name later in analysis. The resolution of `CreateNamedStruct` depends on the field expressions anyway, so if we used the field name directly, it seems there would be no harm either?
Oh, I got it, never mind. For some unresolved named expressions the `name` property cannot be accessed.
Yeah, I was thinking the same thing at first and I tried a few things. However this was the most straightforward way of getting it to work.
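For readers following the thread, the placeholder mechanism can be illustrated with a small, self-contained toy model. This is only a sketch and not the code from this PR: `StructSketch`, `Expr`, `Attr`, `Unresolved`, `Lit`, `ResolveNamePlaceholders`, and `bind` are hypothetical stand-ins, the `colN` fallback name is an assumption of the sketch, and only the names `CreateStruct`, `CreateNamedStruct`, and `NamePlaceholder` are taken from the PR.

```scala
// Toy model: these types only mimic the shape of Catalyst expressions.
// Nothing here is Spark's actual implementation.
object StructSketch {
  sealed trait Expr { def resolved: Boolean }

  // A bound column reference whose name is known.
  final case class Attr(name: String) extends Expr { val resolved = true }

  // A not-yet-bound reference; asking for its name fails.
  final case class Unresolved(raw: String) extends Expr {
    val resolved = false
    def name: String =
      throw new UnsupportedOperationException(s"name of unresolved '$raw'")
  }

  final case class Lit(value: String) extends Expr { val resolved = true }

  // Stands in for a field name that cannot be computed yet.
  case object NamePlaceholder extends Expr { val resolved = false }

  // Children alternate: name1, value1, name2, value2, ...
  final case class CreateNamedStruct(children: Seq[Expr]) extends Expr {
    def resolved: Boolean = children.forall(_.resolved)
  }

  // CreateStruct is only a constructor; it never appears in the tree.
  object CreateStruct {
    def apply(values: Seq[Expr]): CreateNamedStruct =
      CreateNamedStruct(values.zipWithIndex.flatMap {
        case (a: Attr, _)       => Seq[Expr](Lit(a.name), a)
        case (u: Unresolved, _) => Seq[Expr](NamePlaceholder, u)
        // Assumed fallback name for unnamed values in this sketch.
        case (e, i)             => Seq[Expr](Lit(s"col${i + 1}"), e)
      })
  }

  // Analyzer-like step: fill in the name once the value is resolved.
  object ResolveNamePlaceholders {
    def apply(s: CreateNamedStruct): CreateNamedStruct =
      CreateNamedStruct(s.children.grouped(2).toList.flatMap {
        case Seq(NamePlaceholder, a: Attr) => Seq[Expr](Lit(a.name), a)
        case pair                          => pair
      })
  }

  // Hypothetical helper that simulates resolving column references.
  def bind(s: CreateNamedStruct): CreateNamedStruct =
    CreateNamedStruct(s.children.map {
      case Unresolved(raw) => Attr(raw)
      case other           => other
    })

  def main(args: Array[String]): Unit = {
    val built = CreateStruct(Seq(Attr("a"), Unresolved("b")))
    assert(!built.resolved) // the placeholder keeps the struct unresolved
    val done = ResolveNamePlaceholders(bind(built))
    assert(done.resolved)
    println(done)
  }
}
```

In this model the struct stays unresolved for as long as a placeholder is present, so nothing downstream ever has to call `name` on an unresolved child.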