-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4296][SQL] Trims aliases when resolving and checking aggregate expressions #3910
Conversation
Test build #25100 has started for PR 3910 at commit
|
Test build #25100 has finished for PR 3910 at commit
|
Test FAILed. |
Test build #25123 has started for PR 3910 at commit
|
Test build #25123 has finished for PR 3910 at commit
|
Test FAILed. |
Test build #563 has started for PR 3910 at commit
|
Test build #563 has finished for PR 3910 at commit
|
// Should trim aliases. These aliases can only be introduced while resolving unnamed | ||
// expressions like `GetField` and UDF calls, because GROUP BY clause doesn't allow | ||
// aliasing. | ||
.get(e.transform { case a: Alias => a.child }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this line can remove needed aliases. For example, in subq2.q
SELECT a.k, a.c
FROM (SELECT b.key as k, count(1) as c FROM src b GROUP BY b.key) a
WHERE a.k >= 90;
We have the following plans.
[info] == Parsed Logical Plan ==
[info] 'Project ['a.k,'a.c]
[info] 'Filter ('a.k >= 90)
[info] 'Subquery a
[info] 'Aggregate ['b.key], ['b.key AS k#84130,COUNT(1) AS c#84131L]
[info] 'UnresolvedRelation [src], Some(b)
[info]
[info] == Analyzed Logical Plan ==
[info] Project [k#84130,c#84131L]
[info] Filter (k#84130 >= 90)
[info] Aggregate [key#84132], [key#84132 AS k#84130,COUNT(1) AS c#84131L]
[info] MetastoreRelation default, src, Some(b)
[info]
[info] == Optimized Logical Plan ==
[info] Filter (k#84130 >= 90)
[info] Aggregate [key#84132], [key#84132 AS k#84130,COUNT(1) AS c#84131L]
[info] Project [key#84132]
[info] MetastoreRelation default, src, Some(b)
[info]
[info] == Physical Plan ==
[info] !Filter (k#84130 >= 90)
[info] Aggregate false, [key#84132], [key#84132,Coalesce(SUM(PartialCount#84135L),0) AS c#84131L]
[info] Exchange (HashPartitioning [key#84132], 2)
[info] Aggregate true, [key#84132], [key#84132,COUNT(1) AS PartialCount#84135L]
[info] HiveTableScan [key#84132], (MetastoreRelation default, src, Some(b)), None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this issue puzzled me for a loooong time! I realized this solution is wrong. But I didn't figure out the right one until you pointed out that #3987 (accidentally?) fixes this. Thanks!
… expressions I believe that SPARK-4296 has been fixed by 3684fd2. I am adding tests based #3910 (change the udf to HiveUDF instead). Author: Yin Huai <[email protected]> Author: Cheng Lian <[email protected]> Closes #4010 from yhuai/SPARK-4296-yin and squashes the following commits: 6343800 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-4296-yin 6cfadd2 [Yin Huai] Actually, this issue has been fixed by 3684fd2. d42b707 [Yin Huai] Update comment. 8b3a274 [Yin Huai] Since expressions in grouping expressions can have aliases, which can be used by the outer query block, revert this change. 443538d [Cheng Lian] Trims aliases when resolving and checking aggregate expressions
This PR is a follow-up of PR #3248. We should not only trim
Alias
aroundGetField
but also all unnamed expressions when resolving and checking aggregate expressions. Trimming all aliases is safe becauseGROUP BY
doesn't allow aliasing, thus all aliases added to an aggregate expression which is also a grouping expression must be introduced while resolving unnamed expressions likeGetField
and UDF calls.