Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35026][SQL] Support nested CUBE/ROLLUP/GROUPING SETS in GROUPING SETS #32201

Closed
wants to merge 13 commits into from
Closed
Original file line number Diff line number Diff line change
Expand Up @@ -601,14 +601,24 @@ aggregationClause
;

groupByClause
: groupingAnalytics
: nestedGroupingSets
| groupingAnalytics
| expression
;

nestedGroupingSets
: GROUPING SETS '(' nestedGroupingSet (',' nestedGroupingSet)* ')'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have (2 or more)-nested grouping sets? GROUPING SETS(GROUPING SETS(GROUPING SETS(...? I think we need tests for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have (2 or more)-nested grouping sets? GROUPING SETS(GROUPING SETS(GROUPING SETS(...? I think we need tests for it.

Good catch..Just tested, PG support nested grouping sets.

;

groupingAnalytics
: (ROLLUP | CUBE | GROUPING SETS) '(' groupingSet (',' groupingSet)* ')'
;

nestedGroupingSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about nestedGroupingSet -> groupingElement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

: groupingAnalytics
| '(' (expression (',' expression)*)? ')'
;

groupingSet
: '(' (expression (',' expression)*)? ')'
| expression
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -993,27 +993,19 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
ctx.groupingExpressionsWithGroupingAnalytics.asScala
.map(groupByExpr => {
val groupingAnalytics = groupByExpr.groupingAnalytics
val nestedGroupingSets = groupByExpr.nestedGroupingSets()
if (groupingAnalytics != null) {
val groupingSets = groupingAnalytics.groupingSet.asScala
.map(_.expression.asScala.map(e => expression(e)).toSeq)
if (groupingAnalytics.CUBE != null) {
// CUBE(A, B, (A, B), ()) is not supported.
if (groupingSets.exists(_.isEmpty)) {
throw new ParseException("Empty set in CUBE grouping sets is not supported.",
groupingAnalytics)
resolveGroupingAnalytics(groupingAnalytics)
} else if (nestedGroupingSets != null) {
val groupingSets = nestedGroupingSets.nestedGroupingSet.asScala.map { expr =>
val groupingAnalytics = expr.groupingAnalytics()
if (groupingAnalytics != null) {
resolveGroupingAnalytics(groupingAnalytics).selectedGroupByExprs
} else {
Seq(expr.expression().asScala.map(e => expression(e)))
}
Cube(groupingSets.toSeq)
} else if (groupingAnalytics.ROLLUP != null) {
// ROLLUP(A, B, (A, B), ()) is not supported.
if (groupingSets.exists(_.isEmpty)) {
throw new ParseException("Empty set in ROLLUP grouping sets is not supported.",
groupingAnalytics)
}
Rollup(groupingSets.toSeq)
} else {
assert(groupingAnalytics.GROUPING != null && groupingAnalytics.SETS != null)
GroupingSets(groupingSets.toSeq)
}
}.flatten.toSeq
GroupingSets(groupingSets)
} else {
expression(groupByExpr.expression)
}
Expand All @@ -1022,6 +1014,29 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
}
}

def resolveGroupingAnalytics(groupingAnalytics: GroupingAnalyticsContext): BaseGroupingSets = {
AngersZhuuuu marked this conversation as resolved.
Show resolved Hide resolved
val groupingSets = groupingAnalytics.groupingSet.asScala
.map(_.expression.asScala.map(e => expression(e)).toSeq)
if (groupingAnalytics.CUBE != null) {
// CUBE(A, B, (A, B), ()) is not supported.
if (groupingSets.exists(_.isEmpty)) {
throw new ParseException("Empty set in CUBE grouping sets is not supported.",
groupingAnalytics)
}
Cube(groupingSets.toSeq)
} else if (groupingAnalytics.ROLLUP != null) {
// ROLLUP(A, B, (A, B), ()) is not supported.
if (groupingSets.exists(_.isEmpty)) {
throw new ParseException("Empty set in ROLLUP grouping sets is not supported.",
groupingAnalytics)
}
Rollup(groupingSets.toSeq)
} else {
assert(groupingAnalytics.GROUPING != null && groupingAnalytics.SETS != null)
GroupingSets(groupingSets.toSeq)
}
}

/**
* Add [[UnresolvedHint]]s to a logical plan.
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -80,3 +80,8 @@ SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ());
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), GROUPING SETS((a, b), (a), ());
SELECT a, b, count(1) FROM testData GROUP BY a, CUBE(a, b), ROLLUP(a, b), GROUPING SETS((a, b), (a), ());

-- Support use CUBE/ROLLUP/GROUPING SETS in GROUPING SETS
AngersZhuuuu marked this conversation as resolved.
Show resolved Hide resolved
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(ROLLUP(a, b));
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is related to this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is related to this PR?

Hmmm what to show

SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(ROLLUP(a, b));

is equal to

SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, this test is the same with one in L79?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, this test is the same with one in L79?

Remove this line

SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(GROUPING SETS((a, b), (a), ()));
AngersZhuuuu marked this conversation as resolved.
Show resolved Hide resolved

Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 44
-- Number of queries: 47


-- !query
Expand Down Expand Up @@ -1067,3 +1067,60 @@ struct<a:int,b:int,count(1):bigint>
3 NULL 2
3 NULL 2
3 NULL 2


-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(ROLLUP(a, b))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: Just in case, I've checked all the results here are the same with PostgreSQL ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your double check.

-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2


-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ())
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2


-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(GROUPING SETS((a, b), (a), ()))
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 44
-- Number of queries: 47


-- !query
Expand Down Expand Up @@ -1087,3 +1087,60 @@ struct<a:int,b:int,count(1):bigint>
3 NULL 2
3 NULL 2
3 NULL 2


-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(ROLLUP(a, b))
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2


-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ())
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2


-- !query
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(GROUPING SETS((a, b), (a), ()))
-- !query schema
struct<a:int,b:int,count(1):bigint>
-- !query output
1 1 1
1 2 1
1 NULL 2
1 NULL 2
2 1 1
2 2 1
2 NULL 2
2 NULL 2
3 1 1
3 2 1
3 NULL 2
3 NULL 2