Is your feature request related to a problem? Please describe:
support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP
#44216 logical expand support
#44436 grouping function refactor
Now that the two PRs above are merged, we can transform the AST rollup syntax into a logical Expand plan.
Before we move on to physical plan generation, one important step remains: rewriting.
The grouping function needs to be rewritten. What the user types at the SQL layer is the raw form of the grouping function, which takes raw column refs as its parameters, up to a maximum of 64. At runtime, however, the function only looks at the gid (uint64) column: it reads the gid value from every row and compares it with the meta that was filled in at rewrite time, following a rule determined by the grouping mode.
case: select grouping(a), grouping(b,a) from t group by a, b with rollup.
             |            |
             |            +> grouping(gid) with [meta1, meta2]
             +> grouping(gid) with meta
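The split between rewrite time and run time can be sketched as follows. This is only an illustration, not TiDB's implementation: it assumes that gid is a bitmask in which bit i is set when the i-th group-by item is rolled up (filled with NULL) in the current grouping set, and that the meta is simply the list of bit positions of the function's arguments. The real encoding depends on the grouping mode and may differ. The names `build_meta` and `eval_grouping` are hypothetical. The result follows the usual SQL convention that the leftmost argument of grouping() maps to the highest bit.

```python
from typing import Dict, List


def build_meta(args: List[str], gby_items: List[str]) -> List[int]:
    """Rewrite time: map each grouping() argument to its assumed bit position in gid."""
    position: Dict[str, int] = {name: i for i, name in enumerate(gby_items)}
    return [position[arg] for arg in args]


def eval_grouping(gid: int, meta: List[int]) -> int:
    """Run time: only the row's gid and the precomputed meta are consulted."""
    result = 0
    for bit in meta:
        # Leftmost argument ends up in the highest bit of the result.
        result = (result << 1) | ((gid >> bit) & 1)
    return result


gby_items = ["a", "b"]
meta_a = build_meta(["a"], gby_items)        # [0]
meta_ba = build_meta(["b", "a"], gby_items)  # [1, 0]

# Assumed gids for the rollup grouping sets (a, b), (a), ():
# 0b00 = nothing rolled up, 0b10 = b rolled up, 0b11 = a and b rolled up.
for gid in (0b00, 0b10, 0b11):
    print(gid, eval_grouping(gid, meta_a), eval_grouping(gid, meta_ba))
```

The raw column arguments never reach the run-time function; they are consumed once, when the meta is built.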
Besides the grouping function, normal columns and expressions may also need to be rewritten as grouping set columns or expressions. As discussed in the previous pull request, the gby expressions are projected out in an additional projection under Expand, even if they are already simple columns. We do this because a gby item is filled with NULL when we replicate the source rows, in order to distinguish the data of different grouping layouts, so it is no longer a real column ref (a column ref changes nothing about the source column, including its field type). So is there any case where we should use the source column rather than the copied-and-changed grouping column? Yes: any column argument inside an aggregation should use the source column ref; everywhere else, use the copied-and-changed grouping column.
eg: select sum(a), a, a+1, grouping(b,a) from t group by a, b with rollup.
           |       |  |    +> absolutely resolve the grouping set col a and b.
           |       |  +> resolve to the grouping set col a'#3. (it can show NULL anyway)
           |       +> resolve to the grouping set col a'#3. (it can show NULL anyway)
           +> resolve to the base column a.
logical plan:
projection: col#9, a'#3, a'#3+1, grouping(gid#5, metas)
aggregation: sum(a) -> col#9
expand: [a, b, a'#3, b'#4, gid#5]
+ projection: [a, b, a -> a'#3, b -> b'#4]
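To see why a'#3 can show NULL while sum(a) must still read the base column, here is a minimal sketch of what Expand does to the rows coming out of the lower projection. It is an illustration under assumptions, not the actual executor: the function `expand`, the `a'`/`b'` column naming, and the bitmask gid encoding (bit i set when the i-th gby item is rolled up) are all hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def expand(rows: List[Row], gby_items: List[str]) -> List[Row]:
    """Replicate every source row once per rollup grouping set.

    For `group by a, b with rollup` the grouping sets are (a, b), (a), ().
    Only the copied columns (a', b') are nulled out; the base columns stay intact.
    """
    out: List[Row] = []
    n = len(gby_items)
    for row in rows:
        for kept in range(n, -1, -1):  # keep the first `kept` gby items
            new_row: Row = dict(row)   # base columns a, b are copied unchanged
            gid = 0
            for i, name in enumerate(gby_items):
                if i < kept:
                    new_row[name + "'"] = row[name]
                else:
                    new_row[name + "'"] = None  # rolled-up item shows NULL
                    gid |= 1 << i               # assumed gid encoding
            new_row["gid"] = gid
            out.append(new_row)
    return out


expanded = expand([{"a": 1, "b": 2}], ["a", "b"])
for r in expanded:
    print(r)
```

In every replica the base column a still holds 1, so an aggregate such as sum(a) sees the real data, while a' and b' carry the NULLs that mark the grouping layout.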
Comparing Spark with MySQL, Spark has the stronger analysis ability here: it can even resolve cases like the one below.
eg: select sum(a), a+1, grouping(1+a,b) from t group by 1+a, b with rollup.
           |       |    +> absolutely resolve the grouping set col a+1 (A#3) and b'#4.
           |       |
           |       +> resolve to the grouping set expression a+1; here we should substitute it with A#3.
           +> resolve to the base column a.
logical plan:
projection: col#9, A#3, grouping(gid#5, metas)
aggregation: sum(a) -> col#9
expand: [a, b, A#3, b'#4, gid#5]
+ projection: [a, b, a+1 -> A#3, b -> b'#4]
Grouping function rewriting and grouping column/expression rewriting can occur in the select-list fields and in the having/order clauses. Given the complexity of rewriting them in the having/order clauses, this PR currently focuses on rewriting them in the select-list fields.
We rewrite the grouping function in place when we first meet and rewrite it in the expressionRewriter.
We rewrite the grouping expressions/columns after the select Expr is built, traversing down the expression tree to find the shallow-most scope for substitution. (To understand shallow-most: say we have a field a+1 and two group by items, a (a'#3) and a+1 (A#4), with rollup. Then a+1 should be resolved to the bigger expression a+1 rather than to the single column a.)
case 1:
      +        find gby item using the scalar plus function's hashcode (shallow layer in tree), final substitution is A#4
     / \
    a   1

case 2:
      +
     / \
    a   1      find gby item using a's hashcode (deep layer in tree), final substitution is a'#3 + 1
Both cases are reasonable, but what we actually need is case 1: substitute a+1 as a whole with the gby item A#4.
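The shallow-most rule can be sketched as a top-down traversal that tries to match the current node against the gby items before descending into its children, and that leaves aggregate arguments alone so they keep the base columns. This is a minimal sketch under assumptions: the `Col`/`Const`/`Func` node types, the `substitute` function, and the `AGG_FUNCS` set are hypothetical stand-ins, and Python's structural hashing of frozen dataclasses plays the role of the expression hashcode.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Func:
    name: str
    args: Tuple["Expr", ...]


Expr = Union[Col, Const, Func]

# Hypothetical list of aggregate function names whose arguments are not rewritten.
AGG_FUNCS = {"sum", "count", "min", "max", "avg"}


def substitute(expr: Expr, gby: Dict[Expr, Col]) -> Expr:
    """Replace the shallow-most sub-expressions that match a gby item."""
    if expr in gby:
        # Match at the current (shallowest) layer first; do not look deeper.
        return gby[expr]
    if isinstance(expr, Func):
        if expr.name in AGG_FUNCS:
            return expr  # aggregate arguments keep the base column refs
        return Func(expr.name, tuple(substitute(arg, gby) for arg in expr.args))
    return expr


a = Col("a")
a_plus_1 = Func("plus", (a, Const(1)))
gby_items: Dict[Expr, Col] = {a: Col("a'#3"), a_plus_1: Col("A#4")}

print(substitute(a_plus_1, gby_items))                    # whole expression matched
print(substitute(Func("plus", (a, Const(2))), gby_items)) # only the inner column matched
print(substitute(Func("sum", (a,)), gby_items))           # left untouched
```

Checking the node before its children is what makes a+1 resolve to A#4 instead of a'#3 + 1; an expression such as a+2, which matches no gby item as a whole, still falls back to substituting the inner column.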
After rewriting is done, we run the logical plan tree through the logical optimization rules; in the final rule, resolveExpand, we generate the level projections for this logical Expand. Then we enumerate the possible physical plans for the logical Expand. Keep in mind that we currently only have the MPP Expand execution mode, so only the MPP task type can be generated for now.
Describe the feature you'd like:
support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP
AilinKid changed the title from "support grouping function/col rewriting and physical plan exhaustion for rollup expand OP" to "support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP" on Jun 7, 2023
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy: