support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP #44487

AilinKid · 2023-06-07T10:53:52Z

Feature Request

Is your feature request related to a problem? Please describe:

support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP

#44216 logical expand support
#44436 grouping function refactor

Now that the two PRs above are merged, we can transform the AST of the ROLLUP syntax into an Expand logical plan.
Before we move on to physical plan generation, one important step remains: rewriting.

The grouping function needs to be rewritten. What the user writes in SQL is the raw definition of the grouping function, which takes up to 64 raw column references as its parameters. At runtime, however, it only looks at the gid (uint64) column: it reads the gid value from each row and compares it with the meta that was filled in at rewrite time according to the rule of the grouping mode.

case: select grouping(a), grouping(b,a) from t group by a, b with rollup.
                      |             |
                      |             +>   grouping(gid) with [meta1, meta2]
                      +>    grouping(gid) with meta
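To make the split between rewrite time and runtime concrete, here is a minimal Go sketch for `select grouping(a), grouping(b,a) from t group by a, b with rollup`. It assumes a hypothetical encoding in which gid i identifies the i-th grouping set of ROLLUP(a, b), namely {a, b}, {a} and {}, and each argument's meta is the set of gids whose grouping set contains that column. `groupingMeta`, `rewriteGrouping` and `evalGrouping` are illustrative names only; TiDB's actual grouping modes and meta layout may differ. The multi-argument result follows MySQL's documented bit order, where the first argument is the most significant bit.

```go
package main

import "fmt"

// groupingMeta is a hypothetical stand-in for the rewrite-time meta of one
// grouping() argument: the set of gids whose grouping set contains that column.
type groupingMeta map[uint64]bool

// rewriteGrouping runs once at plan time. groupingSets[i] lists the group-by
// columns kept in grouping set i; gid i is assumed to identify that set.
func rewriteGrouping(args []string, groupingSets [][]string) []groupingMeta {
	metas := make([]groupingMeta, len(args))
	for i, col := range args {
		metas[i] = groupingMeta{}
		for gid, set := range groupingSets {
			for _, c := range set {
				if c == col {
					metas[i][uint64(gid)] = true
				}
			}
		}
	}
	return metas
}

// evalGrouping is the per-row side: it reads only the gid. The first argument
// lands in the most significant bit; a bit is 1 when that column is rolled up.
// With at most 64 arguments the result fits in a uint64.
func evalGrouping(gid uint64, metas []groupingMeta) uint64 {
	var res uint64
	for _, m := range metas {
		res <<= 1
		if !m[gid] { // column absent from this grouping set
			res |= 1
		}
	}
	return res
}

func main() {
	// ROLLUP(a, b) yields the grouping sets {a, b}, {a}, {}.
	sets := [][]string{{"a", "b"}, {"a"}, {}}
	ga := rewriteGrouping([]string{"a"}, sets)       // grouping(a)
	gba := rewriteGrouping([]string{"b", "a"}, sets) // grouping(b, a)
	for gid := range sets {
		fmt.Println(gid, evalGrouping(uint64(gid), ga), evalGrouping(uint64(gid), gba))
	}
}
```

Run as-is, it prints one line per gid: `0 0 0`, `1 0 2` and `2 1 3`. The expensive column matching happens once in `rewriteGrouping`, and each row pays only a lookup on its gid.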

Besides the grouping function, normal columns and expressions should be rewritten as grouping-set columns or expressions. As discussed in the previous pull request, the group-by expressions are projected out by an extra projection below Expand, even if they are already simple columns. We do this because a group-by item is filled with NULL when Expand replicates the source rows to distinguish data for different grouping layouts, so it is no longer a real column ref (a column ref changes nothing about the source column, including its field type). So when should we use the source column rather than the copied-and-changed grouping column? Column arguments inside an aggregate function should use the source column ref; everywhere else, use the copied-and-changed grouping column instead.

e.g. select   sum(a),  a,  a+1,  grouping(b,a) from t group by a, b with rollup.
                |      |     |             +>   resolves its args against the grouping-set cols a and b.
                |      |     +>  resolves to the grouping-set col a'#3 (it can show NULL).
                |      +> resolves to the grouping-set col a'#3 (it can show NULL).
                +> resolves to the base column a.
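The resolution rule above can be sketched as a recursive rewrite over a toy expression tree. The `Expr` type and `resolveColumns` below are hypothetical, not TiDB's planner API; the only state carried down the tree is whether we are inside an aggregate.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a hypothetical, minimal expression tree for illustration only.
type Expr struct {
	Op   string // "col", "const", "agg", or "plus"
	Name string // column name, constant text, or aggregate function name
	Args []*Expr
}

// resolveColumns applies the rule: a column under an aggregate keeps the
// source column; any other column that is a group-by item is replaced by
// its grouping-set copy (which Expand may fill with NULL).
func resolveColumns(e *Expr, inAgg bool, gbyCopy map[string]string) *Expr {
	switch e.Op {
	case "col":
		if copyName, ok := gbyCopy[e.Name]; ok && !inAgg {
			return &Expr{Op: "col", Name: copyName}
		}
		return e
	case "agg":
		inAgg = true // everything below an aggregate reads source columns
	}
	out := &Expr{Op: e.Op, Name: e.Name}
	for _, a := range e.Args {
		out.Args = append(out.Args, resolveColumns(a, inAgg, gbyCopy))
	}
	return out
}

// String renders the tree back to SQL-like text.
func (e *Expr) String() string {
	switch e.Op {
	case "col", "const":
		return e.Name
	case "plus":
		return e.Args[0].String() + "+" + e.Args[1].String()
	}
	parts := make([]string, len(e.Args))
	for i, a := range e.Args {
		parts[i] = a.String()
	}
	return e.Name + "(" + strings.Join(parts, ",") + ")"
}

func main() {
	a := &Expr{Op: "col", Name: "a"}
	one := &Expr{Op: "const", Name: "1"}
	gby := map[string]string{"a": "a'#3", "b": "b'#4"}
	fields := []*Expr{
		{Op: "agg", Name: "sum", Args: []*Expr{a}}, // sum(a)
		a,                                         // a
		{Op: "plus", Args: []*Expr{a, one}},        // a+1
	}
	for _, f := range fields {
		fmt.Println(resolveColumns(f, false, gby))
	}
}
```

For the three fields of the query above it prints `sum(a)`, `a'#3` and `a'#3+1`, which matches the projection in the logical plan.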

logical plan:   
                projection: col#9, a'#3, a'#3+1, grouping(gid#5, metas)
                     aggregation: sum(a) -> col#9
                         expand:   [a, b, a'#3, b'#4, gid#5]
                             + projection:  [a, b, a -> a'#3, b -> b'#4]
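A small simulation shows why `sum(a)` must read the base column. The hypothetical `expandRollup` below replicates a single source row (a=1, b=2) once per grouping set of ROLLUP(a, b), using the output layout `[a, b, a'#3, b'#4, gid#5]` from the plan and a nil pointer for NULL. It illustrates the idea only and is not TiDB's Expand executor.

```go
package main

import "fmt"

// Row models one Expand output row [a, b, a'#3, b'#4, gid#5].
// Nil pointers stand for SQL NULL.
type Row struct {
	A, B         int
	ACopy, BCopy *int
	Gid          uint64
}

// expandRollup replicates one source row per grouping set of ROLLUP(a, b):
// gid 0 = {a, b}, gid 1 = {a}, gid 2 = {}. Copies of columns outside the
// set are NULLed; the base columns a and b are never touched.
func expandRollup(a, b int) []Row {
	rows := make([]Row, 0, 3)
	for gid := 0; gid < 3; gid++ {
		r := Row{A: a, B: b, Gid: uint64(gid)}
		if gid <= 1 { // a is grouped in {a, b} and {a}
			av := a
			r.ACopy = &av
		}
		if gid == 0 { // b is grouped only in {a, b}
			bv := b
			r.BCopy = &bv
		}
		rows = append(rows, r)
	}
	return rows
}

// show prints a nullable column.
func show(p *int) string {
	if p == nil {
		return "NULL"
	}
	return fmt.Sprint(*p)
}

func main() {
	for _, r := range expandRollup(1, 2) {
		fmt.Println(r.A, r.B, show(r.ACopy), show(r.BCopy), r.Gid)
	}
}
```

It prints `1 2 1 2 0`, `1 2 1 NULL 1` and `1 2 NULL NULL 2`. In the grand-total row a'#3 is NULL, yet the base column a still holds 1, so an aggregate over a keeps counting it.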

Comparing Spark with MySQL, Spark has stronger analysis ability: it can even resolve cases like the one below.

e.g. select   sum(a),  a+1,  grouping(1+a,b) from t group by 1+a, b with rollup.
                |      |                 +>   resolves its args against the grouping-set col a+1 (A#3) and b'#4.
                |      |
                |      +> resolves to the grouping-set expression a+1; we substitute it with A#3.
                +> resolves to the base column a.

logical plan:   
                projection: col#9, A#3,  grouping(gid#5, metas)
                     aggregation: sum(a) -> col#9
                         expand:   [a, b, A#3, b'#4, gid#5]
                             + projection:  [a, b, a+1 -> A#3, b -> b'#4]
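One way to match `a+1` in the select list against the group-by item `1+a` is to compare canonical forms rather than raw trees. The sketch below (hypothetical `Expr` and `canonicalKey`) only normalizes the operand order of the commutative `plus`; Spark's real canonicalization covers many more cases, and this is not necessarily how TiDB matches expressions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Expr is a hypothetical, minimal expression tree for illustration only.
type Expr struct {
	Op   string // "col", "const", or "plus"
	Name string
	Args []*Expr
}

// canonicalKey renders an expression with the operands of the commutative
// "plus" sorted, so 1+a and a+1 produce the same key.
func canonicalKey(e *Expr) string {
	if e.Op != "plus" {
		return e.Op + ":" + e.Name // prefix keeps column "1" and constant 1 apart
	}
	keys := make([]string, len(e.Args))
	for i, a := range e.Args {
		keys[i] = canonicalKey(a)
	}
	sort.Strings(keys)
	return "plus(" + strings.Join(keys, ",") + ")"
}

func main() {
	a := &Expr{Op: "col", Name: "a"}
	one := &Expr{Op: "const", Name: "1"}
	gbyItem := &Expr{Op: "plus", Args: []*Expr{one, a}} // GROUP BY 1+a
	field := &Expr{Op: "plus", Args: []*Expr{a, one}}   // SELECT a+1
	// Group-by items indexed by canonical key, mapped to their Expand output column.
	gbyCols := map[string]string{canonicalKey(gbyItem): "A#3"}
	if col, ok := gbyCols[canonicalKey(field)]; ok {
		fmt.Println("a+1 ->", col)
	}
}
```

It prints `a+1 -> A#3`. Indexing group-by items by canonical key keeps the lookup a single map access per candidate subexpression.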

Grouping function rewriting and grouping column/expression rewriting can appear in the select-list fields, the HAVING clause and the ORDER BY clause. Given the complexity of rewriting them in HAVING/ORDER BY, this PR currently focuses on rewriting them in the select-list fields.

We rewrite the grouping function in place when we first encounter it in the expressionRewriter.
We rewrite the grouping expressions/columns after the select expressions are built, traversing down each expression tree to find the shallowest matching scope for substitution. To see what "shallowest" means, suppose we have a field a+1 and two group-by items, a (a'#3) and a+1 (A#4), with ROLLUP: the field should resolve to the larger item a+1 rather than to the single column a.

case 1:
   +     find the gby item using the scalar plus function's hashcode (shallow layer in the tree); final substitution: A#4
  /  \
a     1

case 2:
   +
  /  \
a     1  find the gby item using a's hashcode (deep layer in the tree); final substitution: a'#3 + 1

Both substitutions are valid, but we want case 1: substitute the whole expression a+1 with the group-by item A#4.
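The shallowest-match rule amounts to a top-down traversal that tests the whole node before descending into its children. In this hypothetical sketch a string `key` stands in for the expression hash code. With both group-by items present, the field a+1 becomes `A#4` (case 1); if only `a` were a group-by item, the same walk would fall through to the child and yield `a'#3+1`.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a hypothetical, minimal expression tree for illustration only.
type Expr struct {
	Op   string // "col", "const", or "plus"
	Name string
	Args []*Expr
}

// key is a hypothetical stand-in for an expression hash code.
func key(e *Expr) string {
	if len(e.Args) == 0 {
		return e.Op + ":" + e.Name
	}
	parts := make([]string, len(e.Args))
	for i, a := range e.Args {
		parts[i] = key(a)
	}
	return e.Op + "(" + strings.Join(parts, ",") + ")"
}

// substitute walks top-down and checks the whole node before its children,
// so the shallowest (largest) matching group-by item wins.
func substitute(e *Expr, gby map[string]string) *Expr {
	if col, ok := gby[key(e)]; ok {
		return &Expr{Op: "col", Name: col}
	}
	out := &Expr{Op: e.Op, Name: e.Name}
	for _, a := range e.Args {
		out.Args = append(out.Args, substitute(a, gby))
	}
	return out
}

// render prints the tree back as SQL-like text.
func render(e *Expr) string {
	if e.Op == "plus" {
		return render(e.Args[0]) + "+" + render(e.Args[1])
	}
	return e.Name
}

func main() {
	a := &Expr{Op: "col", Name: "a"}
	aPlus1 := &Expr{Op: "plus", Args: []*Expr{a, {Op: "const", Name: "1"}}}
	// GROUP BY a, a+1 WITH ROLLUP: a -> a'#3, a+1 -> A#4.
	gby := map[string]string{key(a): "a'#3", key(aPlus1): "A#4"}
	fmt.Println(render(substitute(aPlus1, gby)))
	// Only a is a group-by item: the match happens one layer deeper.
	fmt.Println(render(substitute(aPlus1, map[string]string{key(a): "a'#3"})))
}
```

It prints `A#4` and then `a'#3+1`.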

After rewriting is done, we run the logical plan tree through the logical optimization rules; in the final rule, resolveExpand, we generate the level projections for this logical Expand. We then enumerate the possible physical plans for the logical Expand. Keep in mind that Expand currently only has an MPP execution mode, so only the MPP task type can be generated for now.

Describe the feature you'd like:

support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:



AilinKid added the type/feature-request and sig/planner labels Jun 7, 2023

AilinKid self-assigned this Jun 7, 2023

AilinKid mentioned this issue Jun 7, 2023

planner: support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP #44488

Merged


AilinKid changed the title ~~support grouping function/col rewriting and physical plan exhaustion for rollup expand OP~~ support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP Jun 7, 2023

AilinKid mentioned this issue Jun 3, 2023

Implement Underlying Grouping Sets #42631

Closed

ti-chi-bot bot closed this as completed in #44488 Jun 12, 2023

ti-chi-bot bot pushed a commit that referenced this issue Jun 12, 2023

planner: support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP (#44488) close #44487

465bd60
