support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP #44487

Closed
AilinKid opened this issue Jun 7, 2023 · 0 comments · Fixed by #44488
Assignees
Labels
sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.

AilinKid commented Jun 7, 2023

Feature Request

Is your feature request related to a problem? Please describe:

support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP

#44216 logical expand support
#44436 grouping function refactor

Now that the two PRs above are merged, we can transform the AST rollup syntax into an Expand logical plan.
One important step remains before we move on to physical plan generation: rewriting.

  • The grouping function needs to be rewritten. What the user types at the SQL layer is the raw definition of the grouping function, which takes raw column references as its parameters (up to 64 of them). At runtime, however, the function only looks at the gid (uint64) column: it reads the gid value from every row and compares it with the meta filled in at rewriting time, following a rule determined by the grouping mode.
case: select grouping(a), grouping(b,a) from t group by a, b with rollup.
                      |             |
                      |             +>    grouping(gid) with [meta1, meta2]
                      +>    grouping(gid) with meta
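The rewrite above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gid numbering, the `GROUPING_SETS` table, and the helpers `build_meta` and `rewrite_grouping` are hypothetical, and the meta is modeled as "the set of gids in which the column is rolled up", which is just one possible grouping mode. The multi-argument result follows MySQL's GROUPING() bitmask convention, where the rightmost argument is the lowest bit.

```python
# Hypothetical gid assignment for "group by a, b with rollup" (an assumption,
# not the encoding the planner actually uses):
#   gid 0 -> grouping set (a, b): nothing rolled up
#   gid 1 -> grouping set (a):    b rolled up
#   gid 2 -> grouping set ():     a and b rolled up
GROUP_BY_COLS = ("a", "b")
GROUPING_SETS = {0: ("a", "b"), 1: ("a",), 2: ()}


def build_meta(col):
    """Rewrite time: collect the gids in which `col` is rolled up."""
    if col not in GROUP_BY_COLS:
        raise ValueError(f"{col!r} is not a group-by column")
    return frozenset(gid for gid, kept in GROUPING_SETS.items() if col not in kept)


def rewrite_grouping(args):
    """Rewrite grouping(c1, ..., cn) into a function of the gid column only."""
    if not 1 <= len(args) <= 64:
        raise ValueError("grouping() takes between 1 and 64 arguments")
    metas = [build_meta(col) for col in args]  # one meta per raw argument

    def grouping(gid):
        # Runtime: only gid is read; each meta contributes one result bit,
        # with the leftmost argument as the most significant bit.
        result = 0
        for meta in metas:
            result = (result << 1) | (1 if gid in meta else 0)
        return result

    return grouping


grouping_a = rewrite_grouping(["a"])         # grouping(gid) with meta
grouping_b_a = rewrite_grouping(["b", "a"])  # grouping(gid) with [meta1, meta2]
```

With this numbering, `grouping_b_a(1)` yields 2 (only b is rolled up) and `grouping_b_a(2)` yields 3 (both are rolled up); the raw columns a and b are never read at runtime.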
  • Besides the grouping function, normal columns and expressions may also need to be rewritten as grouping set columns or expressions. As discussed in the previous pull request, the gby expressions are projected out in an extra projection under Expand, even when they are already simple columns. We do this because a gby item is filled with NULL when we replicate the source rows to distinguish the data of different grouping layouts, so it is no longer a real column ref (a column ref changes nothing about the source column, including its field type). Is there any case where we should use the source column rather than the copied-and-changed grouping column? Yes: any column argument inside an aggregation should use the source column ref; everywhere else, use the copied-and-changed grouping column.
eg: select   sum(a),  a,  a+1,  grouping(b,a) from t group by a, b with rollup.
                |      |     |             +>   absolutely resolve the grouping set col a and b.
                |      |     +>  resolve to the grouping set col a'#3. (it can show NULL anyway)
                |     +> resolve to the grouping set col a'#3. (it can show NULL anyway)
                +> resolve to the base column a.

logical plan:   
                projection: col#9, a'#3, a'#3+1, grouping(gid#5, metas)
                     aggregation: sum(a) -> col#9
                         expand:   [a, b, a'#3, b'#4, gid#5]
                             + projection:  [a, b, a -> a'#3, b -> b'#4]
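To see why sum(a) must keep the base column, here is a minimal Python sketch of what the projection plus Expand pair does to the rows. The function name `expand`, the dict-based row layout, and the gid numbering are assumptions made for illustration only, not the executor's actual data structures.

```python
def expand(rows, group_by_cols, grouping_sets):
    """Replicate every source row once per grouping set.

    Base columns are passed through untouched; each copied grouping column
    (named col + "'") keeps its value only if the column belongs to the
    current grouping set, otherwise it is filled with None (NULL).
    """
    out = []
    for row in rows:
        for gid, kept in enumerate(grouping_sets):
            new_row = dict(row)  # base columns a, b stay as they are
            for col in group_by_cols:
                new_row[col + "'"] = row[col] if col in kept else None
            new_row["gid"] = gid  # hypothetical gid: index of the grouping set
            out.append(new_row)
    return out


rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
# "group by a, b with rollup" -> grouping sets (a, b), (a), ()
expanded = expand(rows, ("a", "b"), [("a", "b"), ("a",), ()])

# Grand-total layout (gid 2): the base column still carries real values,
# while the copied grouping column is NULL in every row.
grand_total_sum = sum(r["a"] for r in expanded if r["gid"] == 2)
grand_total_keys = {r["a'"] for r in expanded if r["gid"] == 2}
```

In the grand-total layout every a' is NULL, so an aggregate over a' would see no values at all; summing the base column a still produces the real total.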
  • Comparing Spark with MySQL, the former has the stronger analysis ability: it can even resolve cases like the one below.
eg: select   sum(a),  a+1,  grouping(1+a,b) from t group by 1+a, b with rollup.
                |      |                 +>   absolutely resolve the grouping set col a+1 (A#3) and b'#4.
                |      |     
                |      +> resolve to the grouping set ** Expression ** a+1, here we should substitute it with A#3. 
                +> resolve to the base column a.

logical plan:   
                projection: col#9, A#3,  grouping(gid#5, metas)
                     aggregation: sum(a) -> col#9
                         expand:   [a, b, A#3, b'#4, gid#5]
                             + projection:  [a, b, a+1 -> A#3, b -> b'#4]
  • Grouping function rewriting and grouping column/expression rewriting can occur in the select-list fields, the having clause, and the order clause. Given the complexity of rewriting them in the having/order clauses, this PR currently focuses on rewriting them in the select-list fields.
  1. We rewrite the grouping function in place when we first meet/rewrite it in the expressionRewriter.
  2. We rewrite the grouping expressions/columns after the select Expr is built, then traverse down the expression tree to find the shallowest scope for substitution. (To understand "shallowest": say we have a field like a+1 and two group by items, a (a#3) and a+1 (A#4), with a rollup; the field a+1 should be resolved to the **bigger** item a+1 rather than to the single expression a.)
          
   +     find gby item using the scalar plus function's hashcode (shallow layer in tree), final substitution as A#4
  /  \
a     1

   +
  /  \
a     1  find gby item using a's hashcode (deep layer in tree), final substitution as a'#3 + 1


Both cases are reasonable, but what we actually need is case 1: substitute a+1 as a whole with the gby item A#4.
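The shallowest-scope rule can be sketched as a top-down traversal that tries to match the whole subtree before descending into it. The tuple-based expression encoding and the `substitute` helper below are hypothetical; Python's tuple hashing stands in for the expression hashcode lookup, and aggregate arguments are left alone so that they keep the base column.

```python
def substitute(expr, gby_items):
    """Replace group-by items in `expr` with their grouping set columns.

    `expr` is a nested tuple: ("col", name), ("const", value),
    ("func", name, *args) or ("agg", name, *args).
    `gby_items` maps a group-by expression to its grouping set column name.
    """
    kind = expr[0]
    if kind == "agg":
        return expr  # aggregate arguments must keep the base columns
    if expr in gby_items:
        # Match at the shallowest possible layer: stop descending here.
        return ("col", gby_items[expr])
    if kind == "func":
        _, name, *args = expr
        return ("func", name, *[substitute(arg, gby_items) for arg in args])
    return expr  # constants and non-group-by columns are unchanged


a = ("col", "a")
a_plus_1 = ("func", "+", a, ("const", 1))
sum_a = ("agg", "sum", a)

# group by a, a+1 with rollup: both a (a'#3) and a+1 (A#4) are gby items.
both = {a: "a'#3", a_plus_1: "A#4"}
case1 = substitute(a_plus_1, both)         # whole expression -> A#4

# group by a with rollup: only a is a gby item, so the match happens deeper.
case2 = substitute(a_plus_1, {a: "a'#3"})  # a'#3 + 1

# The argument of sum(a) is never substituted.
agg_kept = substitute(sum_a, both)
```

Checking the current node before its children is what makes a+1 resolve to A#4 in the first case; a bottom-up traversal would instead produce a'#3 + 1.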
  • After rewriting is done, we put the logical plan tree through the logical optimization rules; in the final rule, resolveExpand, we generate the level projections for this logical Expand. We then enumerate the possible physical plans for the logical Expand. Keep in mind that we currently have only the MPP Expand execution mode, so only the MPP task type can be generated for now.

Describe the feature you'd like:

support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@AilinKid AilinKid added type/feature-request Categorizes issue or PR as related to a new feature. sig/planner SIG: Planner labels Jun 7, 2023
@AilinKid AilinKid self-assigned this Jun 7, 2023
@AilinKid AilinKid changed the title from "support grouping function/col rewriting and physical plan exhaustion for rollup expand OP" to "support grouping function/col/expression rewriting and physical plan exhaustion for rollup expand OP" Jun 7, 2023
ti-chi-bot bot pushed a commit that referenced this issue Jun 12, 2023