-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizer: Add partition by support for derived TopN(filter on row_nu… #41469
Optimizer: Add partition by support for derived TopN(filter on row_nu… #41469
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
bc21d15
to
ec7ca86
Compare
ec7ca86
to
b1122e9
Compare
" └─Sort 1.11 root test.customer.primary_key, test.customer.secondary_key, test.customer.c_timestamp", | ||
" └─TopN 1.11 root test.customer.c_timestamp, offset:0, count:10", | ||
" └─TableReader 1.11 root data:TopN", | ||
" └─TopN 1.11 cop[tikv] test.customer.c_timestamp, offset:0, count:10", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add partition_by_column to explain result?
like "TopN 1.11 cop[tikv] order_by:XXX desc, partition_by: XXX, offset: X, count: X"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point and will do in next update.
3c0ba4d
to
790d1e9
Compare
790d1e9
to
0e10f17
Compare
return true | ||
} | ||
|
||
// Table not clustered and window has partition by. Can not do the TopN piush down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Table not clustered and window has partition by. Can not do the TopN piush down. | |
// Table not clustered and window has partition by. Can not do the TopN push down. |
A typo here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it and will be fixed in next update.
5d9fd72
to
45a7402
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have some tests to cover result correctness? Rest LGTM.
@@ -49,6 +49,9 @@ const ( | |||
// TiDBOptAggPushDown is used to enable/disable the optimizer rule of aggregation push down. | |||
TiDBOptAggPushDown = "tidb_opt_agg_push_down" | |||
|
|||
// TiDBOptDeriveTopN is used to enable/disable the optimizer rule of deriving topN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we control this optimizer rule open or not by the blocklist, instead of a new variable?
ref: https://docs.pingcap.com/tidb/stable/blocklist-control-plan#important-optimization-rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the use case is different for system variables and blocklist. Blocklist to me seems a configuration which lasts longer. Session/global variables can be used to control a more granular level on query level as a workaround to force using or not using an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, actually the optimizer rule blocklist is rarely used now, so a system variable seems better here.
/merge |
This pull request has been accepted and is ready to merge. Commit hash: 45a7402
|
/retest |
Issue Number: Ref #39792
Problem Summary:
This is a continuation of #41209 which introduced deriving top N from "row_number as RN ... where RN <= N". This PR adds support for partition by in the window function "row_number as RN over (partition by X) ... where RN <= N" and restricted if the partition by is a prefix of the PK. Execution support for this extension is here tikv/tikv#14116
Design
This optimization is a logical rewrite that (1) finds the desired pattern and (2) rewrites the query by inserting a TopN node that is expected to be pushed down to storage later by the physical planner.
Pattern
The logical rewrite looks like for the pattern selection(filter)->window->data-source
with the following restrictions:
- Filter is simple condition of row_number < value or row_number <= value
- The window function is a simple row number with default frame. If the row_number has partition by then the partition
- by fields should be a prefix of clustered index. This is reflects the limitation in the executor that works only
for this case.
- Data source has no tiflash option.
Rewrite
Add topN above (down stream) of data source. The write change selection(filter)->window->data-source to selection(filter)->window->topN->data-source. TopN node is extended to store the partition by of row_number if any. The physical planner may change TopN to Limit if sort is met at the source. For this reason, we also extended PhysicalLimit to include partition by as well. Another change made to physical planner is to only apply TopN/Limit to TiKv since it does not help to apply it at the root. Applying TopN at the root with partition by also requires changes to executor which is not needed.
Dependencies
The change in this PR address only the planner enhancements. The end to end solution requires: (a) extending the interface between physical planner and execution to include the partition by fields for both Limit and TopN (b) enhancing the execution engine code for Limit and TopN to support partitioning when data is fully/partially ordered on partition key.
Scope and user interface
The change adds a session/global variable to enable/disable this optimization called "tidb_opt_derive_topn" and it is off by default to reduce the risk. The change also limits the optimization to TikV and rule not applied when TiFlash exists. This is the case since TiFllash execution does not support the extended TopN (with partition). Also, TiFlash can push down window functions and probably does not need this optimization.
Customer use case
The original user case that motivates this change is tested and work as expected in the planner. See planner/core/casetest/testdata/derive_topn_from_window_out.json in this PR.
Tests
Unit tests
Side effects
None
Documentation
None
Release note