
Optimizer: Add partition by support for derived TopN(filter on row_nu… #41469

Merged
merged 2 commits into from
Feb 23, 2023


ghazalfamilyusa
Contributor

@ghazalfamilyusa ghazalfamilyusa commented Feb 16, 2023

Issue Number: Ref #39792

Problem Summary:

This is a continuation of #41209, which introduced deriving TopN from "row_number as RN ... where RN <= N". This PR adds support for partition by in the window function, "row_number as RN over (partition by X) ... where RN <= N", restricted to the case where the partition by columns are a prefix of the PK. Execution support for this extension is in tikv/tikv#14116.

Design

This optimization is a logical rewrite that (1) finds the desired pattern and (2) rewrites the query by inserting a TopN node that is expected to be pushed down to storage later by the physical planner.

  • Pattern
    The logical rewrite looks for the pattern selection(filter)->window->data-source
    with the following restrictions:
    - The filter is a simple condition of the form row_number < value or row_number <= value.
    - The window function is a simple row_number with the default frame. If the row_number has a partition by, then the
      partition by fields must be a prefix of the clustered index. This reflects a limitation of the executor, which works
      only for this case.
    - The data source has no TiFlash option.

  • Rewrite
    Add a TopN above (downstream of) the data source. The rewrite changes selection(filter)->window->data-source to selection(filter)->window->topN->data-source. The TopN node is extended to store the partition by of the row_number, if any. The physical planner may change the TopN to a Limit if the required sort order is already met at the source; for this reason, PhysicalLimit is also extended to include the partition by. Another change to the physical planner is to apply the TopN/Limit only to TiKV, since applying it at the root does not help. Applying a TopN with partition by at the root would also require executor changes, which are not needed.

  • Dependencies
    The change in this PR addresses only the planner enhancements. The end-to-end solution requires: (a) extending the interface between the physical planner and execution to include the partition by fields for both Limit and TopN, and (b) enhancing the execution engine code for Limit and TopN to support partitioning when data is fully or partially ordered on the partition key.
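The Rewrite step above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the planner code in this PR: the `Plan` type and the functions `countFromFilter`, `injectTopN`, `maybeLimit` and `chain` are hypothetical. It assumes `rn < N` keeps N-1 rows and `rn <= N` keeps N rows per partition, and it ignores sort direction when deciding whether the source order already allows a partitioned Limit.

```go
package main

import (
	"fmt"
	"strings"
)

// Plan is a hypothetical, minimal plan node used only for illustration;
// it is not TiDB's LogicalTopN, PhysicalTopN, or PhysicalLimit.
type Plan struct {
	Name        string
	PartitionBy []string // partition-by columns carried over from row_number
	OrderBy     []string // order-by columns carried over from row_number
	Count       uint64   // rows kept per partition
	Child       *Plan
}

// countFromFilter maps "rn < n" to n-1 rows and "rn <= n" to n rows per partition.
func countFromFilter(op string, n int64) uint64 {
	if op == "<" {
		n--
	}
	if n < 0 {
		return 0
	}
	return uint64(n)
}

// injectTopN rewrites selection->window->data-source into
// selection->window->topN->data-source, copying the window's partition by.
func injectTopN(sel *Plan, op string, n int64) *Plan {
	win := sel.Child
	topN := &Plan{
		Name:        "TopN",
		PartitionBy: win.PartitionBy,
		OrderBy:     win.OrderBy,
		Count:       countFromFilter(op, n),
		Child:       win.Child,
	}
	win.Child = topN
	return topN
}

// maybeLimit turns a TopN into a Limit (keeping its partition by) when the
// child already produces rows sorted on the partition-by then order-by columns.
func maybeLimit(topN *Plan, childOrder []string) {
	want := append(append([]string{}, topN.PartitionBy...), topN.OrderBy...)
	if len(childOrder) < len(want) {
		return
	}
	for i, col := range want {
		if childOrder[i] != col {
			return
		}
	}
	topN.Name = "Limit"
}

// chain renders the plan from the root down, e.g. "Selection->Window->TopN->DataSource".
func chain(p *Plan) string {
	var names []string
	for ; p != nil; p = p.Child {
		names = append(names, p.Name)
	}
	return strings.Join(names, "->")
}

func main() {
	ds := &Plan{Name: "DataSource"}
	win := &Plan{Name: "Window", PartitionBy: []string{"a"}, OrderBy: []string{"b"}, Child: ds}
	sel := &Plan{Name: "Selection", Child: win}
	topN := injectTopN(sel, "<=", 10)
	fmt.Println(chain(sel), topN.Count) // Selection->Window->TopN->DataSource 10
}
```

Keeping the Selection and Window in place means the derived TopN is only a pre-filter: it may drop rows early, while the original filter on the row number still produces the exact result.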

Scope and user interface
The change adds a session/global variable called "tidb_opt_derive_topn" to enable/disable this optimization; it is off by default to reduce risk. The change also limits the optimization to TiKV, and the rule is not applied when TiFlash exists, because TiFlash execution does not support the extended TopN (with partition). Also, TiFlash can push down window functions and probably does not need this optimization.
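Taken together, the Pattern restrictions and the scope limits above amount to a single predicate. The Go sketch below is hypothetical: `Filter`, `Window`, `DataSource`, `isPrefix` and `canDeriveTopN` are not TiDB types or functions, the `deriveTopN` parameter stands in for the value of `tidb_opt_derive_topn`, and the prefix check assumes the partition by columns must appear in clustered-index order (the real rule may accept them in any order).

```go
package main

import "fmt"

// Hypothetical, simplified inputs to the rule; not TiDB's real planner types.
type Filter struct {
	Op    string // comparison applied to row_number, e.g. "<" or "<="
	Value int64
}

type Window struct {
	IsRowNumber  bool
	DefaultFrame bool
	PartitionBy  []string
}

type DataSource struct {
	ClusteredIndex []string // clustered primary key columns in key order; empty if not clustered
	HasTiFlash     bool
}

// isPrefix reports whether cols equals the first len(cols) columns of key.
func isPrefix(cols, key []string) bool {
	if len(cols) > len(key) {
		return false
	}
	for i, c := range cols {
		if key[i] != c {
			return false
		}
	}
	return true
}

// canDeriveTopN applies the session switch, the TiFlash exclusion and the
// Pattern restrictions. deriveTopN models tidb_opt_derive_topn (hypothetical wiring).
func canDeriveTopN(deriveTopN bool, f Filter, w Window, ds DataSource) bool {
	if !deriveTopN || ds.HasTiFlash {
		return false // variable off, or a TiFlash path is present
	}
	if f.Op != "<" && f.Op != "<=" {
		return false
	}
	if !w.IsRowNumber || !w.DefaultFrame {
		return false
	}
	// With partition by, the columns must be a prefix of the clustered index.
	return len(w.PartitionBy) == 0 || isPrefix(w.PartitionBy, ds.ClusteredIndex)
}

func main() {
	// Assumes, for illustration only, a clustered index on (primary_key, secondary_key).
	ds := DataSource{ClusteredIndex: []string{"primary_key", "secondary_key"}}
	w := Window{IsRowNumber: true, DefaultFrame: true, PartitionBy: []string{"primary_key", "secondary_key"}}
	fmt.Println(canDeriveTopN(true, Filter{Op: "<=", Value: 10}, w, ds))  // true
	fmt.Println(canDeriveTopN(false, Filter{Op: "<=", Value: 10}, w, ds)) // false
}
```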

Customer use case
The original use case that motivates this change is tested and works as expected in the planner. See planner/core/casetest/testdata/derive_topn_from_window_out.json in this PR.

        "SQL": "select * from (select *, row_number() over (partition by primary_key, secondary_key order by c_timestamp) as rownum from customer where primary_key = 0x002 and secondary_key >= 0x001 and c_timestamp >= 1661883508511000000) as nested where rownum <= 10 order by secondary_key desc;",
        "Plan": [
          "Sort 0.89 root  test.customer.secondary_key:desc",
          "└─Selection 0.89 root  le(Column#6, 10)",
          "  └─Window 1.11 root  row_number()->Column#6 over(partition by test.customer.primary_key, test.customer.secondary_key order by test.customer.c_timestamp rows between current row and current row)",
          "    └─Sort 1.11 root  test.customer.primary_key, test.customer.secondary_key, test.customer.c_timestamp",
          "      └─TableReader 1.11 root  data:TopN",
          "        └─TopN 1.11 cop[tikv]  partition by test.customer.primary_key, test.customer.secondary_key order by test.customer.c_timestamp, offset:0, count:10",
          "          └─Selection 1.11 cop[tikv]  ge(test.customer.c_timestamp, 1661883508511000000)",
          "            └─TableRangeScan 33.33 cop[tikv] table:customer range:[0x0002 0x0001,0x0002 +inf], keep order:false, stats:pseudo"

Tests

Unit tests

Side effects

None

Documentation

None

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Feb 16, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • chrysan
  • qw4990

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 16, 2023
@ghazalfamilyusa ghazalfamilyusa marked this pull request as draft February 16, 2023 00:09
@ti-chi-bot ti-chi-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 16, 2023
@ghazalfamilyusa ghazalfamilyusa marked this pull request as ready for review February 16, 2023 00:50
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 16, 2023
@ghazalfamilyusa ghazalfamilyusa requested a review from a team as a code owner February 16, 2023 20:17
" └─Sort 1.11 root test.customer.primary_key, test.customer.secondary_key, test.customer.c_timestamp",
" └─TopN 1.11 root test.customer.c_timestamp, offset:0, count:10",
" └─TableReader 1.11 root data:TopN",
" └─TopN 1.11 cop[tikv] test.customer.c_timestamp, offset:0, count:10",
Contributor

@LittleFall LittleFall Feb 17, 2023


Can we add partition_by_column to the explain result?
Like "TopN 1.11 cop[tikv] order_by:XXX desc, partition_by: XXX, offset: X, count: X"
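For reference, the TopN line in the plan quoted in the PR description shows the partitioned explain text as "partition by ... order by ..., offset:0, count:10", while the older expected output in the diff above shows only "test.customer.c_timestamp, offset:0, count:10". The Go sketch below is a hypothetical formatter that reproduces those two strings; `topNExplainInfo` is not TiDB's function, and it ignores sort direction (for example ":desc").

```go
package main

import (
	"fmt"
	"strings"
)

// topNExplainInfo is a hypothetical formatter; it only mirrors the explain
// strings quoted in this PR and ignores sort direction (e.g. ":desc").
func topNExplainInfo(partitionBy, orderBy []string, offset, count uint64) string {
	var b strings.Builder
	if len(partitionBy) > 0 {
		// Partitioned form: "partition by P order by O, offset:X, count:Y".
		b.WriteString("partition by " + strings.Join(partitionBy, ", ") + " order by ")
	}
	b.WriteString(strings.Join(orderBy, ", "))
	fmt.Fprintf(&b, ", offset:%d, count:%d", offset, count)
	return b.String()
}

func main() {
	fmt.Println(topNExplainInfo(
		[]string{"test.customer.primary_key", "test.customer.secondary_key"},
		[]string{"test.customer.c_timestamp"}, 0, 10))
	fmt.Println(topNExplainInfo(nil, []string{"test.customer.c_timestamp"}, 0, 10))
}
```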

Contributor Author


Good point; will do in the next update.

@ghazalfamilyusa ghazalfamilyusa force-pushed the derived_topn_latest branch 2 times, most recently from 3c0ba4d to 790d1e9 February 17, 2023 20:15
return true
}

// Table not clustered and window has partition by. Can not do the TopN piush down.
Contributor


Suggested change
// Table not clustered and window has partition by. Can not do the TopN piush down.
// Table not clustered and window has partition by. Can not do the TopN push down.

A typo here.

Contributor Author


Got it; will fix in the next update.

Contributor

@chrysan chrysan left a comment


Should there be some tests to cover result correctness? The rest LGTM.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 22, 2023
@@ -49,6 +49,9 @@ const (
// TiDBOptAggPushDown is used to enable/disable the optimizer rule of aggregation push down.
TiDBOptAggPushDown = "tidb_opt_agg_push_down"

// TiDBOptDeriveTopN is used to enable/disable the optimizer rule of deriving topN.
Contributor

@LittleFall LittleFall Feb 22, 2023


Should we control whether this optimizer rule is enabled through the blocklist instead of a new variable?

ref: https://docs.pingcap.com/tidb/stable/blocklist-control-plan#important-optimization-rules

Contributor Author


I think the use cases for system variables and the blocklist are different. The blocklist seems to me like a configuration that lasts longer. Session/global variables give more granular, query-level control, as a workaround to force using or not using an optimization.

Contributor


Yes, actually the optimizer rule blocklist is rarely used now, so a system variable seems better here.

@qw4990 qw4990 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner labels Feb 23, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 23, 2023
@qw4990
Contributor

qw4990 commented Feb 23, 2023

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 45a7402

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 23, 2023
@qw4990
Contributor

qw4990 commented Feb 23, 2023

/retest

@ti-chi-bot ti-chi-bot merged commit f2163e7 into pingcap:master Feb 23, 2023