planner: improve skyline pruning #26271

Merged: 14 commits, merged Aug 2, 2021

@xuyifangreeneyes (Contributor) commented Jul 15, 2021

What problem does this PR solve?

closes #26320

What is changed and how it works?

  1. When comparing whether a double scan (an index scan followed by a table lookup) is needed: if both paths need a double scan, we further compare the column sets of their IndexFilters. If the column set of path A is a subset of that of path B, then path B is expected to access fewer table rows than path A, so path B is better than path A on this factor. If neither set is a subset of the other, the two paths are incomparable on this factor.
  2. For the isMatchProp factor, consider:

     create table t(a int, b int, c int, d int, index idx_a_b_c(a, b, c));
     explain select a, b, c from t where a > 3 and b = 4 order by a, c;

     idx_a_b_c actually matches the order (a, c), because b = 4 makes b a constant column. This PR takes that case into account. (PS: this PR can also close #26017.)
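The constant-column rule in item 2 can be illustrated with a small Go sketch. It is not TiDB's isMatchProp: matchesOrder, its parameters and the []bool constant-column flags are hypothetical, and the sketch assumes every ORDER BY item is ascending and refers to an index column. The idea is that, while walking the index columns in key order, a column fixed to a single value (such as b = 4) may be skipped, since it cannot change the relative order of the scanned rows.

```go
package main

import "fmt"

// matchesOrder reports whether scanning an index in key order already yields
// rows sorted by orderBy. idxCols lists the index columns in key order, and
// isConst (one entry per index column) marks columns that the filters fix to
// a single value. Hypothetical helper, a simplified sketch only.
func matchesOrder(idxCols []string, isConst []bool, orderBy []string) bool {
	i := 0
	for _, col := range orderBy {
		// Skip constant index columns that come before the next ORDER BY column:
		// every scanned row shares their value, so they do not affect the order.
		for i < len(idxCols) && idxCols[i] != col && isConst[i] {
			i++
		}
		if i == len(idxCols) || idxCols[i] != col {
			return false
		}
		i++
	}
	return true
}

func main() {
	idx := []string{"a", "b", "c"} // idx_a_b_c(a, b, c)
	// where a > 3 and b = 4: only b is constant, so ORDER BY a, c is matched.
	fmt.Println(matchesOrder(idx, []bool{false, true, false}, []string{"a", "c"}))
	// Without b = 4 the index order does not satisfy ORDER BY a, c.
	fmt.Println(matchesOrder(idx, []bool{false, false, false}, []string{"a", "c"}))
}
```

The sketch is conservative: an ORDER BY column that is itself constant but appears after the last matched index column is rejected, which a full implementation might still accept.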

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

  • improve skyline pruning rules

@ti-chi-bot (Member) commented Jul 15, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • time-and-fate
  • winoros

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 15, 2021
@xuyifangreeneyes (Contributor Author) commented:

/cc @winoros

@ti-chi-bot ti-chi-bot requested a review from winoros July 15, 2021 07:35
@github-actions github-actions bot added the sig/planner SIG: Planner label Jul 15, 2021
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 21, 2021
@time-and-fate (Member) left a comment:

A bad case:

create table t(a int, b int, c int, d int, index i(a,b,c));
explain select a,b,c from t where (a=1 and b=1 and c=1) or (a=1 and b=1 and c=2) order by c;
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
| id                       | estRows | task      | access object             | operator info                                       |
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
| Sort_5                   | 0.03    | root      |                           | test.t.c                                            |
| └─IndexReader_9          | 0.03    | root      |                           | index:IndexRangeScan_8                              |
|   └─IndexRangeScan_8     | 0.03    | cop[tikv] | table:t, index:i(a, b, c) | range:[1 1 1,1 1 2], keep order:false, stats:pseudo |
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+

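This happens because EqualCols is not calculated correctly: a = 1 and b = 1 hold in both disjuncts, so a and b are constant and the order of index i(a, b, c) already satisfies ORDER BY c, yet the plan still contains Sort_5 with keep order:false.
It seems we also need to handle the pointRes case in detachCNFCondAndBuildRangeForIndex and the DNF case in detachCondAndBuildRangeForCols.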
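The DNF case can be sketched as follows: for a disjunction, a column is constant only if every branch fixes it to the same value. This is a hypothetical helper (mergeDNFColumnValues, with a simplified valueInfo that holds only an int64), not the unionColumnValues function that appears later in this thread, and it ignores collations and non-integer constants.

```go
package main

import "fmt"

// valueInfo is a hypothetical, simplified stand-in for the per-column constant
// recorded in DetachRangeResult.ColumnValues; nil means "not constant".
type valueInfo struct {
	value int64
}

// mergeDNFColumnValues combines the constant column values of the branches of
// an OR. A column stays constant only if every branch fixes it to the same
// value; otherwise its entry becomes nil.
func mergeDNFColumnValues(branches [][]*valueInfo) []*valueInfo {
	if len(branches) == 0 {
		return nil
	}
	merged := make([]*valueInfo, len(branches[0]))
	copy(merged, branches[0])
	for _, branch := range branches[1:] {
		for i := range merged {
			if merged[i] == nil {
				continue
			}
			if i >= len(branch) || branch[i] == nil || branch[i].value != merged[i].value {
				merged[i] = nil
			}
		}
	}
	return merged
}

func val(v int64) *valueInfo { return &valueInfo{value: v} }

func main() {
	// Index i(a, b, c) and (a=1 and b=1 and c=1) or (a=1 and b=1 and c=2).
	merged := mergeDNFColumnValues([][]*valueInfo{
		{val(1), val(1), val(1)},
		{val(1), val(1), val(2)},
	})
	for i, name := range []string{"a", "b", "c"} {
		fmt.Printf("%s constant: %v\n", name, merged[i] != nil)
	}
}
```

For the bad case above this reports a and b as constant and c as not constant, which is exactly the information needed to see that the index order satisfies ORDER BY c.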

@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2021
@@ -619,6 +710,9 @@ type DetachRangeResult struct {
AccessConds []expression.Expression
// RemainedConds is the filter conditions which should be kept after access.
RemainedConds []expression.Expression
// ColumnValues records the constant column values for all index columns.
// For the ith column, if it is evaluated as constant, ColumnValues[i] is its value. Otherwise ColumnValues[i] is nil.
ColumnValues []*valueInfo
// EqCondCount is the number of equal conditions extracted.
EqCondCount int
A member commented:

Can this variable (EqCondCount) be deleted now?

@xuyifangreeneyes (Contributor Author) replied:

There is a slight difference between how ColumnValues and EqCondCount are calculated. For example:

mysql> create table t2(a int, b int, c int, d int, index idx_a_b_c_d(a, b, c, d));
Query OK, 0 rows affected (0.01 sec)

mysql> explain select * from t2 where ((a = 1 and b = 1 and d < 3) or (a = 1 and b = 1 and d > 6)) and c = 3 order by d;
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                        | estRows | task      | access object                           | operator info                                                                                                                                        |
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexReader_14            | 0.00    | root      |                                         | index:Selection_13                                                                                                                                   |
| └─Selection_13            | 0.00    | cop[tikv] |                                         | eq(test.t2.c, 3), or(and(eq(test.t2.a, 1), and(eq(test.t2.b, 1), lt(test.t2.d, 3))), and(eq(test.t2.a, 1), and(eq(test.t2.b, 1), gt(test.t2.d, 6)))) |
|   └─IndexRangeScan_12     | 10.00   | cop[tikv] | table:t2, index:idx_a_b_c_d(a, b, c, d) | range:[1,1], keep order:true, stats:pseudo                                                                                                           |
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+

In this case, ColumnValues detects a, b and c as constant columns, while EqCondCount only counts a as a constant column. Should we improve this in this PR, or unify ColumnValues and EqCondCount in another PR?

A member replied:

I think we can improve it later, not in this PR. What's your opinion? @qw4990 @time-and-fate

A member replied:

I think it's OK to improve it later.

@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 27, 2021
Comment on lines +454 to +462
func compareIndexBack(lhs, rhs *candidatePath) (int, bool) {
	result := compareBool(lhs.isSingleScan, rhs.isSingleScan)
	if result == 0 && !lhs.isSingleScan {
		// if both lhs and rhs need to access table after IndexScan, we use the set of columns that occurred in IndexFilters
		// to compare how many table rows will be accessed.
		return compareColumnSet(lhs.indexFiltersColSet, rhs.indexFiltersColSet)
	}
	return result, true
}
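compareIndexBack delegates to compareColumnSet, whose body is not shown in this thread. The sketch below is one possible implementation of the subset rule from item 1 of the PR description. The colSet type is hypothetical, and the sign convention (positive means lhs is better) is an assumption chosen to line up with compareBool preferring the single-scan side.

```go
package main

import "fmt"

// colSet is a hypothetical set of column IDs referenced by a path's IndexFilters.
type colSet map[int64]struct{}

// subsetOf reports whether every column in s is also in t.
func (s colSet) subsetOf(t colSet) bool {
	for id := range s {
		if _, ok := t[id]; !ok {
			return false
		}
	}
	return true
}

// compareColumnSet returns a positive int when lhs is better (its IndexFilters
// cover a superset of rhs's columns, so it is expected to fetch fewer table
// rows), a negative int when rhs is better, and 0 when the sets are equal.
// The bool is false when neither set contains the other: incomparable.
func compareColumnSet(lhs, rhs colSet) (int, bool) {
	lSubR, rSubL := lhs.subsetOf(rhs), rhs.subsetOf(lhs)
	switch {
	case lSubR && rSubL:
		return 0, true
	case lSubR:
		return -1, true
	case rSubL:
		return 1, true
	default:
		return 0, false
	}
}

func main() {
	a := colSet{1: {}}
	ab := colSet{1: {}, 2: {}}
	c := colSet{3: {}}
	fmt.Println(compareColumnSet(ab, a)) // 1 true
	fmt.Println(compareColumnSet(a, ab)) // -1 true
	fmt.Println(compareColumnSet(a, c))  // 0 false
}
```

Returning a separate "comparable" flag lets the skyline check treat disjoint filter sets as a tie that does not prune either path, rather than arbitrarily preferring one.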
A contributor commented:

Is there any test case that can cover this rule?

@xuyifangreeneyes (Contributor Author)

break
} else if i >= path.EqCondCount {
}
if path.ConstCols == nil || !path.ConstCols[i] {
A contributor commented:

How about adding one more condition, i >= len(path.ConstCols), for safety, so that the check becomes path.ConstCols == nil || i >= len(path.ConstCols) || !path.ConstCols[i]?
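The suggested guard can be written as a small hypothetical helper (isConstCol is not a TiDB function). Because the length of a nil slice is 0, the i < len(...) check alone already rules out both the nil case and an index past the end, so the lookup cannot panic for any non-negative i.

```go
package main

import "fmt"

// isConstCol reports whether index column i is known to be constant.
// The bounds check covers a nil slice (len is 0) as well as an index past the
// end of a shorter slice, so constCols[i] is only evaluated when it is valid.
func isConstCol(constCols []bool, i int) bool {
	return i < len(constCols) && constCols[i]
}

func main() {
	fmt.Println(isConstCol(nil, 0))                 // false
	fmt.Println(isConstCol([]bool{true, false}, 0)) // true
	fmt.Println(isConstCol([]bool{true, false}, 1)) // false
	fmt.Println(isConstCol([]bool{true, false}, 5)) // false, no panic
}
```

With such a helper, !isConstCol(path.ConstCols, i) would express the same condition as the suggestion; keeping the explicit nil check is redundant but harmless if it makes the intent clearer.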

util/ranger/detacher.go (outdated, resolved)
util/ranger/detacher.go (resolved)
util/ranger/detacher.go (outdated, resolved)
return r, offset, columnValues, nil
}

func unionColumnValues(lhs, rhs []*valueInfo, numCols int) []*valueInfo {
A member commented:

The numCols parameter should now be unnecessary.

@time-and-fate (Member) left a comment:

LGTM. The logic in util/ranger is quite complicated, though.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 28, 2021
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 2, 2021
@winoros (Member) commented Aug 2, 2021:

/merge

@ti-chi-bot (Member) commented:

This pull request has been accepted and is ready to merge.

Commit hash: cf39a2c

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 2, 2021
@ti-chi-bot ti-chi-bot merged commit 7f28438 into pingcap:master Aug 2, 2021
@xuyifangreeneyes xuyifangreeneyes deleted the improve-skyline-pruning-2 branch October 30, 2021 05:08
Labels: sig/planner, size/XL, status/can-merge, status/LGT2

Issue that merging this pull request may close: Refine the skyline pruning planner: redundant sort when making sql plan

5 participants