planner: improve skyline pruning #26271

xuyifangreeneyes · 2021-07-15T07:33:46Z

What problem does this PR solve?

What is changed and how it works?

When comparing whether a double scan is needed: if both paths need a double scan, we further compare the column sets of their IndexFilters. If the column set of path A is a subset of that of path B, then path B has fewer table rows to access than path A, so path B is better than path A on this factor. If neither column set is a subset of the other, the two paths are incomparable on this factor.
For the isMatchProp factor,

create table t(a int, b int, c int, d int, index idx_a_b_c(a, b, c));
explain select a, b, c from t where a > 3 and b = 4 order by a, c;

idx_a_b_c actually matches the order a, c, because b = 4 makes b constant. This PR takes that situation into account. (PS: this PR can also close #26017.)
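
The isMatchProp idea above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: matchesOrder, the string column names, and the constCols slice are assumptions made for the example. While walking the index columns, a column that does not match the next sort column may be skipped only when it is constant.

```go
package main

import "fmt"

// matchesOrder reports whether an index on idxCols can return rows in the
// order given by sortCols. constCols[i] is true when idxCols[i] is pinned to
// a single value by the query's filters. All names here are hypothetical.
func matchesOrder(idxCols []string, constCols []bool, sortCols []string) bool {
	if len(sortCols) == 0 || len(idxCols) < len(sortCols) {
		return false
	}
	i := 0
	for _, sortCol := range sortCols {
		found := false
		for ; i < len(idxCols); i++ {
			if idxCols[i] == sortCol {
				found = true
				i++
				break
			}
			// A non-matching index column may be skipped only if it is
			// constant. The length check also covers a nil constCols.
			if i >= len(constCols) || !constCols[i] {
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	idx := []string{"a", "b", "c"}
	// where a > 3 and b = 4 order by a, c: b is constant.
	fmt.Println(matchesOrder(idx, []bool{false, true, false}, []string{"a", "c"}))
	// The same order without the b = 4 filter: b is not constant.
	fmt.Println(matchesOrder(idx, nil, []string{"a", "c"}))
}
```

The first call reports a match because b can be skipped; the second does not, since b would interleave with the requested order.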

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

improve skyline pruning rules

ti-chi-bot · 2021-07-15T07:33:48Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

time-and-fate
winoros

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

xuyifangreeneyes · 2021-07-15T07:35:52Z

/cc @winoros

planner/core/find_best_task.go

time-and-fate

A bad case:

create table t(a int, b int, c int, d int, index i(a,b,c));
explain select a,b,c from t where (a=1 and b=1 and c=1) or (a=1 and b=1 and c=2) order by c;

+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
| id                       | estRows | task      | access object             | operator info                                       |
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
| Sort_5                   | 0.03    | root      |                           | test.t.c                                            |
| └─IndexReader_9          | 0.03    | root      |                           | index:IndexRangeScan_8                              |
|   └─IndexRangeScan_8     | 0.03    | cop[tikv] | table:t, index:i(a, b, c) | range:[1 1 1,1 1 2], keep order:false, stats:pseudo |
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+

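The plan still contains Sort_5 even though a and b are constant in both branches, so index i can already return rows ordered by c. This happens because EqualCols is not calculated correctly.
It seems we also need to handle the pointRes case in detachCNFCondAndBuildRangeForIndex and the DNF case in detachCondAndBuildRangeForCols.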
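
For the DNF case, one way to think about it is that an index column counts as constant only when every OR branch pins it to the same value. The sketch below is an assumption about that idea, not the code in util/ranger: constColsOfDNF and its int64-pointer representation are hypothetical.

```go
package main

import "fmt"

// constColsOfDNF takes, for each OR branch, the equality constant the branch
// pins on each index column (nil means "not pinned"). It reports which
// columns hold the same constant in every branch. Hypothetical helper.
func constColsOfDNF(branches [][]*int64) []bool {
	if len(branches) == 0 {
		return nil
	}
	first := branches[0]
	result := make([]bool, len(first))
	for col := range first {
		result[col] = first[col] != nil
		for _, b := range branches[1:] {
			if !result[col] {
				break
			}
			// The column stays constant only if this branch pins the same value.
			result[col] = col < len(b) && b[col] != nil && *b[col] == *first[col]
		}
	}
	return result
}

// v is a small helper to take the address of a literal.
func v(x int64) *int64 { return &x }

func main() {
	// (a=1 and b=1 and c=1) or (a=1 and b=1 and c=2) on index i(a, b, c)
	branches := [][]*int64{
		{v(1), v(1), v(1)},
		{v(1), v(1), v(2)},
	}
	fmt.Println(constColsOfDNF(branches)) // a and b are constant, c is not
}
```

With a and b detected as constant, the remaining index column c already provides the requested order.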

winoros · 2021-07-26T17:41:53Z

util/ranger/detacher.go

@@ -619,6 +710,9 @@ type DetachRangeResult struct {
 	AccessConds []expression.Expression
 	// RemainedConds is the filter conditions which should be kept after access.
 	RemainedConds []expression.Expression
+	// ColumnValues records the constant column values for all index columns.
+	// For the ith column, if it is evaluated as constant, ColumnValues[i] is its value. Otherwise ColumnValues[i] is nil.
+	ColumnValues []*valueInfo
 	// EqCondCount is the number of equal conditions extracted.
 	EqCondCount int


This var can be deleted now?

There is a slight difference between how ColumnValues and EqCondCount are calculated. For example,

mysql> create table t2(a int, b int, c int, d int, index idx_a_b_c_d(a, b, c, d));
Query OK, 0 rows affected (0.01 sec)

mysql> explain select * from t2 where ((a = 1 and b = 1 and d < 3) or (a = 1 and b = 1 and d > 6)) and c = 3 order by d;
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                        | estRows | task      | access object                           | operator info                                                                                                                                        |
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexReader_14            | 0.00    | root      |                                         | index:Selection_13                                                                                                                                   |
| └─Selection_13            | 0.00    | cop[tikv] |                                         | eq(test.t2.c, 3), or(and(eq(test.t2.a, 1), and(eq(test.t2.b, 1), lt(test.t2.d, 3))), and(eq(test.t2.a, 1), and(eq(test.t2.b, 1), gt(test.t2.d, 6)))) |
|   └─IndexRangeScan_12     | 10.00   | cop[tikv] | table:t2, index:idx_a_b_c_d(a, b, c, d) | range:[1,1], keep order:true, stats:pseudo                                                                                                           |
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+

In this case, ColumnValues detects that a, b, and c are constant columns, while EqCondCount only detects that a is constant. Do we need to improve this in this PR, or unify ColumnValues and EqCondCount in another PR?

I think we can improve it later, not in this PR. What's your opinion? @qw4990 @time-and-fate

I think it's OK to improve it later.
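
To illustrate why ColumnValues can see a, b, and c as constant in the t2 example, here is a minimal sketch of merging constant-column information from two AND-ed conditions. It is an assumption about the idea only: colValue stands in for the PR's valueInfo, and mergeColumnValues is a hypothetical name, not the PR's unionColumnValues.

```go
package main

import "fmt"

// colValue is a stand-in for the PR's valueInfo; here it only wraps an int64.
type colValue struct{ v int64 }

// mergeColumnValues combines constant-column info from two AND-ed
// conditions. Entry i is nil when column i is not known to be constant.
// For simplicity the sketch prefers lhs when both sides pin a column and
// does not check the two values for contradiction.
func mergeColumnValues(lhs, rhs []*colValue) []*colValue {
	n := len(lhs)
	if len(rhs) > n {
		n = len(rhs)
	}
	out := make([]*colValue, n)
	for i := range out {
		switch {
		case i < len(lhs) && lhs[i] != nil:
			out[i] = lhs[i]
		case i < len(rhs) && rhs[i] != nil:
			out[i] = rhs[i]
		}
	}
	return out
}

func main() {
	// Index idx_a_b_c_d(a, b, c, d).
	fromOr := []*colValue{{1}, {1}, nil, nil} // a = 1 and b = 1 in both OR branches
	fromEq := []*colValue{nil, nil, {3}, nil} // c = 3
	merged := mergeColumnValues(fromOr, fromEq)
	for i, name := range []string{"a", "b", "c", "d"} {
		fmt.Printf("%s constant: %v\n", name, merged[i] != nil)
	}
}
```

Only d is left without a constant value, which is why the index can satisfy order by d.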

qw4990 · 2021-07-27T11:05:39Z

planner/core/find_best_task.go

+func compareIndexBack(lhs, rhs *candidatePath) (int, bool) {
+	result := compareBool(lhs.isSingleScan, rhs.isSingleScan)
+	if result == 0 && !lhs.isSingleScan {
+		// if both lhs and rhs need to access table after IndexScan, we use the set of columns that occurred in IndexFilters
+		// to compare how many table rows will be accessed.
+		return compareColumnSet(lhs.indexFiltersColSet, rhs.indexFiltersColSet)
+	}
+	return result, true
+}


Is there any test case that can cover this rule?

Yes. https://github.com/xuyifangreeneyes/tidb/blob/116e60b8ad38f7ebfa5ea68b09e87054723a9394/planner/core/logical_plan_test.go#L1703-L1710 In the two cases f_g is better than f due to this rule.
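
The body of compareColumnSet is not shown in the diff above, so the following is only a sketch of how such a subset-based comparison might behave. The colSet type and the column IDs are assumptions for the example. The second return value says whether the two sets are comparable at all.

```go
package main

import "fmt"

// colSet is a hypothetical set of column IDs.
type colSet map[int64]struct{}

// subsetOf reports whether every column of a is also in b.
func subsetOf(a, b colSet) bool {
	for c := range a {
		if _, ok := b[c]; !ok {
			return false
		}
	}
	return true
}

// compareColumnSet returns 1 if l is a strict superset of r, -1 if l is a
// strict subset of r, and 0 if they are equal. The bool is false when
// neither set contains the other, in which case the int is meaningless.
func compareColumnSet(l, r colSet) (int, bool) {
	switch {
	case len(l) < len(r):
		return -1, subsetOf(l, r)
	case len(l) == len(r):
		return 0, subsetOf(l, r)
	default:
		return 1, subsetOf(r, l)
	}
}

func main() {
	filtersF := colSet{}             // an index whose IndexFilters use no columns
	filtersFG := colSet{7: struct{}{}} // an index whose IndexFilters use column 7
	res, ok := compareColumnSet(filtersFG, filtersF)
	fmt.Println(res, ok) // the second path filters on more columns
}
```

A path whose IndexFilters cover a superset of columns can discard more rows before the table lookup, which is the sense in which it wins on this factor.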

qw4990 · 2021-07-27T11:07:41Z

planner/core/find_best_task.go

 					break
-				} else if i >= path.EqCondCount {
+				}
+				if path.ConstCols == nil || !path.ConstCols[i] {


How about adding one more condition || i >= len(path.ConstCols) || for safety?

util/ranger/detacher.go

time-and-fate · 2021-07-28T18:30:35Z

util/ranger/detacher.go

+	return r, offset, columnValues, nil
+}
+
+func unionColumnValues(lhs, rhs []*valueInfo, numCols int) []*valueInfo {


Now the numCols should have become useless.

time-and-fate

LGTM. The logic in the util/ranger is quite complicated though.

winoros · 2021-08-02T06:06:13Z

/merge

ti-chi-bot · 2021-08-02T06:06:16Z

This pull request has been accepted and is ready to merge.

Commit hash: cf39a2c

xuyifangreeneyes added 5 commits July 14, 2021 16:32

refine index back factor of skyline prunning

d9d8ce2

fix test case

b45895a

enhance isMatchProp

25b39f5

fix ut

6ad7d5c

add test for isMatchProp

d246484

ti-chi-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Jul 15, 2021

ti-chi-bot requested a review from winoros July 15, 2021 07:35

fmt

0685f15

github-actions bot added the sig/planner label (SIG: Planner) Jul 15, 2021

winoros requested review from time-and-fate and qw4990 July 19, 2021 07:57

qw4990 reviewed Jul 20, 2021

planner/core/find_best_task.go (outdated, resolved)

add comment

486fdc7

ti-chi-bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jul 21, 2021

time-and-fate reviewed Jul 21, 2021

enhance detection of constant columns

7decc45

ti-chi-bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label Jul 23, 2021

winoros reviewed Jul 26, 2021

xuyifangreeneyes and others added 2 commits July 27, 2021 10:44

fix ut & add comment

b2d975c

Merge branch 'master' into improve-skyline-pruning-2

116e60b

ti-chi-bot removed the needs-rebase label Jul 27, 2021

xuyifangreeneyes requested review from time-and-fate, qw4990 and winoros July 27, 2021 02:55

qw4990 reviewed Jul 27, 2021

minor fix

aadd749

xuyifangreeneyes requested a review from qw4990 July 28, 2021 03:41

time-and-fate reviewed Jul 28, 2021

util/ranger/detacher.go (outdated, resolved)

util/ranger/detacher.go (resolved)

util/ranger/detacher.go (outdated, resolved)

minor fix

a5ae5a4

time-and-fate reviewed Jul 28, 2021

time-and-fate approved these changes Jul 28, 2021

ti-chi-bot added the status/LGT1 label (indicates that a PR has LGTM 1) Jul 28, 2021

minor fix

cf39a2c

winoros approved these changes Aug 2, 2021

ti-chi-bot added the status/LGT2 label (indicates that a PR has LGTM 2) and removed the status/LGT1 label Aug 2, 2021

ti-chi-bot added the status/can-merge label (indicates a PR has been approved by a committer) Aug 2, 2021

Merge branch 'master' into improve-skyline-pruning-2

c33aa50

ti-chi-bot merged commit 7f28438 into pingcap:master Aug 2, 2021

xuyifangreeneyes deleted the improve-skyline-pruning-2 branch October 30, 2021 05:08
