Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update description on rule based index selection #6160

Merged
merged 9 commits into from
Aug 20, 2021
Merged
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 67 additions & 7 deletions choose-index.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,23 +29,83 @@ Before introducing index selection, it is important to understand the ways TiDB

## Index selection rules

TiDB provides a heuristic rule named skyline-pruning based on the cost estimation of each operator for accessing tables. It can reduce the probability of wrong index selection caused by wrong estimation.
TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB directly selects the index. Otherwise, TiDB uses Skyline-pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, selects the index with the lowest cost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the based rules -> The based rules
Skyline-pruning -> skyline-pruning it seems that we don't capitalize the first letter before the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether unqualified corresponds to 不合适. Maybe we can also consider improper or unsuitable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsuitable sounds better. fixed in 40a51d1.


### Selection based on rules
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

#### Pre rules
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

TiDB uses the following heuristic pre rules to select indexes:
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

+ rule 1: If an index satisfies "unique index with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

+ rule 2: If an index satisfies "unique index + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of rows to be retrieved from a table as the candidate index.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

+ rule 3: If an index satisfies "common index + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

+ rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate indexes are separately selected based on rule 2 and 3, select the index with the smaller number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table).
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

The "index with full match" in the above rules means each indexed column has the equal condition. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

In the following example, because the index `idx_b` meets the condition "unique index + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

```sql
mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b));
Query OK, 0 rows affected (0.01 sec)
mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------------------+---------+---------+------+-------------------------+------------------------------+
| id | estRows | estCost | task | access object | operator info |
+-------------------+---------+---------+------+-------------------------+------------------------------+
| Batch_Point_Get_5 | 2.00 | 8.80 | root | table:t, index:idx_b(b) | keep order:false, desc:false |
+-------------------+---------+---------+------+-------------------------+------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------+------+-------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------+
| Note | 1105 | unique index idx_b of t is selected since the path only has point ranges with double scan |
+-------+------+-------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

### Skyline-pruning

Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, the following three dimensions are needed:
Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the probability of wrong index selection caused by wrong estimation. To judge an index, the following three dimensions are needed:

TomShawn marked this conversation as resolved.
Show resolved Hide resolved
- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do.
- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. If both indexes need to retrieve rows, compare how many filter conditions are covered by the indexed columns. Filter conditions mean the `where` condition that can be judged based on the index. If the column set of an index covers more access conditions, the smaller the number of retrieved rows from a table, and the better the index is in this dimension.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

- Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension.

- How many access conditions are covered by the indexed columns. An “access condition” is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.

For these three dimensions, if an index named idx_a is not worse than the index named idx_b in all three dimensions and one of the dimensions is better than idx_b, then idx_a is preferred.
- How many access conditions are covered by the indexed columns. An "access condition" is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.

Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if Skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after Skyline-pruning's exclusion.

In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by Skyline-pruning. The returned result of `SHOW WARNING` displays the remaining indexes after Skyline-pruning.

```sql
mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e));
Query OK, 0 rows affected (0.01 sec)
mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
| id | estRows | estCost | task | access object | operator info |
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
| IndexLookUp_10 | 33.33 | 738.29 | root | | |
| ├─IndexRangeScan_8(Build) | 33.33 | 2370.00 | cop[tikv] | table:t, index:idx_b_c(b, c) | range:(2 4,2 +inf], keep order:false, stats:pseudo |
| └─TableRowIDScan_9(Probe) | 33.33 | 2370.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------+------+------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------+
| Note | 1105 | [t,idx_b_c] remain after pruning paths for t given Prop{SortItems: [], TaskTp: rootTask} |
+-------+------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

### Selection based on cost estimation
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations:
After using the Skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations:

- The average length of each row of the indexed data in the storage engine.
- The number of rows in the query range generated by the index.
Expand Down