Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update description on rule based index selection #6160

Merged
merged 9 commits into from
Aug 20, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 66 additions & 6 deletions choose-index.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,19 +29,79 @@ Before introducing index selection, it is important to understand the ways TiDB

## Index selection rules

TiDB provides a heuristic rule named skyline-pruning based on the cost estimation of each operator for accessing tables. It can reduce the probability of wrong index selection caused by wrong estimation.
TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If the existing index satisfies a pre rule, TiDB will directly select the index. Otherwise, TiDB will use Skyline-Pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, select the index with the lowest cost.

### Selection based on rules
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

#### Pre rules
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

TiDB uses the following heuristic pre rules to select index:

+ rule 1: If the existing index satisfies "unique indexes with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index.
xuyifangreeneyes marked this conversation as resolved.
Show resolved Hide resolved

+ rule 2: If the existing index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of retrieve rows from a table as the candidate index.
xuyifangreeneyes marked this conversation as resolved.
Show resolved Hide resolved

+ rule 3: If the existing index satisfies "common indexes + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index.

+ rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate index is separately selected based on rule 2 and 3, select the index with the smallest number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table).
xuyifangreeneyes marked this conversation as resolved.
Show resolved Hide resolved

The "unique indexes with full match" in rule 1 means each indexed column has the equivalent qualification. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule.
xuyifangreeneyes marked this conversation as resolved.
Show resolved Hide resolved

In the following example, because the index `idx_b` meets the condition "unique indexes + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule.

```sql
mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b));
Query OK, 0 rows affected (0.01 sec)
mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------------------+---------+---------+------+-------------------------+------------------------------+
| id | estRows | estCost | task | access object | operator info |
+-------------------+---------+---------+------+-------------------------+------------------------------+
| Batch_Point_Get_5 | 2.00 | 8.80 | root | table:t, index:idx_b(b) | keep order:false, desc:false |
+-------------------+---------+---------+------+-------------------------+------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------+------+-------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------+
| Note | 1105 | unique index idx_b of t is selected since the path only has point ranges with double scan |
+-------+------+-------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

### Skyline-pruning

Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, the following three dimensions are needed:
Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the probability of wrong index selection caused by wrong estimation. To judge an index, the following three dimensions are needed:

TomShawn marked this conversation as resolved.
Show resolved Hide resolved
- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do.
- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. If both indexes need to retrieve rows, compare how many filter conditions are covered by the indexed columns. Filter conditions mean the `where` condition that can be judged based on the index. If the column set of an index covers more access conditions, the smaller the number of retrieved rows from a table, and the better the index is in this dimension.
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

- Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension.

- How many access conditions are covered by the indexed columns. An “access condition” is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.

For these three dimensions, if an index named idx_a is not worse than the index named idx_b in all three dimensions and one of the dimensions is better than idx_b, then idx_a is preferred.
- How many access conditions are covered by the indexed columns. An "access condition" is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.

Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if Skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after Skyline-pruning's exclusion.

In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by Skyline-Pruning. The returned result of `SHOW WARNING` displays the remaining indexes after Skyline pruning.

```sql
mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e));
Query OK, 0 rows affected (0.01 sec)
mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
| id | estRows | estCost | task | access object | operator info |
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
| IndexLookUp_10 | 33.33 | 738.29 | root | | |
| ├─IndexRangeScan_8(Build) | 33.33 | 2370.00 | cop[tikv] | table:t, index:idx_b_c(b, c) | range:(2 4,2 +inf], keep order:false, stats:pseudo |
| └─TableRowIDScan_9(Probe) | 33.33 | 2370.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved
+-------+------+------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------+
| Note | 1105 | [t,idx_b_c] remain after pruning paths for t given Prop{SortItems: [], TaskTp: rootTask} |
+-------+------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

### Selection based on cost estimation
Liuxiaozhen12 marked this conversation as resolved.
Show resolved Hide resolved

Expand Down