Update description on rule based index selection #6815

Merged
merged 15 commits into from
Aug 12, 2021
72 changes: 66 additions & 6 deletions choose-index.md
@@ -29,19 +29,79 @@ aliases: ['/docs-cn/dev/choose-index/']

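## Index selection

When selecting an index, TiDB relies on the cost estimate of each operator that reads the table and, on top of that, provides the heuristic rule "Skyline-Pruning" to reduce the chance of picking the wrong index because of a misestimate
TiDB selects indexes in two ways: rule-based and cost-based. Rule-based index selection consists of pre-rules and Skyline-Pruning. When selecting an index, TiDB first tries the pre-rules. If an index satisfies one of the pre-rules, TiDB selects that index directly. Otherwise, TiDB uses Skyline-Pruning to exclude unsuitable indexes, and then selects the index with the lowest cost based on the cost estimate of each operator that reads the table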
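
The three stages described in the new paragraph can be sketched as a small control-flow skeleton. This is only an illustration of the order in which the stages run, not TiDB's code: the function `choose_index`, its callback parameters, and the cost numbers are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def choose_index(
    candidates: Sequence[str],
    match_pre_rule: Callable[[Sequence[str]], Optional[str]],
    skyline_prune: Callable[[Sequence[str]], Sequence[str]],
    estimate_cost: Callable[[str], float],
) -> str:
    """Hypothetical three-stage selection: pre-rules, Skyline-Pruning, cost."""
    # Stage 1: an index that matches a pre-rule is chosen immediately.
    hit = match_pre_rule(candidates)
    if hit is not None:
        return hit
    # Stage 2: exclude indexes that are clearly worse than another index.
    survivors = skyline_prune(candidates)
    # Stage 3: among the remaining indexes, take the lowest estimated cost.
    return min(survivors, key=estimate_cost)


# Made-up costs, used only to exercise the skeleton.
costs = {"idx_b": 30.0, "idx_b_c": 20.0, "idx_e": 50.0}
picked = choose_index(
    list(costs),
    match_pre_rule=lambda idxs: None,  # pretend that no pre-rule matches
    skyline_prune=lambda idxs: [i for i in idxs if i != "idx_e"],
    estimate_cost=costs.__getitem__,
)
```

Passing the stages in as callbacks keeps the sketch independent of how each stage is actually implemented.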

### Skyline-Pruning
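### Rule-based selection

Skyline-Pruning is a heuristic filtering rule for indexes. Judging how good an index is requires measuring it along the following three dimensions:
#### Pre-rules

- Whether selecting this index requires a table lookup when reading the table (that is, whether the plan generated from this index is IndexReader or IndexLookupReader). An index that needs no table lookup is better in this dimension than one that does.
TiDB uses the following heuristic pre-rules to select an index:

- Whether selecting this index satisfies a certain order. Because reading an index guarantees the order of certain column sets, an index that satisfies the order required by the query is better in this dimension than one that does not.
1. If an index satisfies "unique index with full match + no table lookup needed (that is, the plan generated from this index is IndexReader)", TiDB selects that index directly.

2. If indexes satisfy "unique index + table lookup needed (that is, the plan generated from this index is IndexLookupReader)", TiDB takes the qualifying index with the fewest rows to look up in the table as a candidate index.

3. If indexes satisfy "ordinary index with no table lookup needed + number of rows read below a certain threshold", TiDB takes the qualifying index that reads the fewest rows as a candidate index.

4. If only one of rules 2 and 3 yields a candidate index, TiDB selects that candidate. If both rules 2 and 3 yield a candidate index, TiDB selects the one that reads fewer rows (rows read from the index + rows looked up in the table).

In the rules above, "full match" means that every index column has an equality condition. When you execute an `EXPLAIN FORMAT = 'verbose' ...` statement, if a pre-rule matches an index, TiDB outputs a NOTE-level warning indicating that the index matched the pre-rule. In the following example, because index `idx_b` satisfies the "unique index + table lookup needed" condition in rule 2, TiDB selects `idx_b` as the access path, and `SHOW WARNINGS` reports that `idx_b` hit the pre-rule.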

```sql
mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b));
Query OK, 0 rows affected (0.01 sec)

mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6;
+-------------------+---------+---------+------+-------------------------+------------------------------+
| id                | estRows | estCost | task | access object           | operator info                |
+-------------------+---------+---------+------+-------------------------+------------------------------+
| Batch_Point_Get_5 | 2.00    | 8.80    | root | table:t, index:idx_b(b) | keep order:false, desc:false |
+-------------------+---------+---------+------+-------------------------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                   |
+-------+------+-------------------------------------------------------------------------------------------+
| Note  | 1105 | unique index idx_b of t is selected since the path only has point ranges with double scan |
+-------+------+-------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```
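
The four pre-rules can be read as a small decision procedure. The sketch below follows the wording of the rules literally and is not TiDB's implementation: the `Path` class, its fields, the `row_threshold` parameter (the text gives no value for the threshold), and the tie-break in rule 4 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    """Hypothetical summary of one candidate index access path."""
    name: str
    unique: bool        # the index is a unique index
    full_match: bool    # every index column has an equality condition
    single_scan: bool   # True: no table lookup needed (IndexReader)
    index_rows: float   # estimated rows read from the index
    table_rows: float   # estimated rows looked up in the table


def apply_pre_rules(paths: List[Path], row_threshold: float) -> Optional[Path]:
    # Rule 1: unique index, full match, no table lookup: select it directly.
    for p in paths:
        if p.unique and p.full_match and p.single_scan:
            return p
    # Rule 2: unique index with table lookup; candidate = fewest looked-up rows.
    rule2 = [p for p in paths if p.unique and not p.single_scan]
    cand2 = min(rule2, key=lambda p: p.table_rows, default=None)
    # Rule 3: ordinary index, no table lookup, rows read below the threshold;
    # candidate = fewest rows read.
    rule3 = [p for p in paths
             if not p.unique and p.single_scan and p.index_rows < row_threshold]
    cand3 = min(rule3, key=lambda p: p.index_rows, default=None)
    # Rule 4: a lone candidate wins; otherwise compare total rows read.
    if cand2 is None:
        return cand3
    if cand3 is None:
        return cand2

    def total(p: Path) -> float:
        return p.index_rows + p.table_rows

    # Assumption: on a tie, keep the rule 2 candidate (the text does not say).
    return cand2 if total(cand2) <= total(cand3) else cand3


# Modelled on the example above: idx_b is unique, matches two points, and
# needs a table lookup because column c is not in the index.
idx_b = Path("idx_b", unique=True, full_match=True, single_scan=False,
             index_rows=2.0, table_rows=2.0)
picked = apply_pre_rules([idx_b], row_threshold=100.0)  # threshold is arbitrary
```

With only `idx_b` available, rule 1 does not apply (a table lookup is needed), rule 2 yields `idx_b`, rule 3 yields nothing, and rule 4 therefore returns `idx_b`.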

#### Skyline-Pruning

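Skyline-Pruning is a heuristic filtering rule for indexes that reduces the chance of picking the wrong index because of a misestimate. Skyline-Pruning judges an index along the following three dimensions:

- How many access conditions the index columns cover. An "access condition" is a `where` condition that can be converted into a range on a column. The more access conditions the column set of an index covers, the better the index is in this dimension.

For these three dimensions, if an index `idx_a` is **no worse than `idx_b` in all three dimensions** and **better than `idx_b` in at least one dimension**, `idx_a` is preferred.
- Whether selecting this index requires a table lookup when reading the table (that is, whether the plan generated from this index is IndexReader or IndexLookupReader). An index that needs no table lookup is better in this dimension than one that does. If both indexes need a table lookup, TiDB compares how many filter conditions the index columns cover. A filter condition is a `where` condition that can be evaluated using the index. The more access conditions the column set of an index covers, the fewer rows need to be looked up in the table, and the better the index is in this dimension.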
Comment on lines 77 to +79
Contributor Author

Do the double quotes around "access condition" (“访问条件”) need to be removed?


- Whether selecting this index satisfies a certain order. Because reading an index guarantees the order of certain column sets, an index that satisfies the order required by the query is better in this dimension than one that does not.

For these three dimensions, if an index `idx_a` is **no worse than `idx_b` in all three dimensions** and **better than `idx_b` in at least one dimension**, `idx_a` is preferred. When you execute an `EXPLAIN FORMAT = 'verbose' ...` statement, if Skyline-Pruning excludes some indexes, TiDB outputs a NOTE-level warning listing the indexes that remain after Skyline-Pruning. In the following example, indexes `idx_b` and `idx_e` are both inferior to `idx_b_c` and are therefore excluded by Skyline-Pruning; `SHOW WARNINGS` displays the indexes that remain after Skyline-Pruning.

```sql
mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e));
Query OK, 0 rows affected (0.01 sec)

mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4;
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
| id                            | estRows | estCost | task      | access object                | operator info                                      |
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
| IndexLookUp_10                | 33.33   | 738.29  | root      |                              |                                                    |
| ├─IndexRangeScan_8(Build)     | 33.33   | 2370.00 | cop[tikv] | table:t, index:idx_b_c(b, c) | range:(2 4,2 +inf], keep order:false, stats:pseudo |
| └─TableRowIDScan_9(Probe)     | 33.33   | 2370.00 | cop[tikv] | table:t                      | keep order:false, stats:pseudo                     |
+-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+-------+------+------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                  |
+-------+------+------------------------------------------------------------------------------------------+
| Note  | 1105 | [t,idx_b_c] remain after pruning paths for t given Prop{SortItems: [], TaskTp: rootTask} |
+-------+------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```
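
The dominance test behind Skyline-Pruning can be sketched as follows. This is a simplified model and not TiDB's code: the `Candidate` class and its fields are hypothetical, conditions are compared by count only, and the per-path condition counts for the example query are a hand-made reading of `WHERE b = 2 AND c > 4`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical three-dimension summary of one access path."""
    name: str
    access_conds: int   # access conditions covered by the index columns
    needs_lookup: bool  # True: rows must be looked up in the table
    filter_conds: int   # filter conditions that the index can evaluate
    keeps_order: bool   # True: provides the order that the query requires


def _cmp(x, y) -> int:
    """Return 1 if x is better (larger), -1 if worse, 0 if equal."""
    return (x > y) - (x < y)


def compare_dims(a: Candidate, b: Candidate) -> List[int]:
    access = _cmp(a.access_conds, b.access_conds)
    if a.needs_lookup and b.needs_lookup:
        # Both need a table lookup: compare covered filter conditions.
        lookup = _cmp(a.filter_conds, b.filter_conds)
    else:
        # A path without a table lookup beats one that needs it.
        lookup = _cmp(not a.needs_lookup, not b.needs_lookup)
    order = _cmp(a.keeps_order, b.keeps_order)
    return [access, lookup, order]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a is no worse than b in every dimension and better in at least one."""
    dims = compare_dims(a, b)
    return all(d >= 0 for d in dims) and any(d > 0 for d in dims)


def skyline_prune(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands
            if not any(dominates(o, c) for o in cands if o is not c)]


# The query has no ORDER BY, so every path trivially satisfies the order.
paths = [
    Candidate("t", 0, False, 0, True),       # table path, no lookup needed
    Candidate("idx_b", 1, True, 1, True),    # covers b = 2 only
    Candidate("idx_b_c", 2, True, 2, True),  # covers b = 2 and c > 4
    Candidate("idx_e", 0, True, 0, True),    # covers no condition
]
remaining = [c.name for c in skyline_prune(paths)]
```

Under these assumed counts, `idx_b_c` dominates both `idx_b` and `idx_e`, while the table path `t` and `idx_b_c` each win one dimension, so neither excludes the other. `remaining` is therefore `['t', 'idx_b_c']`, the same pair that the warning in the example reports.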

### Cost-based selection
