From a57528011c52e6e4fc8bb9824ee0d85faef60c8e Mon Sep 17 00:00:00 2001 From: Xiaozhen Liu Date: Fri, 13 Aug 2021 14:57:43 +0800 Subject: [PATCH 1/9] Update description on rule based index selection --- choose-index.md | 72 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 6 deletions(-) diff --git a/choose-index.md b/choose-index.md index b00824139bba4..3e5a78c598556 100644 --- a/choose-index.md +++ b/choose-index.md @@ -29,19 +29,79 @@ Before introducing index selection, it is important to understand the ways TiDB ## Index selection rules -TiDB provides a heuristic rule named skyline-pruning based on the cost estimation of each operator for accessing tables. It can reduce the probability of wrong index selection caused by wrong estimation. +TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If the existing index satisfies a pre rule, TiDB will directly select the index. Otherwise, TiDB will use Skyline-Pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, select the index with the lowest cost. + +### Selection based on rules + +#### Pre rules + +TiDB uses the following heuristic pre rules to select index: + ++ rule 1: If the existing index satisfies "unique indexes with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. + ++ rule 2: If the existing index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of retrieve rows from a table as the candidate index. 
+ ++ rule 3: If the existing index satisfies "common indexes + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. + ++ rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate index is separately selected based on rule 2 and 3, select the index with the smallest number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table). + +The "unique indexes with full match" in rule 1 means each indexed column has the equivalent qualification. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule. + +In the following example, because the index `idx_b` meets the condition "unique indexes + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule. 
+ +```sql +mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b)); +Query OK, 0 rows affected (0.01 sec) +mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6; ++-------------------+---------+---------+------+-------------------------+------------------------------+ +| id | estRows | estCost | task | access object | operator info | ++-------------------+---------+---------+------+-------------------------+------------------------------+ +| Batch_Point_Get_5 | 2.00 | 8.80 | root | table:t, index:idx_b(b) | keep order:false, desc:false | ++-------------------+---------+---------+------+-------------------------+------------------------------+ +1 row in set, 1 warning (0.00 sec) +mysql> SHOW WARNINGS; ++-------+------+-------------------------------------------------------------------------------------------+ +| Level | Code | Message | ++-------+------+-------------------------------------------------------------------------------------------+ +| Note | 1105 | unique index idx_b of t is selected since the path only has point ranges with double scan | ++-------+------+-------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) +``` ### Skyline-pruning -Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, the following three dimensions are needed: +Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the probability of wrong index selection caused by wrong estimation. To judge an index, the following three dimensions are needed: -- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. 
+- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. If both indexes need to retrieve rows, compare how many filter conditions are covered by the indexed columns. Filter conditions mean the `where` condition that can be judged based on the index. If the column set of an index covers more access conditions, the smaller the number of retrieved rows from a table, and the better the index is in this dimension. - Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension. -- How many access conditions are covered by the indexed columns. An “access condition” is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension. - -For these three dimensions, if an index named idx_a is not worse than the index named idx_b in all three dimensions and one of the dimensions is better than idx_b, then idx_a is preferred. +- How many access conditions are covered by the indexed columns. An "access condition" is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension. + +For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if Skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after Skyline-pruning's exclusion. 
+ +In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by Skyline-Pruning. The returned result of `SHOW WARNING` displays the remaining indexes after Skyline pruning. + +```sql +mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e)); +Query OK, 0 rows affected (0.01 sec) +mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4; ++-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ +| id | estRows | estCost | task | access object | operator info | ++-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ +| IndexLookUp_10 | 33.33 | 738.29 | root | | | +| ├─IndexRangeScan_8(Build) | 33.33 | 2370.00 | cop[tikv] | table:t, index:idx_b_c(b, c) | range:(2 4,2 +inf], keep order:false, stats:pseudo | +| └─TableRowIDScan_9(Probe) | 33.33 | 2370.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ +3 rows in set, 1 warning (0.00 sec) +mysql> SHOW WARNINGS; ++-------+------+------------------------------------------------------------------------------------------+ +| Level | Code | Message | ++-------+------+------------------------------------------------------------------------------------------+ +| Note | 1105 | [t,idx_b_c] remain after pruning paths for t given Prop{SortItems: [], TaskTp: rootTask} | ++-------+------+------------------------------------------------------------------------------------------+ +1 row in set (0.00 sec) +``` ### Selection based on cost estimation From 56651765f33edbdd612dc288e574c67cee6db748 Mon Sep 17 00:00:00 2001 From: Xiaozhen Liu Date: Wed, 18 Aug 2021 
10:43:42 +0800 Subject: [PATCH 2/9] fix improper expression --- choose-index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/choose-index.md b/choose-index.md index 3e5a78c598556..e473c14926bb3 100644 --- a/choose-index.md +++ b/choose-index.md @@ -29,7 +29,7 @@ Before introducing index selection, it is important to understand the ways TiDB ## Index selection rules -TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If the existing index satisfies a pre rule, TiDB will directly select the index. Otherwise, TiDB will use Skyline-Pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, select the index with the lowest cost. +TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB will directly select the index. Otherwise, TiDB will use Skyline-Pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, select the index with the lowest cost. ### Selection based on rules @@ -37,11 +37,11 @@ TiDB selects indexes based on rules or cost. the based rules include pre rules a TiDB uses the following heuristic pre rules to select index: -+ rule 1: If the existing index satisfies "unique indexes with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. ++ rule 1: If an index satisfies "unique indexes with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. 
-+ rule 2: If the existing index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of retrieve rows from a table as the candidate index. ++ rule 2: If an index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of retrieve rows from a table as the candidate index. -+ rule 3: If the existing index satisfies "common indexes + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. ++ rule 3: If an index satisfies "common indexes + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. + rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate index is separately selected based on rule 2 and 3, select the index with the smallest number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table). From 6eb3baaeeb8c300d3ce32217aca393adb361f40c Mon Sep 17 00:00:00 2001 From: Xiaozhen Liu Date: Wed, 18 Aug 2021 10:49:27 +0800 Subject: [PATCH 3/9] rows to be retrieved --- choose-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/choose-index.md b/choose-index.md index e473c14926bb3..23dc980f6a3c2 100644 --- a/choose-index.md +++ b/choose-index.md @@ -39,7 +39,7 @@ TiDB uses the following heuristic pre rules to select index: + rule 1: If an index satisfies "unique indexes with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. 
-+ rule 2: If an index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of retrieve rows from a table as the candidate index. ++ rule 2: If an index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of rows to be retrieved from a table as the candidate index. + rule 3: If an index satisfies "common indexes + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. From ff5bc24bae4a5739cd0b3393e1c08f4ce92f65cb Mon Sep 17 00:00:00 2001 From: Xiaozhen Liu Date: Wed, 18 Aug 2021 11:04:53 +0800 Subject: [PATCH 4/9] fix typo --- choose-index.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/choose-index.md b/choose-index.md index 23dc980f6a3c2..f96d0339f7ded 100644 --- a/choose-index.md +++ b/choose-index.md @@ -29,25 +29,25 @@ Before introducing index selection, it is important to understand the ways TiDB ## Index selection rules -TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB will directly select the index. Otherwise, TiDB will use Skyline-Pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, select the index with the lowest cost. +TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB directly selects the index. 
Otherwise, TiDB uses Skyline-pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, selects the index with the lowest cost. ### Selection based on rules #### Pre rules -TiDB uses the following heuristic pre rules to select index: +TiDB uses the following heuristic pre rules to select indexes: -+ rule 1: If an index satisfies "unique indexes with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. ++ rule 1: If an index satisfies "unique index with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. -+ rule 2: If an index satisfies "unique indexes + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of rows to be retrieved from a table as the candidate index. ++ rule 2: If an index satisfies "unique index + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of rows to be retrieved from a table as the candidate index. -+ rule 3: If an index satisfies "common indexes + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. ++ rule 3: If an index satisfies "common index + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. -+ rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. 
If two candidate index is separately selected based on rule 2 and 3, select the index with the smallest number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table). ++ rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate indexes are separately selected based on rule 2 and 3, select the index with the smaller number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table). The "unique indexes with full match" in rule 1 means each indexed column has the equivalent qualification. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule. -In the following example, because the index `idx_b` meets the condition "unique indexes + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule. +In the following example, because the index `idx_b` meets the condition "unique index + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule. ```sql mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b)); @@ -80,7 +80,7 @@ Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if Skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after Skyline-pruning's exclusion. 
-In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by Skyline-Pruning. The returned result of `SHOW WARNING` displays the remaining indexes after Skyline pruning. +In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by Skyline-pruning. The returned result of `SHOW WARNING` displays the remaining indexes after Skyline-pruning. ```sql mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e)); @@ -105,7 +105,7 @@ mysql> SHOW WARNINGS; ### Selection based on cost estimation -After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations: +After using the Skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations: - The average length of each row of the indexed data in the storage engine. - The number of rows in the query range generated by the index. From 6f02adbdb0c0a60591d1c159f850713352710a69 Mon Sep 17 00:00:00 2001 From: Xiaozhen Liu Date: Wed, 18 Aug 2021 11:17:36 +0800 Subject: [PATCH 5/9] address comments --- choose-index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/choose-index.md b/choose-index.md index f96d0339f7ded..6f37462e007be 100644 --- a/choose-index.md +++ b/choose-index.md @@ -45,7 +45,7 @@ TiDB uses the following heuristic pre rules to select indexes: + rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate indexes are separately selected based on rule 2 and 3, select the index with the smaller number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table). 
-The "unique indexes with full match" in rule 1 means each indexed column has the equivalent qualification. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule. +The "index with full match" in the above rules means each indexed column has the equal condition. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule. In the following example, because the index `idx_b` meets the condition "unique index + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule. From 40a51d14d5d26fe58f28df41c5497e8902a58888 Mon Sep 17 00:00:00 2001 From: Xiaozhen Liu Date: Wed, 18 Aug 2021 16:34:30 +0800 Subject: [PATCH 6/9] address comments --- choose-index.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/choose-index.md b/choose-index.md index 6f37462e007be..34a1a7388544c 100644 --- a/choose-index.md +++ b/choose-index.md @@ -29,7 +29,7 @@ Before introducing index selection, it is important to understand the ways TiDB ## Index selection rules -TiDB selects indexes based on rules or cost. the based rules include pre rules and Skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB directly selects the index. Otherwise, TiDB uses Skyline-pruning to exclude unqualified indexes, and then based on the cost estimation of each operator for accessing tables, selects the index with the lowest cost. +TiDB selects indexes based on rules or cost. The based rules include pre rules and skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB directly selects the index. 
Otherwise, TiDB uses skyline-pruning to exclude unsuitable indexes, and then based on the cost estimation of each operator for accessing tables, selects the index with the lowest cost. ### Selection based on rules @@ -78,9 +78,9 @@ Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the - How many access conditions are covered by the indexed columns. An "access condition" is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension. -For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if Skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after Skyline-pruning's exclusion. +For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after skyline-pruning's exclusion. -In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by Skyline-pruning. The returned result of `SHOW WARNING` displays the remaining indexes after Skyline-pruning. +In the following example, the index `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by skyline-pruning. The returned result of `SHOW WARNING` displays the remaining indexes after skyline-pruning. 
```sql mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e)); @@ -105,7 +105,7 @@ mysql> SHOW WARNINGS; ### Selection based on cost estimation -After using the Skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations: +After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations: - The average length of each row of the indexed data in the storage engine. - The number of rows in the query range generated by the index. From 84c19abb5613e23d28b8f7d08a92a81fcd121c5e Mon Sep 17 00:00:00 2001 From: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com> Date: Thu, 19 Aug 2021 17:59:51 +0800 Subject: [PATCH 7/9] Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> --- choose-index.md | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/choose-index.md b/choose-index.md index 34a1a7388544c..e74ba9e0dee0d 100644 --- a/choose-index.md +++ b/choose-index.md @@ -29,29 +29,30 @@ Before introducing index selection, it is important to understand the ways TiDB ## Index selection rules -TiDB selects indexes based on rules or cost. The based rules include pre rules and skyline-pruning. When selecting an index, TiDB tries the pre rule first. If an index satisfies a pre rule, TiDB directly selects the index. Otherwise, TiDB uses skyline-pruning to exclude unsuitable indexes, and then based on the cost estimation of each operator for accessing tables, selects the index with the lowest cost. +TiDB selects indexes based on rules or cost. The based rules include pre-rules and skyline-pruning. 
When selecting an index, TiDB tries the pre-rules first. If an index satisfies a pre-rule, TiDB directly selects this index. Otherwise, TiDB uses skyline-pruning to exclude unsuitable indexes, and then selects the index with the lowest cost based on the cost estimation of each operator that accesses tables. -### Selection based on rules +### Rule-based selection -#### Pre rules +#### Pre-rules TiDB uses the following heuristic pre rules to select indexes: -+ rule 1: If an index satisfies "unique index with full match + no need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", directly select this index. ++ Rule 1: If an index satisfies "unique index with full match + no need to retrieve rows from a table (which means that the plan generated by the index is the IndexReader operator)", TiDB directly selects this index. -+ rule 2: If an index satisfies "unique index + need to retrieve rows from a table (which means the plan generated by the index is IndexReader operator)", select the index with the smallest number of rows to be retrieved from a table as the candidate index. ++ Rule 2: If an index satisfies "unique index + the need to retrieve rows from a table (which means that the plan generated by the index is the IndexLookupReader operator)", TiDB selects the index with the smallest number of rows to be retrieved from a table as a candidate index. -+ rule 3: If an index satisfies "common index + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", select the index with the smallest number of rows to be read as the candidate index. ++ Rule 3: If an index satisfies "ordinary index + no need to retrieve rows from a table + the number of rows to be read is less than the value of a certain threshold", TiDB selects the index with the smallest number of rows to be read as a candidate index. 
-+ rule 4: If only one candidate index is selected based on rule 2 and 3, select this index. If two candidate indexes are separately selected based on rule 2 and 3, select the index with the smaller number of rows to be read (the number of rows with index + the number of rows to be retrieved from a table). ++ Rule 4: If only one candidate index is selected based on rules 2 and 3, TiDB selects this candidate index. If two candidate indexes are respectively selected based on rules 2 and 3, TiDB selects the index with the smaller number of rows to be read (the number of rows read from the index + the number of rows to be retrieved from a table). -The "index with full match" in the above rules means each indexed column has the equal condition. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre rules matches an index, TiDB will output a NOTE level warning indicating that the index matches the pre rule. +The "index with full match" in the above rules means that each indexed column has an equality condition. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if the pre-rules match an index, TiDB outputs a NOTE-level warning indicating that the index matches the pre-rule. -In the following example, because the index `idx_b` meets the condition "unique index + need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNING` returns a note indicating that the index `idx_b` matches the pre rule. +In the following example, because the index `idx_b` meets the condition "unique index + the need to retrieve rows from a table" in rule 2, TiDB selects the index `idx_b` as the access path, and `SHOW WARNINGS` returns a note indicating that the index `idx_b` matches the pre-rule. 
```sql mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, UNIQUE INDEX idx_b(b)); Query OK, 0 rows affected (0.01 sec) + mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6; +-------------------+---------+---------+------+-------------------------+------------------------------+ | id | estRows | estCost | task | access object | operator info | @@ -59,6 +60,7 @@ mysql> EXPLAIN FORMAT = 'verbose' SELECT b, c FROM t WHERE b = 3 OR b = 6; | Batch_Point_Get_5 | 2.00 | 8.80 | root | table:t, index:idx_b(b) | keep order:false, desc:false | +-------------------+---------+---------+------+-------------------------+------------------------------+ 1 row in set, 1 warning (0.00 sec) + mysql> SHOW WARNINGS; +-------+------+-------------------------------------------------------------------------------------------+ | Level | Code | Message | @@ -72,19 +74,18 @@ mysql> SHOW WARNINGS; Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the probability of wrong index selection caused by wrong estimation. To judge an index, the following three dimensions are needed: -- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. If both indexes need to retrieve rows, compare how many filter conditions are covered by the indexed columns. Filter conditions mean the `where` condition that can be judged based on the index. If the column set of an index covers more access conditions, the smaller the number of retrieved rows from a table, and the better the index is in this dimension. +- Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). 
Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. If both indexes need TiDB to retrieve rows from the table, compare how many filtering conditions are covered by the indexed columns. Filtering conditions are the `WHERE` conditions that can be evaluated using the index. The more access conditions the column set of an index covers, the fewer rows are retrieved from the table, and the better the index is in this dimension. - Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension. -- How many access conditions are covered by the indexed columns. An "access condition" is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension. - -For these three dimensions, if the index `idx_a` is not worse than the index `idx_b` in all three dimensions and one of the dimensions is better than `idx_b`, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if skyline-pruning excludes some indexes, TiDB will output a NOTE level warning listing the reserved indexes after skyline-pruning's exclusion. +For the three dimensions above, if the index `idx_a` performs no worse than the index `idx_b` in all three dimensions and performs better than `idx_b` in at least one dimension, then `idx_a` is preferred. When executing the `EXPLAIN FORMAT = 'verbose' ...` statement, if skyline-pruning excludes some indexes, TiDB outputs a NOTE-level warning listing the remaining indexes after the skyline-pruning exclusion. 
+In the following example, the indexes `idx_b` and `idx_e` are both inferior to `idx_b_c`, so they are excluded by skyline-pruning. The returned result of `SHOW WARNING` displays the remaining indexes after skyline-pruning. ```sql mysql> CREATE TABLE t(a INT PRIMARY KEY, b INT, c INT, d INT, e INT, INDEX idx_b(b), INDEX idx_b_c(b, c), INDEX idx_e(e)); Query OK, 0 rows affected (0.01 sec) + mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4; +-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ | id | estRows | estCost | task | access object | operator info | @@ -94,6 +95,7 @@ mysql> EXPLAIN FORMAT = 'verbose' SELECT * FROM t WHERE b = 2 AND c > 4; | └─TableRowIDScan_9(Probe) | 33.33 | 2370.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | +-------------------------------+---------+---------+-----------+------------------------------+----------------------------------------------------+ 3 rows in set, 1 warning (0.00 sec) + mysql> SHOW WARNINGS; +-------+------+------------------------------------------------------------------------------------------+ | Level | Code | Message | From 27ea66c93b7c3abfd8b4c8b93e74f5db24ae5663 Mon Sep 17 00:00:00 2001 From: Liuxiaozhen12 <82579298+Liuxiaozhen12@users.noreply.github.com> Date: Fri, 20 Aug 2021 15:08:52 +0800 Subject: [PATCH 8/9] Apply suggestions from code review Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com> --- choose-index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/choose-index.md b/choose-index.md index e74ba9e0dee0d..108c6f2efa9bd 100644 --- a/choose-index.md +++ b/choose-index.md @@ -35,7 +35,7 @@ TiDB selects indexes based on rules or cost. 
The based rules include pre-rules a #### Pre-rules -TiDB uses the following heuristic pre rules to select indexes: +TiDB uses the following heuristic pre-rules to select indexes: + Rule 1: If an index satisfies "unique index with full match + no need to retrieve rows from a table (which means that the plan generated by the index is the IndexReader operator)", TiDB directly selects this index. @@ -105,7 +105,7 @@ mysql> SHOW WARNINGS; 1 row in set (0.00 sec) ``` -### Selection based on cost estimation +### Cost estimation-based selection After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations: From d5545abe9f429a0110284ac55ff663c8e2d94420 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Fri, 20 Aug 2021 15:10:57 +0800 Subject: [PATCH 9/9] Update choose-index.md --- choose-index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/choose-index.md b/choose-index.md index 108c6f2efa9bd..b2ebd30f61146 100644 --- a/choose-index.md +++ b/choose-index.md @@ -74,6 +74,8 @@ mysql> SHOW WARNINGS; Skyline-pruning is a heuristic filtering rule for indexes, which can reduce the probability of wrong index selection caused by wrong estimation. To judge an index, the following three dimensions are needed: +- How many access conditions are covered by the indexed columns. An “access condition” is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension. + - Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do. 
If both indexes need TiDB to retrieve rows from the table, compare how many filtering conditions are covered by the indexed columns. Filtering conditions mean the `where` condition that can be judged based on the index. If the column set of an index covers more access conditions, the smaller the number of retrieved rows from a table, and the better the index is in this dimension. - Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension.
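
The three dimensions above amount to a dominance check: an index is pruned when some other index is no worse on every dimension and strictly better on at least one. The following Python sketch illustrates that check on the `idx_b`, `idx_b_c`, and `idx_e` example from this patch. It is not TiDB's implementation: the `IndexPath` class, its field names, and the use of column-set containment to decide which index "covers more conditions" are all assumptions made for illustration, and the table scan path is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexPath:
    """Hypothetical summary of one index for a single query."""
    name: str
    access_cols: frozenset      # columns whose conditions become index ranges
    filter_cols: frozenset      # columns whose conditions can be checked on the index
    needs_table_lookup: bool    # True if rows must be retrieved from the table
    keeps_order: bool           # True if the index satisfies the required order


def cmp_sets(a, b):
    """1 if a strictly contains b, -1 if b strictly contains a,
    0 if equal, None if neither contains the other."""
    if a == b:
        return 0
    if a > b:
        return 1
    if a < b:
        return -1
    return None


def cmp_lookup(x, y):
    """Compare on the 'retrieve rows from a table' dimension."""
    if x.needs_table_lookup != y.needs_table_lookup:
        return 1 if not x.needs_table_lookup else -1
    if x.needs_table_lookup:
        # Both need a lookup: compare the covered filtering conditions.
        return cmp_sets(x.filter_cols, y.filter_cols)
    return 0


def dominates(x, y):
    """True if x is no worse than y on all three dimensions and better on one."""
    results = [
        cmp_sets(x.access_cols, y.access_cols),
        cmp_lookup(x, y),
        int(x.keeps_order) - int(y.keeps_order),
    ]
    if any(r is None or r < 0 for r in results):
        return False
    return any(r > 0 for r in results)


def skyline_prune(paths):
    """Keep only the paths that no other path dominates."""
    return [p for p in paths
            if not any(dominates(q, p) for q in paths if q is not p)]


# SELECT * FROM t WHERE b = 2 AND c > 4; every index needs a table lookup,
# and the query requires no particular order.
idx_b = IndexPath("idx_b", frozenset({"b"}), frozenset({"b"}), True, False)
idx_b_c = IndexPath("idx_b_c", frozenset({"b", "c"}), frozenset({"b", "c"}), True, False)
idx_e = IndexPath("idx_e", frozenset(), frozenset(), True, False)

remaining = skyline_prune([idx_b, idx_b_c, idx_e])
print([p.name for p in remaining])  # ['idx_b_c']
```

Under these assumptions, two indexes whose covered column sets do not contain one another are incomparable, so neither prunes the other and both move on to cost estimation.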