Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizer selects merge join over nested loop join incorrectly #8563

Closed
morgo opened this issue Dec 3, 2018 · 25 comments
Closed

Optimizer selects merge join over nested loop join incorrectly #8563

morgo opened this issue Dec 3, 2018 · 25 comments
Assignees

Comments

@morgo
Copy link
Contributor

morgo commented Dec 3, 2018

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
  1. What did you expect to see?

Because there are few a.b_id values that are NOT NULL, I would have expected a nested loop join. If I force one, the execution time is 13.55 sec:

tidb> EXPLAIN ANALYZE select /*+ TIDB_INLJ(a, b) */ count(*) from a inner join b on a.b_id = b.id;
+--------------------------+--------------+------+------------------------------------------------------------------------------+--------------------------------------------------+
| id                       | count        | task | operator info                                                                | execution info                                   |
+--------------------------+--------------+------+------------------------------------------------------------------------------+--------------------------------------------------+
| StreamAgg_11             | 1.00         | root | funcs:count(1)                                                               | time:13.551441368s, loops:2, rows:1              |
| └─IndexJoin_21           | 125440380.00 | root | inner join, inner:IndexReader_20, outer key:test.a.b_id, inner key:test.b.id | time:13.551430575s, loops:1, rows:0              |
|   ├─TableReader_23       | 100352304.00 | root | data:TableScan_22                                                            | time:4.427147678s, loops:3136012, rows:100352304 |
|   │ └─TableScan_22       | 100352304.00 | cop  | table:a, range:[-inf,+inf], keep order:false                                 | time:0s, loops:0, rows:100352304                 |
|   └─IndexReader_20       | 1.41         | root | index:IndexScan_19                                                           | time:202.838681ms, loops:4018, rows:0            |
|     └─IndexScan_19       | 1.41         | cop  | table:b, index:id, range: decided by [test.a.b_id], keep order:false         | time:0s, loops:0, rows:0                         |
+--------------------------+--------------+------+------------------------------------------------------------------------------+--------------------------------------------------+
6 rows in set (13.55 sec)
  1. What did you see instead?

A MergeJoin was selected at an execution time of 40.54 sec:

tidb> EXPLAIN ANALYZE select count(*) from a inner join b on a.b_id = b.id;
+--------------------------+--------------+------+--------------------------------------------------------------+--------------------------------------------------+
| id                       | count        | task | operator info                                                | execution info                                   |
+--------------------------+--------------+------+--------------------------------------------------------------+--------------------------------------------------+
| StreamAgg_11             | 1.00         | root | funcs:count(1)                                               | time:40.531643354s, loops:2, rows:1              |
| └─MergeJoin_236          | 125440380.00 | root | inner join, left key:test.a.b_id, right key:test.b.id        | time:40.531631488s, loops:2, rows:1              |
|   ├─IndexReader_77       | 100352304.00 | root | index:IndexScan_76                                           | time:5.491457598s, loops:3136012, rows:100352304 |
|   │ └─IndexScan_76       | 100352304.00 | cop  | table:a, index:b_id, c77, range:[NULL,+inf], keep order:true | time:0s, loops:0, rows:100352304                 |
|   └─IndexReader_231      | 95030682.00  | root | index:IndexScan_230                                          | time:12.138771394s, loops:2969648, rows:95028736 |
|     └─IndexScan_230      | 95030682.00  | cop  | table:b, index:id, range:[NULL,+inf], keep order:true        | time:0s, loops:0, rows:95030682                  |
+--------------------------+--------------+------+--------------------------------------------------------------+--------------------------------------------------+
6 rows in set (40.54 sec)
  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
tidb> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v2.1.0-rc.3-253-g4bcdb6d
Git Commit Hash: 4bcdb6d8ee7906947008603d95c543cd978b1106
Git Branch: master
UTC Build Time: 2018-11-29 06:18:39
GoVersion: go version go1.11.2 linux/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
Check Table Before Drop: false
1 row in set (0.00 sec)
@morgo morgo changed the title Optimiser choses merge join over nested loop join incorrectly Optimizer selects merge join over nested loop join incorrectly Dec 3, 2018
@zz-jason zz-jason self-assigned this Dec 4, 2018
@zz-jason zz-jason added the sig/planner SIG: Planner label Dec 4, 2018
@winoros
Copy link
Member

winoros commented Dec 4, 2018

Thanks @morgo .
Would you please upload the statistics information? You can use curl http://TIDB_IP:TIDB_STATUS_PORT/stats/dump/DATABASE_NAME/TABLE_NAME to get it.

@morgo
Copy link
Contributor Author

morgo commented Dec 4, 2018

The files have been uploaded here:
https://github.com/morgo/tidb-microbench/blob/master/100M-row-join/a.json
https://github.com/morgo/tidb-microbench/blob/master/100M-row-join/b.json

Edit: I ran ANALYZE TABLE on both tables prior to dumping stats.

@winoros
Copy link
Member

winoros commented Dec 4, 2018

This is caused by that TiDB don't explicitly add NOT NULL constraint on join key. So the calculated row count also contains NULL values thus TiDB chose a wrong plan.

There're some ways to fix it.
- After #8097 merged and we enable it by session variable. The row count estimation will be correct. But it needs some test to decide whether we can open it by default.
- Add explicit NOT NULL constraint(like adding a builtin function notnull(key)).
- Record NULL COUNT information in plan's StatsInfo and modify the cost estimation formula.

It will take some time to make TiDB work properly without changing any session variable.

@winoros
Copy link
Member

winoros commented Dec 4, 2018

The previous comment only considers row count estimation. If we take real execution into consideration, NULL values does need to be filtered out before join.
So the best way is explicitly add NOT NULL constraint on join key.

@morgo
Copy link
Contributor Author

morgo commented Dec 4, 2018

Would it be correct to call this 2 separate bugs? This one can relate to planning, and then there is a second inefficiency in execution. Should it be in scope for #8470 , or independent work?

@zz-jason
Copy link
Member

zz-jason commented Dec 4, 2018

Yes, this issue and #8470 are two independent works. This one is to generate a better execution plan, the other is to efficiently execute the execution plan.

@morgo
Copy link
Contributor Author

morgo commented Dec 4, 2018

So a quick update. If I add explicit NOT NULL the query performance improves dramatically:

tidb> explain analyze select /*+ TIDB_INLJ(a, b) */ count(*) from a inner join b on a.b_id = b.id WHERE a.b_id IS NOT NULL AND b.id IS NOT NULL;
+--------------------------+--------------+------+------------------------------------------------------------------------------+----------------------------------------+
| id                       | count        | task | operator info                                                                | execution info                         |
+--------------------------+--------------+------+------------------------------------------------------------------------------+----------------------------------------+
| StreamAgg_12             | 1.00         | root | funcs:count(1)                                                               | time:175.189653ms, loops:2, rows:1     |
| └─IndexJoin_31           | 125314939.62 | root | inner join, inner:IndexReader_30, outer key:test.a.b_id, inner key:test.b.id | time:175.185979ms, loops:1, rows:0     |
|   ├─IndexReader_36       | 100251951.70 | root | index:IndexScan_35                                                           | time:165.131448ms, loops:34, rows:1000 |
|   │ └─IndexScan_35       | 100251951.70 | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:false                | time:0s, loops:0, rows:1000            |
|   └─IndexReader_30       | 0.00         | root | index:Selection_29                                                           | time:21.052184ms, loops:5, rows:0      |
|     └─Selection_29       | 0.00         | cop  | not(isnull(test.b.id))                                                       |                                        |
|       └─IndexScan_28     | 1.00         | cop  | table:b, index:id, range: decided by [test.a.b_id], keep order:false         | time:0s, loops:0, rows:0               |
+--------------------------+--------------+------+------------------------------------------------------------------------------+----------------------------------------+
7 rows in set (0.17 sec)

But I need to keep the hint there. If I remove it, it still reverts to MergeJoin (but is faster than the earlier mergejoin):

tidb> explain analyze select count(*) from a inner join b on a.b_id = b.id WHERE a.b_id IS NOT NULL AND b.id IS NOT NULL;
+--------------------------+--------------+------+--------------------------------------------------------------+--------------------------------------------------+
| id                       | count        | task | operator info                                                | execution info                                   |
+--------------------------+--------------+------+--------------------------------------------------------------+--------------------------------------------------+
| StreamAgg_12             | 1.00         | root | funcs:count(1)                                               | time:30.756390136s, loops:2, rows:1              |
| └─MergeJoin_250          | 125314939.62 | root | inner join, left key:test.a.b_id, right key:test.b.id        | time:30.756383126s, loops:1, rows:0              |
|   ├─IndexReader_81       | 100251951.70 | root | index:IndexScan_80                                           | time:261.258µs, loops:33, rows:1000              |
|   │ └─IndexScan_80       | 100251951.70 | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:true | time:0s, loops:0, rows:1000                      |
|   └─IndexReader_237      | 95030682.00  | root | index:IndexScan_236                                          | time:12.805748168s, loops:2969648, rows:95028736 |
|     └─IndexScan_236      | 95030682.00  | cop  | table:b, index:id, range:[-inf,+inf], keep order:true        | time:0s, loops:0, rows:95030682                  |
+--------------------------+--------------+------+--------------------------------------------------------------+--------------------------------------------------+
6 rows in set (30.76 sec)

@winoros
Copy link
Member

winoros commented Dec 4, 2018

|   ├─IndexReader_81       | 100251951.70 | root | index:IndexScan_80                                           | time:261.258µs, loops:33, rows:1000              |

Oh, this is a very strange result. Its estimation row count is 100251951.70 while its actual row count is 1000.
What's the explain result of select * from a where a.b_id IS NOT NULL?

BTW, i've tried the stats in your repo. It chose index join properly by only adding NOT NULL constraint.

@morgo
Copy link
Contributor Author

morgo commented Dec 4, 2018

The actual result is 1000. The explain result is:

mysql [127.0.0.1] {root} (test) > explain analyze select * from a where a.b_id IS NOT NULL;
+---------------------+--------------+------+----------------------------------------------+-----------------------------------------+
| id                  | count        | task | operator info                                | execution info                          |
+---------------------+--------------+------+----------------------------------------------+-----------------------------------------+
| TableReader_7       | 100251951.70 | root | data:Selection_6                             | time:31.661231388s, loops:33, rows:1000 |
| └─Selection_6       | 100251951.70 | cop  | not(isnull(test.a.b_id))                     |                                         |
|   └─TableScan_5     | 100352304.00 | cop  | table:a, range:[-inf,+inf], keep order:false | time:0s, loops:0, rows:100352304        |
+---------------------+--------------+------+----------------------------------------------+-----------------------------------------+
3 rows in set (31.66 sec)

@zz-jason zz-jason assigned winoros and unassigned zz-jason Dec 5, 2018
@morgo
Copy link
Contributor Author

morgo commented Dec 5, 2018

I have a clear test-case for part of this problem on a smaller data-set. So I've created: #8587

In my smaller test-case, an index join was correctly chosen as part of the added transformation. On my larger one, it wasn't - so there might be two issues.

@morgo
Copy link
Contributor Author

morgo commented Feb 13, 2019

I have confirmed that the optimizer is now picking INLJ correctly! I am going to close this issue now.

@morgo morgo closed this as completed Feb 13, 2019
@morgo
Copy link
Contributor Author

morgo commented Feb 13, 2019

I closed this bug too quickly. At small volumes of data it is correctly picking INLJ. As I ramp up the volume, it picks SMJ incorrectly:

mysql> explain analyze select /*+ TIDB_INLJ(a, b) */  count(*) from a inner join b on a.b_id = b.id;
+--------------------------+-----------+------+------------------------------------------------------------------------------+------------------------------------+
| id                       | count     | task | operator info                                                                | execution info                     |
+--------------------------+-----------+------+------------------------------------------------------------------------------+------------------------------------+
| StreamAgg_11             | 1.00      | root | funcs:count(1)                                                               | time:9.939494ms, loops:2, rows:1   |
| └─IndexJoin_24           | 173659.31 | root | inner join, inner:IndexReader_23, outer key:test.a.b_id, inner key:test.b.id | time:9.936418ms, loops:1, rows:0   |
|   ├─IndexReader_29       | 138927.45 | root | index:IndexScan_28                                                           | time:1.984726ms, loops:2, rows:470 |
|   │ └─IndexScan_28       | 138927.45 | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:false                | time:0s, loops:0, rows:470         |
|   └─IndexReader_23       | 1.00      | root | index:IndexScan_22                                                           | time:5.872931ms, loops:1, rows:0   |
|     └─IndexScan_22       | 1.00      | cop  | table:b, index:id, range: decided by [test.a.b_id], keep order:false         | time:0s, loops:0, rows:0           |
+--------------------------+-----------+------+------------------------------------------------------------------------------+------------------------------------+
6 rows in set (0.01 sec)

mysql> explain analyze select  count(*) from a inner join b on a.b_id = b.id;
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
| id                       | count     | task | operator info                                                | execution info                            |
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
| StreamAgg_12             | 1.00      | root | funcs:count(1)                                               | time:534.208215ms, loops:2, rows:1        |
| └─MergeJoin_241          | 173659.31 | root | inner join, left key:test.a.b_id, right key:test.b.id        | time:534.20505ms, loops:1, rows:0         |
|   ├─IndexReader_79       | 138927.45 | root | index:IndexScan_78                                           | time:64.712µs, loops:2, rows:470          |
|   │ └─IndexScan_78       | 138927.45 | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:true | time:0s, loops:0, rows:470                |
|   └─IndexReader_233      | 873474.00 | root | index:IndexScan_232                                          | time:394.426198ms, loops:850, rows:870400 |
|     └─IndexScan_232      | 873474.00 | cop  | table:b, index:id, range:[NULL,+inf], keep order:true        | time:0s, loops:0, rows:873474             |
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
6 rows in set (0.54 sec)

I am still using the same schema and generator. I have ran ANALYZE TABLE to no effect:

mysql> ANALYZE TABLE a,b;
Query OK, 0 rows affected (23.05 sec)

mysql> explain analyze select  count(*) from a inner join b on a.b_id = b.id;
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
| id                       | count     | task | operator info                                                | execution info                            |
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
| StreamAgg_12             | 1.00      | root | funcs:count(1)                                               | time:521.387007ms, loops:2, rows:1        |
| └─MergeJoin_241          | 173659.31 | root | inner join, left key:test.a.b_id, right key:test.b.id        | time:521.383621ms, loops:1, rows:0        |
|   ├─IndexReader_79       | 138927.45 | root | index:IndexScan_78                                           | time:66.315µs, loops:2, rows:470          |
|   │ └─IndexScan_78       | 138927.45 | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:true | time:0s, loops:0, rows:470                |
|   └─IndexReader_233      | 873474.00 | root | index:IndexScan_232                                          | time:373.642335ms, loops:850, rows:870400 |
|     └─IndexScan_232      | 873474.00 | cop  | table:b, index:id, range:[NULL,+inf], keep order:true        | time:0s, loops:0, rows:873474             |
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
6 rows in set (0.53 sec)

@morgo morgo reopened this Feb 13, 2019
@morgo
Copy link
Contributor Author

morgo commented Feb 13, 2019

mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
|   874470 |
+----------+
1 row in set (0.56 sec)

mysql> SELECT count(*) FROM a WHERE b_id IS NULL;
+----------+
| count(*) |
+----------+
|   874000 |
+----------+
1 row in set (0.58 sec)

mysql> SELECT COUNT(*) FROM b;
+----------+
| COUNT(*) |
+----------+
|   873474 |
+----------+
1 row in set (0.56 sec)

@eurekaka
Copy link
Contributor

eurekaka commented Feb 14, 2019

@morgo from the explain, we can see that we have internally derived a.b_id is not null and pushed it down to IndexScan_78. For b.id, since it is primary key and has not null flag, we don't derive is not null for it. The problem here should be the wrongly estimated count of IndexScan_78. From the count(*) query you pasted, IndexScan_78 should return 470 rows instead, that may impact the join choice.

@eurekaka
Copy link
Contributor

@morgo Please update the latest stats of the table so we can check why the estimation is wrong.

@morgo
Copy link
Contributor Author

morgo commented Feb 14, 2019

Here is the full test case (3 MiB compressed)

@eurekaka
Copy link
Contributor

eurekaka commented Apr 2, 2019

@morgo you can try latest master branch to check if this problem is fixed or not. Note that, the fix can only improve null estimation for single-column index, so to make the fix work for this issue, you have to add a single-column index for column a.b_id.

@zz-jason
Copy link
Member

zz-jason commented Apr 2, 2019

@eurekaka for composite index, can we maintain the null count for each prefix index? for example, a composite index (a, b, c), we can maintain 3 null counts:

  1. null count for (a)
  2. null count for (a, b), where a=null and b = null
  3. null count for (a, b, c), where a=null and b=null and c=null

@eurekaka
Copy link
Contributor

eurekaka commented Apr 2, 2019

@zz-jason yes, this is a feasible approach, but we have to modify the storage of histograms, because we only use one int field to store the NullCount of indexes now.

@morgo
Copy link
Contributor Author

morgo commented Apr 2, 2019

@eurekaka confirming it is not fixed:

mysql> ANALYZE TABLE a,b;
Query OK, 0 rows affected (10.70 sec)

mysql> explain analyze select  count(*) from a inner join b on a.b_id = b.id;
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
| id                       | count     | task | operator info                                                | execution info                            |
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
| StreamAgg_12             | 1.00      | root | funcs:count(1)                                               | time:208.699795ms, loops:2, rows:1        |
| └─MergeJoin_31           | 32843.35  | root | inner join, left key:test.a.b_id, right key:test.b.id        | time:208.69703ms, loops:1, rows:0         |
|   ├─IndexReader_24       | 32564.48  | root | index:IndexScan_23                                           | time:120.719µs, loops:2, rows:406         |
|   │ └─IndexScan_23       | 32564.48  | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:true | time:0s, loops:0, rows:406                |
|   └─IndexReader_26       | 193386.00 | root | index:IndexScan_25                                           | time:166.281301ms, loops:188, rows:192512 |
|     └─IndexScan_25       | 193386.00 | cop  | table:b, index:id, range:[NULL,+inf], keep order:true        | time:0s, loops:0, rows:193386             |
+--------------------------+-----------+------+--------------------------------------------------------------+-------------------------------------------+
6 rows in set (0.21 sec)

mysql> explain analyze select /*+ TIDB_INLJ(a, b) */  count(*) from a inner join b on a.b_id = b.id;
+--------------------------+----------+------+------------------------------------------------------------------------------+------------------------------------+
| id                       | count    | task | operator info                                                                | execution info                     |
+--------------------------+----------+------+------------------------------------------------------------------------------+------------------------------------+
| StreamAgg_11             | 1.00     | root | funcs:count(1)                                                               | time:11.639277ms, loops:2, rows:1  |
| └─IndexJoin_25           | 32516.87 | root | inner join, inner:IndexReader_24, outer key:test.a.b_id, inner key:test.b.id | time:11.635751ms, loops:1, rows:0  |
|   ├─IndexReader_21       | 32240.78 | root | index:IndexScan_20                                                           | time:1.600473ms, loops:2, rows:406 |
|   │ └─IndexScan_20       | 32240.78 | cop  | table:a, index:b_id, c77, range:[-inf,+inf], keep order:false                | time:0s, loops:0, rows:406         |
|   └─IndexReader_24       | 1.00     | root | index:IndexScan_23                                                           | time:8.512852ms, loops:1, rows:0   |
|     └─IndexScan_23       | 1.00     | cop  | table:b, index:id, range: decided by [test.a.b_id], keep order:false         | time:0s, loops:0, rows:0           |
+--------------------------+----------+------+------------------------------------------------------------------------------+------------------------------------+
6 rows in set (0.01 sec)

mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v3.0.0-beta.1-49-g4cbe896b5
Git Commit Hash: 4cbe896b5b589f38bd2a69fdca0bb21d480a6672
Git Branch: master
UTC Build Time: 2019-04-02 01:39:53
GoVersion: go version go1.12.1 linux/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
Check Table Before Drop: false
1 row in set (0.00 sec)

@eurekaka
Copy link
Contributor

eurekaka commented Apr 3, 2019

@morgo

Note that, the fix can only improve null estimation for single-column index, so to make the fix work for this issue, you have to add a single-column index for column a.b_id.

Could you please add a single-column index on a.b_id and check if the fix works?

@morgo
Copy link
Contributor Author

morgo commented Apr 3, 2019

Could you please add a single-column index on a.b_id and check if the fix works?

Yes, this fixes it. Sorry, I missed your instruction in the earlier comment to do this :-)

@morgo
Copy link
Contributor Author

morgo commented Apr 3, 2019

I am going to close this issue. Adding a single-column index on a.b_id is an acceptable solution.

@morgo morgo closed this as completed Apr 3, 2019
@eurekaka
Copy link
Contributor

eurekaka commented Apr 4, 2019

@morgo thanks.

@zz-jason I revisited the approach mentioned for composite index

for composite index, can we maintain the null count for each prefix index? for example, a composite index (a, b, c), we can maintain 3 null counts:

  • null count for (a)
  • null count for (a, b), where a=null and b = null
  • null count for (a, b, c), where a=null and b=null and c=null

it can solve the problem in this issue without forcing to add single-column index on a.b_id, but for other queries like where a = 1 and b is null, we cannot utilize any null counts maintained.

Currently, for the above 3 null counts, only null count for (a) is useful actually because we are not treating is null as equal condition now, so if condition is like a is null and b is null, only a is null is detached as access condition.

mysql> explain select * from t use index(idx) where b is null and c is null;
+-----------------------+-------+------+---------------------------------------------------------------------------+
| id                    | count | task | operator info                                                             |
+-----------------------+-------+------+---------------------------------------------------------------------------+
| IndexLookUp_8         | 0.00  | root |                                                                           |
| ├─Selection_7         | 0.00  | cop  | isnull(test.t.c)                                                          |
| │ └─IndexScan_5       | 0.00  | cop  | table:t, index:b, c, d, range:[NULL,NULL], keep order:false, stats:pseudo |
| └─TableScan_6         | 0.00  | cop  | table:t, keep order:false                                                 |
+-----------------------+-------+------+---------------------------------------------------------------------------+

so IMHO this approach for composite index is not that cost-effective?

@zz-jason
Copy link
Member

zz-jason commented Apr 4, 2019

Got it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants