Optimizer selects merge join over nested loop join incorrectly #8563
Comments
Thanks @morgo.
The files have been uploaded here:
Edit: I ran ANALYZE TABLE on both tables prior to dumping stats.
This happens because TiDB doesn't explicitly add
It will take some time to make TiDB work properly without changing any session variable.
The previous comment only considers row count estimation. If we take real execution into consideration,
Would it be correct to call this two separate bugs? This one relates to planning, and then there is a second inefficiency in execution. Should it be in scope for #8470, or independent work?
Yes, this issue and #8470 are two independent pieces of work. This one is about generating a better execution plan; the other is about executing that plan efficiently.
So a quick update. If I add explicit
But I need to keep the hint there. If I remove it, it still reverts to MergeJoin (but is faster than the earlier MergeJoin):
Oh, this is a very strange result. Its estimated row count is
BTW, I've tried the stats in your repo. It chose index join properly by only adding
The actual result is
I have a clear test case for part of this problem on a smaller data set, so I've created #8587. In my smaller test case, an index join was correctly chosen as part of the added transformation. On my larger one, it wasn't, so there might be two issues.
I have confirmed that the optimizer is now picking
I closed this bug too quickly. At small volumes of data it is correctly picking
I am still using the same schema and generator. I have run
@morgo from the explain, we can see that we have internally derived
@morgo Please update the latest stats of the table so we can check why the estimation is wrong.
Here is the full test case (3 MiB compressed)
@morgo you can try the latest master branch to check whether this problem is fixed. Note that the fix can only improve null estimation for single-column indexes, so to make the fix work for this issue, you have to add a single-column index for column
@eurekaka for a composite index, can we maintain the null count for each prefix index? For example, for a composite index (a, b, c), we can maintain 3 null counts:
@zz-jason yes, this is a feasible approach, but we have to modify the storage of histograms, because we only use one int field to store the
@eurekaka confirming it is not fixed:
Could you please add a single-column index on
Yes, this fixes it. Sorry, I missed your instruction in the earlier comment to do this :-)
I am going to close this issue. Adding a single-column index on
@morgo thanks. @zz-jason I revisited the approach mentioned for composite indexes:
it can solve the problem in this issue without forcing users to add a single-column index on
Currently, for the above 3 null counts, only
so IMHO this approach for composite indexes is not that cost-effective?
Got it.
Bug Report
Please answer these questions before submitting your issue. Thanks!
I imported the schema from https://github.com/morgo/tidb-microbench/blob/master/100M-row-join/schema.sql
I then generated 100M rows in table a and b, using my generator script: https://github.com/morgo/tidb-microbench/blob/master/100M-row-join/bench/generator.go
I then ran the following query:
EXPLAIN ANALYZE select count(*) from a inner join b on a.b_id = b.id;
My configuration is 1x TiDB, 1x PD, and 1x TiKV on a single host. I've also used toxiproxy to inject 1-10ms latency, but it didn't make a measurable difference.
Because there are few a.b_id values that are NOT NULL, I would have expected a nested loop join. If I force one, the execution time is 13.55 sec.
A MergeJoin was selected at an execution time of 40.54 sec
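The expectation above (few non-NULL a.b_id values should favor a nested loop / index join) can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the cost constants, the function names, and the figure of 99,000,000 NULL keys are all hypothetical (the report says only that "few" values are NOT NULL), and none of it reflects TiDB's actual cost formulas. It shows the general idea that an inner equi-join can never match a NULL key, so a planner that accounts for the NULL count sees a much smaller outer side.

```python
# Toy cost model. Every constant is hypothetical and is NOT TiDB's real cost formula.
LOOKUP_COST_PER_ROW = 10  # assumed cost of one index lookup into the inner table
SCAN_COST_PER_ROW = 1     # assumed cost of reading one row sequentially


def estimated_outer_rows(total_rows: int, null_keys: int, derive_not_null: bool) -> int:
    """Rows of the outer table expected to reach the join.

    An inner equi-join cannot match a NULL join key, so a planner that
    derives `key IS NOT NULL` can subtract the NULL count from its estimate.
    """
    return total_rows - null_keys if derive_not_null else total_rows


def choose_join(outer_total: int, outer_null_keys: int, inner_rows: int,
                derive_not_null: bool) -> str:
    """Pick the cheaper join under the toy model above."""
    outer = estimated_outer_rows(outer_total, outer_null_keys, derive_not_null)
    index_join_cost = outer * LOOKUP_COST_PER_ROW              # one lookup per outer row
    merge_join_cost = (outer + inner_rows) * SCAN_COST_PER_ROW  # read both sorted inputs
    return "index_join" if index_join_cost < merge_join_cost else "merge_join"


if __name__ == "__main__":
    rows = 100_000_000        # table size from the report
    null_keys = 99_000_000    # hypothetical: the report only says "few" are NOT NULL
    print(choose_join(rows, null_keys, rows, derive_not_null=False))
    print(choose_join(rows, null_keys, rows, derive_not_null=True))
```

Under these made-up numbers, ignoring the NULL count makes the merge join look cheaper, while subtracting it makes the index join look cheaper, which matches the direction of the behavior described in this report.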
What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?