Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join on index has incorrect range and will contain NULL value #7226

Closed
birdstorm opened this issue Aug 1, 2018 · 6 comments
Closed

Join on index has incorrect range and will contain NULL value #7226

birdstorm opened this issue Aug 1, 2018 · 6 comments
Assignees
Labels
sig/execution SIG execution type/bug The issue is confirmed as a bug.

Comments

@birdstorm
Copy link
Contributor

birdstorm commented Aug 1, 2018

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

Make a simple join test which has NULL value with this file. IndexTest.txt

select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
  1. What did you expect to see?
mysql> select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+-------+------------+
| id_dt | tp_decimal |
+-------+------------+
|    -1 |          0 |
|    -1 |          0 |
|     0 |          0 |
........
|    98 |         98 |
|    99 |         99 |
|   100 |        100 |
+-------+------------+
110 rows in set (0.02 sec)
  1. What did you see instead?
mysql> select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+-------+------------+
| id_dt | tp_decimal |
+-------+------------+
| -1000 |       NULL |
|    -1 |          0 |
|    -1 |          0 |
|     0 |          0 |
..........
|    98 |         98 |
|    99 |         99 |
|   100 |        100 |
+-------+------------+
111 rows in set (0.02 sec)
mysql> explain select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+--------------------------+--------+------+---------------------------------------------------------------+
| id                       | count  | task | operator info                                                 |
+--------------------------+--------+------+---------------------------------------------------------------+
| Projection_910           | 128.75 | root | a.id_dt, a.tp_decimal                                         |
| └─MergeJoin_911          | 128.75 | root | inner join, left key:a.tp_decimal, right key:b.tp_decimal     |
|   ├─IndexReader_44       | 103.00 | root | index:IndexScan_43                                            |
|   │ └─IndexScan_43       | 103.00 | cop  | table:a, index:tp_decimal, range:[NULL,+inf], keep order:true |
|   └─IndexReader_481      | 103.00 | root | index:IndexScan_480                                           |
|     └─IndexScan_480      | 103.00 | cop  | table:b, index:tp_decimal, range:[NULL,+inf], keep order:true |
+--------------------------+--------+------+---------------------------------------------------------------+
6 rows in set (0.00 sec)

You can see that the range is [NULL, +inf]

  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

Latest master build. It also AFFECTS release-2.0 branch

In addition, Hash join is correct. @winoros

mysql> select /*+ TIDB_HJ(a, b) */ a.id_dt, a.tp_decimal from full_data_type_table_idx a use index(idx_tp_decimal) join full_data_type_table_idx b use index(idx_tp_decimal) on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+-------+------------+
| id_dt | tp_decimal |
+-------+------------+
|    -1 |          0 |
|    -1 |          0 |
|     0 |          0 |
........
|    98 |         98 |
|    99 |         99 |
|   100 |        100 |
+-------+------------+
110 rows in set (0.02 sec)

mysql> explain select /*+ TIDB_HJ(a, b) */ a.id_dt, a.tp_decimal from full_data_type_table_idx a use index(idx_tp_decimal) join full_data_type_table_idx b use index(idx_tp_decimal) on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+----------------------------+--------+------+--------------------------------------------------------------------------+
| id                         | count  | task | operator info                                                            |
+----------------------------+--------+------+--------------------------------------------------------------------------+
| Sort_6                     | 128.75 | root | a.tp_decimal:asc                                                         |
| └─Projection_8             | 128.75 | root | a.id_dt, a.tp_decimal                                                    |
|   └─HashLeftJoin_16        | 128.75 | root | inner join, inner:IndexReader_21, equal:[eq(a.tp_decimal, b.tp_decimal)] |
|     ├─IndexReader_19       | 103.00 | root | index:IndexScan_18                                                       |
|     │ └─IndexScan_18       | 103.00 | cop  | table:a, index:tp_decimal, range:[NULL,+inf], keep order:false           |
|     └─IndexReader_21       | 103.00 | root | index:IndexScan_20                                                       |
|       └─IndexScan_20       | 103.00 | cop  | table:b, index:tp_decimal, range:[NULL,+inf], keep order:false           |
+----------------------------+--------+------+--------------------------------------------------------------------------+
7 rows in set (0.01 sec)
@zhexuany
Copy link
Contributor

zhexuany commented Aug 1, 2018

@winoros It seems you are the right person to ping.

@zz-jason
Copy link
Member

zz-jason commented Aug 2, 2018

@zhexuany The behavior of MergeJoin is not correct, I think you can take a look and try to fix it.

@zz-jason zz-jason added type/bug The issue is confirmed as a bug. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/execution SIG execution and removed help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Aug 2, 2018
@zz-jason
Copy link
Member

zz-jason commented Aug 2, 2018

I'll fix this as soon as possible. Hope it can be fixed in 2.0.6.

@zz-jason
Copy link
Member

zz-jason commented Aug 2, 2018

Seems like both "Index Join" and "Merge Join" have the same problem:

TiDB(localhost:4000) > desc select /*+ TIDB_INLJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+--------------------------+-------+------+-----------------------------------------------------------------------------+
| id                       | count | task | operator info                                                               |
+--------------------------+-------+------+-----------------------------------------------------------------------------+
| Projection_15            | 3.75  | root | t1.a                                                                        |
| └─IndexJoin_19           | 3.75  | root | inner join, inner:IndexReader_18, outer key:t1.a, inner key:t2.a            |
|   ├─IndexReader_22       | 3.00  | root | index:IndexScan_21                                                          |
|   │ └─IndexScan_21       | 3.00  | cop  | table:t1, index:a, range:[NULL,+inf], keep order:true, stats:pseudo         |
|   └─IndexReader_18       | 0.00  | root | index:IndexScan_17                                                          |
|     └─IndexScan_17       | 0.00  | cop  | table:t2, index:a, range: decided by [t1.a], keep order:false, stats:pseudo |
+--------------------------+-------+------+-----------------------------------------------------------------------------+
6 rows in set (0.00 sec)

TiDB(localhost:4000) > select /*+ TIDB_INLJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+------+
| a    |
+------+
| NULL |
| 1.01 |
| 2.01 |
+------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select /*+ TIDB_SMJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+--------------------------+-------+------+---------------------------------------------------------------------+
| id                       | count | task | operator info                                                       |
+--------------------------+-------+------+---------------------------------------------------------------------+
| Projection_16            | 3.75  | root | t1.a                                                                |
| └─MergeJoin_17           | 3.75  | root | inner join, left key:t1.a, right key:t2.a                           |
|   ├─IndexReader_12       | 3.00  | root | index:IndexScan_11                                                  |
|   │ └─IndexScan_11       | 3.00  | cop  | table:t1, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|   └─IndexReader_15       | 3.00  | root | index:IndexScan_14                                                  |
|     └─IndexScan_14       | 3.00  | cop  | table:t2, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
+--------------------------+-------+------+---------------------------------------------------------------------+
6 rows in set (0.00 sec)

TiDB(localhost:4000) > select /*+ TIDB_SMJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+------+
| a    |
+------+
| NULL |
| 1.01 |
| 2.01 |
+------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select /*+ TIDB_HJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+----------------------------+-------+------+-------------------------------------------------------------+
| id                         | count | task | operator info                                               |
+----------------------------+-------+------+-------------------------------------------------------------+
| Sort_6                     | 3.75  | root | t1.a:asc                                                    |
| └─Projection_8             | 3.75  | root | t1.a                                                        |
|   └─HashLeftJoin_16        | 3.75  | root | inner join, inner:TableReader_21, equal:[eq(t1.a, t2.a)]    |
|     ├─TableReader_19       | 3.00  | root | data:TableScan_18                                           |
|     │ └─TableScan_18       | 3.00  | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
|     └─TableReader_21       | 3.00  | root | data:TableScan_20                                           |
|       └─TableScan_20       | 3.00  | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+----------------------------+-------+------+-------------------------------------------------------------+
7 rows in set (0.00 sec)

TiDB(localhost:4000) > select /*+ TIDB_HJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+------+
| a    |
+------+
| 1.01 |
| 2.01 |
+------+
2 rows in set (0.01 sec)

@zz-jason
Copy link
Member

zz-jason commented Aug 2, 2018

For "Merge Join", the compare function is gained from "util/chunk/compare.go":

108 func cmpMyDecimal(l Row, lCol int, r Row, rCol int) int {
109     lNull, rNull := l.IsNull(lCol), r.IsNull(rCol)
110     if lNull || rNull {
111         return cmpNull(lNull, rNull)
112     }
113     lDec, rDec := l.GetMyDecimal(lCol), r.GetMyDecimal(rCol)
114     return lDec.Compare(rDec)
115 }

And the behavior of cmpNull is:

 58 func cmpNull(lNull, rNull bool) int {
 59     if lNull && rNull {
 60         return 0
 61     }
 62     if lNull {
 63         return -1
 64     }
 65     return 1
 66 }

In a word, these functions have this common behavior: NULL is equal to NULL. We should check whether the join key has any NULL value before comparison.

@birdstorm
Copy link
Contributor Author

fixed by #7255

@zz-jason zz-jason self-assigned this Aug 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

4 participants