Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated columns index replacement rule not support JSON functions like MySQL #35490

Closed
Tracked by #36993
sunnyfun888 opened this issue Jun 18, 2022 · 9 comments · Fixed by #37779
Closed
Tracked by #36993

Generated columns index replacement rule not support JSON functions like MySQL #35490

sunnyfun888 opened this issue Jun 18, 2022 · 9 comments · Fixed by #37779
Assignees
Labels
component/json severity/moderate sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@sunnyfun888
Copy link

Bug Report

Please answer these questions before submitting your issue. Thanks!

In MySQL, use this way, optimizer will look for compatible virtual columns with indexes that match the expression in JSON queries:

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.26    |
+-----------+
1 row in set (0.00 sec)

mysql> CREATE TABLE person (
    -> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> NAME VARCHAR(255) NOT NULL,
    -> address_info JSON,
    -> city VARCHAR(64) AS (JSON_UNQUOTE(address_info->"$.city")),
    -> KEY (city)
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO `person` (`id`, `NAME`, `address_info`) VALUES('1','a','{\"city\": \"Beijing\"}');
Query OK, 1 row affected (0.01 sec)

mysql> EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE address_info->>"$.city" = 'Beijing';
+----------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on person using city (city='Beijing')  (cost=0.35 rows=1) (actual time=0.026..0.029 rows=1 loops=1)
|
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

But same SQL in TiDB, the query will be TableFullScan:

SELECT VERSION();
+--------------------+
| VERSION()          |
+--------------------+
| 5.7.25-TiDB-v6.1.0 |
+--------------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE person (
    -> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> NAME VARCHAR(255) NOT NULL,
    -> address_info JSON,
    -> city VARCHAR(64) AS (JSON_UNQUOTE(address_info->"$.city")),
    -> KEY (city)
    -> );

EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE address_info->>"$.city" = 'Beijing';
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO `person` (`id`, `NAME`, `address_info`) VALUES('1','a','{\"city\": \"Beijing\"}');
Query OK, 1 row affected (0.01 sec)

mysql> EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE address_info->>"$.city" = 'Beijing';
+---------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------+-----------+------+
| id                        | estRows  | actRows | task      | access object | execution info                                                                                                                                                                                                                           | operator info                                                                                             | memory    | disk |
+---------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------+-----------+------+
| Projection_4              | 8000.00  | 1       | root      |               | time:640µs, loops:2, Concurrency:5                                                                                                                                                                                                      | test.person.name, test.person.id, test.person.city                                                        | 19.2 KB   | N/A  |
| └─TableReader_7       | 8000.00  | 1       | root      |               | time:586.1µs, loops:2, cop_task: {num: 1, max: 489.7µs, proc_keys: 1, rpc_num: 1, rpc_time: 472.4µs, copr_cache_hit_ratio: 0.00}                                                                                                      | data:Selection_6                                                                                          | 328 Bytes | N/A  |
|   └─Selection_6       | 8000.00  | 1       | cop[tikv] |               | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 72, total_keys: 2, rocksdb: {delete_skipped_count: 1, key_skipped_count: 2, block: {cache_hit_count: 1, read_count: 0, read_byte: 0 Bytes}}} | eq(json_unquote(cast(json_extract(test.person.address_info, "$.city"), var_string(16777216))), "Beijing") | N/A       | N/A  |
|     └─TableFullScan_5 | 10000.00 | 1       | cop[tikv] | table:person  | tikv_task:{time:0s, loops:1}                                                                                                                                                                                                             | keep order:false, stats:pseudo                                                                            | N/A       | N/A  |
+---------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------+-----------+------+

It seems the Generated columns index replacement rule not work in this case.

2. What did you expect to see? (Required)

Generated columns index replacement rule supports JSON functions, like MySQL.

3. What did you see instead (Required)

Current version of TiDB not support.

4. What is your TiDB version? (Required)

6.1.0

@sunnyfun888 sunnyfun888 added the type/bug The issue is confirmed as a bug. label Jun 18, 2022
@seiya-annie seiya-annie added severity/moderate type/enhancement The issue or PR belongs to an enhancement. and removed type/bug The issue is confirmed as a bug. labels Jun 21, 2022
@wjhuang2016
Copy link
Member

Hi, @sunnyfun888
JSON_UNQUTE is a special function that hard to apply the optimization.
You can work around with

mysql> alter table person add index idx((cast(json_extract(`address_info`, _utf8mb4'$.city') as char(64))));
Query OK, 0 rows affected (2.54 sec)

mysql> EXPLAIN ANALYZE SELECT NAME, id, city FROM person WHERE cast(json_extract(`address_info`, _utf8mb4'$.city') as char(64)) = 'Beijing';
+------------------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id                                 | estRows | actRows | task      | access object                                                                             | execution info                                                                                                                                                          | operator info                                                                                                            | memory    | disk |
+------------------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-----------+------+
| Projection_4                       | 10.00   | 0       | root      |                                                                                           | time:892.8µs, loops:1, Concurrency:OFF                                                                                                                                  | test.person.name, test.person.id, test.person.city                                                                       | 2.23 KB   | N/A  |
| └─Projection_12                    | 10.00   | 0       | root      |                                                                                           | time:889µs, loops:1, Concurrency:OFF                                                                                                                                    | test.person.id, test.person.name, test.person.city, cast(json_extract(test.person.address_info, $.city), var_string(64)) | 2.86 KB   | N/A  |
|   └─IndexLookUp_11                 | 10.00   | 0       | root      |                                                                                           | time:887.8µs, loops:1,                                                                                                                                                  |                                                                                                                          | 163 Bytes | N/A  |
|     ├─IndexRangeScan_9(Build)      | 10.00   | 0       | cop[tikv] | table:person, index:idx(cast(json_extract(`address_info`, _utf8mb4'$.city') as char(64))) | time:577.9µs, loops:1, cop_task: {num: 1, max: 376.3µs, proc_keys: 0, rpc_num: 1, rpc_time: 301.6µs, copr_cache_hit_ratio: 0.00}, tikv_task:{time:229.8µs, loops:0}     | range:["Beijing","Beijing"], keep order:false, stats:pseudo                                                              | N/A       | N/A  |
|     └─TableRowIDScan_10(Probe)     | 10.00   | 0       | cop[tikv] | table:person                                                                              |                                                                                                                                                                         | keep order:false, stats:pseudo                                                                                           | N/A       | N/A  |
+------------------------------------+---------+---------+-----------+-------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-----------+------+
5 rows in set (0.01 sec)

@seiya-annie seiya-annie added the sig/planner SIG: Planner label Jun 22, 2022
@sunnyfun888
Copy link
Author

Hi @wjhuang2016,
We're trying to migrate a legacy system from MySQL to TiDB. During the migration research, this issue was found. There're a variety of data types in the json(string, number, datetime), all was fine when use virtual columns with indexes in MySQL, if use this workaround, the application need to record every data types instead of DB, maybe the cost will be a little big.

@wjhuang2016
Copy link
Member

Hi @wjhuang2016, We're trying to migrate a legacy system from MySQL to TiDB. During the migration research, this issue was found. There're a variety of data types in the json(string, number, datetime), all was fine when use virtual columns with indexes in MySQL, if use this workaround, the application need to record every data types instead of DB, maybe the cost will be a little big.

The problem is that the expression type and the generated column type need to be strictly equal.
MySQL doesn't follow it, and it causes a bug in MySQL.
For example, the result of JSON_UNQUOTE(address_info->"$.city") may be longer than 64, so it cannot be stored in city without truncation.
If we replace it by city, the result may be wrong.

@sunnyfun888
Copy link
Author

The problem is that the expression type and the generated column type need to be strictly equal. MySQL doesn't follow it, and it causes a bug in MySQL. For example, the result of JSON_UNQUOTE(address_info->"$.city") may be longer than 64, so it cannot be stored in city without truncation. If we replace it by city, the result may be wrong.

I think all other virtual columns use expressions will meet the same problem, infact in MySQL if the string is longer than 64 when insert or update, will get Data too long for column error, and the truncation is rollbacked. The behavior of MySQL is more like syntactic sugar.

@wjhuang2016
Copy link
Member

The problem is that the expression type and the generated column type need to be strictly equal. MySQL doesn't follow it, and it causes a bug in MySQL. For example, the result of JSON_UNQUOTE(address_info->"$.city") may be longer than 64, so it cannot be stored in city without truncation. If we replace it by city, the result may be wrong.

I think all other virtual columns use expressions will meet the same problem, infact in MySQL if the string is longer than 64 when insert or update, will get Data too long for column error, and the truncation is rollbacked. The behavior of MySQL is more like syntactic sugar.

@sunnyfun888 For the strict sql-mode, it would report an error. But if in the empty sql-mode, it can insert successfully.

@wjhuang2016
Copy link
Member

wjhuang2016 commented Jun 24, 2022

 if the string is longer than 64 when insert or update, will get Data too long for column error, and the truncation will be rollbacked

It's not true if the sql_mode is "".

set @@sql_mode="";

If users can accept the possible wrong result, it's ok the do this optimization. But when designed this feature, we tend to choose the correctness.

@sunnyfun888
Copy link
Author

If users can accept the possible wrong result, it's ok the do this optimization. But when designed this feature, we tend to choose the correctness.

I agree and can use a switch then leave the choice to users.

@xiongjiwei xiongjiwei self-assigned this Aug 9, 2022
@xiongjiwei
Copy link
Contributor

xiongjiwei commented Aug 11, 2022

it is a mysql bug, consider the following case

CREATE TABLE person (id INT PRIMARY KEY,address_info JSON,city VARCHAR(2) AS (JSON_UNQUOTE(address_info->"$.city")),KEY (city));
set @@sql_mode='';
INSERT INTO `person` (`id`, `address_info`) VALUES('1','{\"city\": \"Beijing\"}');  -- success with warning
set @@sql_mode=default;
SELECT id FROM person ignore index(`city`) WHERE address_info->>"$.city" = 'Beijing';  -- 1 record
SELECT id FROM person force index(`city`) WHERE address_info->>"$.city" = 'Beijing';  -- 0 record

use or not use an index will get a different result, it is absolutely a bug

@xiongjiwei
Copy link
Contributor

close due to MySQL has confirmed it is a bug https://bugs.mysql.com/bug.php?id=108114

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/json severity/moderate sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants