Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeat calculation when aggregate function parameters are the same #39576

Open
Leavrth opened this issue Dec 2, 2022 · 1 comment
Open

Repeat calculation when aggregate function parameters are the same #39576

Leavrth opened this issue Dec 2, 2022 · 1 comment
Labels
sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@Leavrth
Copy link
Contributor

Leavrth commented Dec 2, 2022

Feature Request

Is your feature request related to a problem? Please describe:

Suppose I have a table like this:

mysql> SHOW CREATE TABLE `test`.`tbl`;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                               |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tbl   | CREATE TABLE `tbl` (
  `s_i_id` int(11) NOT NULL,
  `s_w_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`s_i_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Then calculate a value for each line according to the specified function:

mysql> SELECT CAST(CRC32(CONCAT_WS(',', `s_i_id`, `s_w_id`, CONCAT(ISNULL(`s_i_id`), ISNULL(`s_w_id`))))AS UNSIGNED) as CRC32RES FROM `test`.`tbl`;
+------------+
| CRC32RES   |
+------------+
| 3896458847 |
| 3591842708 |
| 2157756552 |
+------------+
3 rows in set (0.01 sec)

Finally, I want to calculate SUM and BITXOR of CRC32RES at the same time:

mysql> SELECT
    ->     COUNT(1) as CNT, SUM(CRC32RES) as SUM_CHECKSUM, BIT_XOR(CRC32RES) as BIT_XOR_CHECKSUM 
    -> FROM
    ->     (SELECT
    ->         CAST(CRC32(CONCAT_WS(',', `s_i_id`, `s_w_id`, CONCAT(ISNULL(`s_i_id`), ISNULL(`s_w_id`))))AS UNSIGNED) as CRC32RES
    ->     FROM `test`.`tbl`) as CRC32LIST;
+-----+--------------+------------------+
| CNT | SUM_CHECKSUM | BIT_XOR_CHECKSUM |
+-----+--------------+------------------+
|   3 |   9646058107 |       3199522115 |
+-----+--------------+------------------+
1 row in set (0.01 sec)

This is OK when the table is very small. However, when the each line of the table is very large, there will be double delay than only use one aggregation.

See the explain of this SQL:

mysql> explain SELECT
    ->     COUNT(1) as CNT, SUM(CRC32RES) as SUM_CHECKSUM, BIT_XOR(CRC32RES) as BIT_XOR_CHECKSUM 
    -> FROM
    ->     (SELECT
    ->         CAST(CRC32(CONCAT_WS(',', `s_i_id`, `s_w_id`, CONCAT(ISNULL(`s_i_id`), ISNULL(`s_w_id`))))AS UNSIGNED) as CRC32RES
    ->     FROM `test`.`tbl`) as CRC32LIST;
+----------------------------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                         | estRows | task      | access object | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+----------------------------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_17               | 1.00    | root      |               | funcs:count(Column#11)->Column#4, funcs:sum(Column#12)->Column#5, funcs:bit_xor(Column#13)->Column#6                                                                                                                                                                                                                                                                                                                                                                            |
| └─TableReader_18           | 1.00    | root      |               | data:StreamAgg_9                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|   └─StreamAgg_9            | 1.00    | cop[tikv] |               | funcs:count(1)->Column#11, funcs:sum(cast(crc32(concat_ws(",", cast(test.tbl.s_i_id, var_string(20)), cast(test.tbl.s_w_id, var_string(20)), concat("0", cast(isnull(test.tbl.s_w_id), var_string(20))))), bigint(22) UNSIGNED BINARY))->Column#12, funcs:bit_xor(cast(crc32(concat_ws(",", cast(test.tbl.s_i_id, var_string(20)), cast(test.tbl.s_w_id, var_string(20)), concat("0", cast(isnull(test.tbl.s_w_id), var_string(20))))), bigint(22) UNSIGNED BINARY))->Column#13 |
|     └─TableFullScan_16     | 3.00    | cop[tikv] | table:tbl     | keep order:false, stats:pseudo                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+----------------------------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

It seems that this function is calculated twice, which costs double time.

cast(crc32(concat_ws(",", cast(test.tbl.s_i_id, var_string(20)), cast(test.tbl.s_w_id, var_string(20)), concat("0", cast(isnull(test.tbl.s_w_id), var_string(20))))), bigint(22) UNSIGNED BINARY)

Describe the feature you'd like:

Can this be optimized?

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@Leavrth Leavrth added the type/feature-request Categorizes issue or PR as related to a new feature. label Dec 2, 2022
@fzzf678 fzzf678 added the sig/planner SIG: Planner label Dec 2, 2022
@fixdb
Copy link
Contributor

fixdb commented Dec 7, 2022

Yes, when the expression appears more than once, the planner should be extract common expression into a Projection operator. In this case, the planner should not inline the expressions.
@XuHuaiyu I know TiFlash can automatically reuse the result of duplicate expressions, currently does TiKV do the same optimization internally?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants