-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
planner: rewrite count star to count not null column #39197
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
/run-build |
/run-check_dev |
/run-check_dev_2 |
/run-mysql-test |
/run-unit-test |
/run-all-tests |
1 similar comment
/run-all-tests |
/run-mysql-test tidb-test=2026 |
/run-mysql-test tidb-test=2026 |
/run-mysql-test tidb-test=pr/2026 |
2 similar comments
/run-mysql-test tidb-test=pr/2026 |
/run-mysql-test tidb-test=pr/2026 |
/run-mysql-test tidb-test=pr/2026 |
1 similar comment
/run-mysql-test tidb-test=pr/2026 |
/merge |
This pull request has been accepted and is ready to merge. Commit hash: 83dd8a3
|
/run-mysql-test tidb-test=pr/2026 |
TiDB MergeCI notify🔴 Bad News! New failing [2] after this pr merged.
|
Case2 there is no columns from datasource | ||
Query: select count(*) from table | ||
ColumnPruningRule: pick k1 as the narrowest not null column from origin table @Function.preferNotNullColumnFromTable | ||
datasource.columns: k1 | ||
CountStarRewriterRule: rewrite count(*) -> count(k1) | ||
Rewritten Query: select count(k1) from table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any benchmark for this case? In TiFlash, there are extra three columns <row_id, version, delmark> which are used for MVCC. Therefore, if there are multi version of data, we need to scan <row_id, version, delmark> each query additionally. For count(1)
we need to scan <row_id, version, delmark> and return row_id to aggregation, but for count(k)
we need to scan <row_id, version, delmark, k1> and return k1 to aggregation. So the performance may degrade when there are multi version of data in TiFlash and k1 is wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am doing performance testing.
Do you mean that in the case of multiple versions, even if the count(*) does not select the row_id column, the final read will still read the row_id column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we need to use it to filter out other data of other versions.
What problem does this PR solve?
Issue Number: close #37165
Problem Summary:
What is changed and how it works?
The main purpose of this pr is to improve the execution speed of count(), and rewrite count() to count (the narrowest non-null column) at the planning layer to achieve the purpose of improving performance.
countStarRewriter
logical rule responsible for rewriting count(*) to count(the narrowest non-null column).For detailed rewriting steps, see comments of
countStarRewriter
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.