Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistic: the default value of CMSketch is not initialized correctly when using Analyze #19343

Closed
qw4990 opened this issue Aug 20, 2020 · 3 comments · Fixed by #19455
Closed
Assignees
Labels
component/statistics severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug. type/enhancement The issue or PR belongs to an enhancement.

Comments

@qw4990
Copy link
Contributor

qw4990 commented Aug 20, 2020

Development Task

How to reproduce:

create table t (a int, key(a));
insert into t values (0);
insert into t values (0);
...
insert into t values (0); // insert 2048 rows

insert into t values (1);
insert into t values (2);
insert into t values (3);

analyze table t with 0 topn;
explain select * from t where a = 1;
explain select * from t where a = 2;
explain select * from t where a = 3;

The estimated rows for a = 1/2/3 is zero, which is not accurate.

This is a bad case for the heuristic rule of removing noise when estimating point get:

		noise := (c.count - uint64(c.table[i][j])) / (uint64(c.width) - 1)
		if uint64(c.table[i][j]) < noise {
			vals[i] = 0
		} else {
			vals[i] = c.table[i][j] - uint32(noise)
		}
@qw4990 qw4990 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner component/statistics labels Aug 20, 2020
@qw4990 qw4990 self-assigned this Aug 20, 2020
@qw4990 qw4990 changed the title statistic: a bad case for the heuristic rule of removing noise when estimating point get with CMSKetch statistic: a bad case for the heuristic rule of removing noise when estimating point get in CMSKetch Aug 20, 2020
@qw4990
Copy link
Contributor Author

qw4990 commented Aug 20, 2020

In this case, we should use the default value of this CMSketch, but it is zero. We should investigate more to know why the default value is zero in this case.

@qw4990
Copy link
Contributor Author

qw4990 commented Aug 20, 2020

After reviewing the code, I find that the default value of CMSKetch is not initialized correctly in Analyze while Fast Analyze handles it rightly.
So if we use Fast Analyze, you can see the accurate estRows:

... // insert data

mysql> set tidb_enable_fast_analyze=1;
Query OK, 0 rows affected (0.00 sec)

mysql> analyze table t with 0 topn;
Query OK, 0 rows affected (0.02 sec)

mysql> explain select * from t where a = 3;
+------------------------+---------+-----------+---------------------+-------------------------------+
| id                     | estRows | task      | access object       | operator info                 |
+------------------------+---------+-----------+---------------------+-------------------------------+
| IndexReader_6          | 1.00    | root      |                     | index:IndexRangeScan_5        |
| └─IndexRangeScan_5     | 1.00    | cop[tikv] | table:t, index:a(a) | range:[3,3], keep order:false |
+------------------------+---------+-----------+---------------------+-------------------------------+
2 rows in set (0.00 sec)

@qw4990
Copy link
Contributor Author

qw4990 commented Aug 20, 2020

I will fix this issue soon.

@qw4990 qw4990 changed the title statistic: a bad case for the heuristic rule of removing noise when estimating point get in CMSKetch statistic: a bad case for the heuristic rule of removing noise in CMSKetch Aug 20, 2020
@qw4990 qw4990 changed the title statistic: a bad case for the heuristic rule of removing noise in CMSKetch statistic: the default value of CMSketch is not initialized correctly when using Analyze Aug 20, 2020
@qw4990 qw4990 added the type/bug The issue is confirmed as a bug. label Aug 21, 2020
rebelice added a commit to rebelice/tidb that referenced this issue Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants