Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: zero estimation result if all values are in TopN #47400

Closed
Tracked by #54816
qw4990 opened this issue Oct 5, 2023 · 2 comments · Fixed by #55077, #55242 or #56848
Closed
Tracked by #54816

planner: zero estimation result if all values are in TopN #47400

qw4990 opened this issue Oct 5, 2023 · 2 comments · Fixed by #55077, #55242 or #56848
Assignees
Labels
affects-6.5 affects-7.1 affects-7.5 affects-8.1 affects-8.5 epic/cardinality-estimation the optimizer cardinality estimation report/customer Customers have encountered this bug. severity/major sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@qw4990
Copy link
Contributor

qw4990 commented Oct 5, 2023

Enhancement

create table t (a int, key(a));
insert into t values (1), (3);
analyze table t with 2 topn;
insert into t values (2);

explain select * from t where a=2;
+------------------------+---------+-----------+---------------------+-------------------------------+
| id                     | estRows | task      | access object       | operator info                 |
+------------------------+---------+-----------+---------------------+-------------------------------+
| IndexReader_6          | 0.00    | root      |                     | index:IndexRangeScan_5        |
| └─IndexRangeScan_5     | 0.00    | cop[tikv] | table:t, index:a(a) | range:[2,2], keep order:false |
+------------------------+---------+-----------+---------------------+-------------------------------+
2 rows in set (0.00 sec)

After v6.5, in the case above, the estimation result for a=2 is zero, which seems dangerous.

If no TopN, it'll be 1:

create table t (a int, key(a));
insert into t values (1), (3);
analyze table t with 0 topn;
insert into t values (2);

mysql> explain select * from t where a=2;
+------------------------+---------+-----------+---------------------+-------------------------------+
| id                     | estRows | task      | access object       | operator info                 |
+------------------------+---------+-----------+---------------------+-------------------------------+
| IndexReader_6          | 1.00    | root      |                     | index:IndexRangeScan_5        |
| └─IndexRangeScan_5     | 1.00    | cop[tikv] | table:t, index:a(a) | range:[2,2], keep order:false |
+------------------------+---------+-----------+---------------------+-------------------------------+

Should we avoid using 0 in this case (this value is not in TopN and there is no Histogram)?

@qw4990 qw4990 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner epic/cardinality-estimation the optimizer cardinality estimation labels Oct 5, 2023
@AilinKid
Copy link
Contributor

AilinKid commented Nov 2, 2023

the related code about this is as follows

	// 3. use uniform distribution assumption for the rest (even when this value is not covered by the range of stats)
	histNDV := float64(c.Histogram.NDV - int64(c.TopN.Num()))
	if histNDV <= 0 {
		return 0, nil
	}
	return c.Histogram.notNullCount() / histNDV, nil

in first case, since topN is collected, and it is overlapped with the NDV indication in Histogram, so the computation here is meaningless, leading histNDV := float64(c.Histogram.NDV - int64(c.TopN.Num())) = 2-2 = 0

in the second case, since topN is not collected, histNDV := float64(c.Histogram.NDV - int64(c.TopN.Num())) = 2-0 = 2, and the Histogram.notNullCount() is 2 as well , causing the returned value as 2/2 = 1

@seiya-annie
Copy link

/report customer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment