Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: inaccurate modify-count/row-count after analyzing a table #22934

Open
qw4990 opened this issue Feb 25, 2021 · 3 comments
Open

statistics: inaccurate modify-count/row-count after analyzing a table #22934

qw4990 opened this issue Feb 25, 2021 · 3 comments

Comments

@qw4990
Copy link
Contributor

qw4990 commented Feb 25, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create table t(a int, b int);
insert into t values(1,1),(3,3),(4,4),(2,2),(5,5);
analyze table t ;
show stats_meta where table_name = 't'; // modify_count, row_count = 0, 5
// wait 2 minutes
show stats_meta where table_name = 't'; // modify_count, row_count = 5, 10

2. What did you expect to see? (Required)

These two show stats_meta statements should have the same result.

3. What did you see instead (Required)

They have different results.

4. What is your TiDB version? (Required)

Master

5. The Root cause of this problem.

Background: at the end of an insert statement, TiDB records the number of new-inserted rows as StatsDelta which will be updated into ModifiedCount and RowCount later by a background goroutine.

The Root Cause: when the analyze-table statement is processed, the StatsDelta generated by the previous insert statement has not been handled. When the analyze-table statement finished, we got ModifyCount, RowCount = 0, 5. And later when TiDB dealt with StatsDelta, the five rows were count again, and then we got ModifyCount, RowCount = 5, 10.

#15779 has the same root cause.

6. How to solve this problem.

We should abandon all StatsDelta whose generated-time is before the time of the last analyze-statement.
Here is another test:

func (s *statsSerialSuite) TestWrongModifyRowcount(c *C) {
	defer cleanEnv(c, s.store, s.do)
	tk := testkit.NewTestKit(c, s.store)
	tk.MustExec("use test")
	tk.MustExec("create table t(a int, b int)")
	tk.MustExec("insert into t values(1,1),(3,3),(4,4),(2,2),(5,5)")
	tk.MustExec("analyze table t")
	rs := tk.MustQuery("show stats_meta where table_name = 't'").Rows()
	for _, r := range rs {
		fmt.Println("--->>> ", r)
	}

	c.Assert(s.do.StatsHandle().DumpStatsDeltaToKV(handle.DumpAll), IsNil)
	c.Assert(s.do.StatsHandle().Update(s.do.InfoSchema()), IsNil)
	rs = tk.MustQuery("show stats_meta where table_name = 't'").Rows()
	for _, r := range rs {
		fmt.Println("--->>> ", r)
	}
}
@sunby
Copy link

sunby commented Mar 12, 2021

I want to have a try and I have few questions. Should we use the InitTime in TableDelta to record the generated-time? If so and we execute these sqls:

create table t(a int, b int);
insert into t values(1,1),(3,3),(4,4),(2,2),(5,5);
analyze table t ;
insert into t values(6,6);

These two inserts will use the same TableDelta whose generated-time is before the analyze-statement. Should we abandon the StatsDelta?

@jebter jebter added this to the v5.0.0 ga milestone Mar 21, 2021
@zimulala zimulala removed this from the v5.0.0 ga milestone Mar 23, 2021
@xuyifangreeneyes
Copy link
Contributor

I want to have a try and I have few questions. Should we use the InitTime in TableDelta to record the generated-time? If so and we execute these sqls:

create table t(a int, b int);
insert into t values(1,1),(3,3),(4,4),(2,2),(5,5);
analyze table t ;
insert into t values(6,6);

These two inserts will use the same TableDelta whose generated-time is before the analyze-statement. Should we abandon the StatsDelta?

IMHO, one way is that we abandon the TableDelta (or dump it to disk) before starting analyze so the second insert statement will use a new TableDelta. However, this method have problems when there are several TiDB instances. It may be hard to let one instance, who wants to execute analyze, ask another instance to abandon or dump TableDelta.

@qw4990
Copy link
Contributor Author

qw4990 commented Oct 9, 2023

A long-term solution should be removing modify count: https://docs.pingcap.com/tidb/dev/system-variables#tidb_opt_objective-new-in-v740

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants