You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
TiDB maintains some real-time statistics, like the real-time table-level row count and the modify count. They are collected in the DML statements and periodically dumped to update the statistics.
This mechanism is for keeping the statistics more up-to-date in addition to the ANALYZE. However, for some users, this mechanism turns out to make some execution plans unstable and make some diagnoses harder. Therefore, we want to provide a mode that the process of plan generation will not be affected by real-time statistics.
Design
As usual, this mode will be provided through a session/global level system variable. After discussion with the PM, we will use a relatively broad naming here.
Name: tidb_opt_objective
Type: string
Possible values: "moderate", "determinate" Moderate: the default value. The behavior is not changed. determinate: Optimizer tends to be conservative. The optimizer will ignore real-time statistics.
Implementation and other details
To make the optimizer ignore the real-time stats, we need to ignore the updates in the real-time row count and the modify count.
It might seem straightforward to add a method to access the two fields and check the system variable in it. However, there are too many places accessing them, and some of them should be controlled by the variable, others should not (because they are not in the plan generation process). In some places, we are even unable to access the session variable.
On the other hand, there's only one place where we fetch stats for tables from the stats cache for generating execution plans, which is getStatsTable(). We can copy the stats table and reset the two fields when real-time stats need to be ignored. In this approach, we can implement this without too many changes and intrusions.
The out-of-range estimation logic strongly relies on the modify count. The modify count acts as an upper bound to prevent the estimation from producing some ridiculous estimation result.
Now that we can't rely on the modify count, we have to come up with some other method to act as the upper bound. Obviously, it's impossible to have a reliable upper bound in all cases. After some tests, 1/NDV (only considering the histogram part) seems a reasonable and good enough upper bound for now. And we will use that as the upper bound when real-time stats need to be ignored.
Tests
Test the new out-of-range estimation logic when ignoring the modify count
Create a table with 3000 rows of data distributed evenly in [300, 900) on column a.
Analyze this table.
In determinate mode:
The estimated row count for a <= 300 should be 10.
The estimated row count for a <= 500 should be 1010.
The estimated row count for a > 900 should be 5.
The estimated row count for a <= 900 should be 3000.
Background
TiDB maintains some real-time statistics, like the real-time table-level row count and the modify count. They are collected in the DML statements and periodically dumped to update the statistics.
This mechanism is for keeping the statistics more up-to-date in addition to the ANALYZE. However, for some users, this mechanism turns out to make some execution plans unstable and make some diagnoses harder. Therefore, we want to provide a mode that the process of plan generation will not be affected by real-time statistics.
Design
As usual, this mode will be provided through a session/global level system variable. After discussion with the PM, we will use a relatively broad naming here.
Name:
tidb_opt_objective
Type: string
Possible values:
"moderate"
,"determinate"
Moderate
: the default value. The behavior is not changed.determinate
: Optimizer tends to be conservative. The optimizer will ignore real-time statistics.Implementation and other details
It might seem straightforward to add a method to access the two fields and check the system variable in it. However, there are too many places accessing them, and some of them should be controlled by the variable, others should not (because they are not in the plan generation process). In some places, we are even unable to access the session variable.
On the other hand, there's only one place where we fetch stats for tables from the stats cache for generating execution plans, which is
getStatsTable()
. We can copy the stats table and reset the two fields when real-time stats need to be ignored. In this approach, we can implement this without too many changes and intrusions.Now that we can't rely on the modify count, we have to come up with some other method to act as the upper bound. Obviously, it's impossible to have a reliable upper bound in all cases. After some tests, 1/NDV (only considering the histogram part) seems a reasonable and good enough upper bound for now. And we will use that as the upper bound when real-time stats need to be ignored.
Tests
Test the new out-of-range estimation logic when ignoring the modify count
[300, 900)
on columna
.determinate
mode:a <= 300
should be10
.a <= 500
should be1010
.a > 900
should be5
.a <= 900
should be3000
.Tested in
TestOutOfRangeEstimationAfterDelete
https://github.com/pingcap/tidb/pull/43988/files#diff-3cc7504d8fc864235df45cc223e1d7aa6e22c60ab1fd4f3e433da5a7ea6ee515R176
https://github.com/pingcap/tidb/pull/43988/files#diff-a6395d69455edb95e9af67348ee34af0c8d45297207da83043db186c5c735d1fR115-R119
https://github.com/pingcap/tidb/pull/43988/files#diff-b48131289978eff040416b28f5b809754186d5402e40c3451de456724b025cd9R238
Test basic estimation behaviors in both modes
moderate
mode, theTableFullScan
has the correct row count.determinate
mode, theTableFullScan
has a row count of 10000.TableFullScan
has the correct row count.moderate
mode, theTableFullScan
has the correct row count.determinate
mode, the execution plan is the same as in 4.Tested in
TestIgnoreRealtimeStats
https://github.com/pingcap/tidb/pull/43988/files#diff-3cc7504d8fc864235df45cc223e1d7aa6e22c60ab1fd4f3e433da5a7ea6ee515R1335
The text was updated successfully, but these errors were encountered: