-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
planner: improve skyline pruning #26271
planner: improve skyline pruning #26271
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
/cc @winoros |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bad case:
create table t(a int, b int, c int, d int, index i(a,b,c));
explain select a,b,c from t where (a=1 and b=1 and c=1) or (a=1 and b=1 and c=2) order by c;
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
| Sort_5 | 0.03 | root | | test.t.c |
| └─IndexReader_9 | 0.03 | root | | index:IndexRangeScan_8 |
| └─IndexRangeScan_8 | 0.03 | cop[tikv] | table:t, index:i(a, b, c) | range:[1 1 1,1 1 2], keep order:false, stats:pseudo |
+--------------------------+---------+-----------+---------------------------+-----------------------------------------------------+
It's because the EqualCols
is not calculated correctly.
Seems like we also need to handle the pointRes
case in detachCNFCondAndBuildRangeForIndex
and the DNF case in detachCondAndBuildRangeForCols
.
@@ -619,6 +710,9 @@ type DetachRangeResult struct { | |||
AccessConds []expression.Expression | |||
// RemainedConds is the filter conditions which should be kept after access. | |||
RemainedConds []expression.Expression | |||
// ColumnValues records the constant column values for all index columns. | |||
// For the ith column, if it is evaluated as constant, ColumnValues[i] is its value. Otherwise ColumnValues[i] is nil. | |||
ColumnValues []*valueInfo | |||
// EqCondCount is the number of equal conditions extracted. | |||
EqCondCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This var can be deleted now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is slight difference between how ColumnValues
and EqCondCount
are calculated. For example,
mysql> create table t2(a int, b int, c int, d int, index idx_a_b_c_d(a, b, c, d));
Query OK, 0 rows affected (0.01 sec)
mysql> explain select * from t2 where ((a = 1 and b = 1 and d < 3) or (a = 1 and b = 1 and d > 6)) and c = 3 order by d;
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexReader_14 | 0.00 | root | | index:Selection_13 |
| └─Selection_13 | 0.00 | cop[tikv] | | eq(test.t2.c, 3), or(and(eq(test.t2.a, 1), and(eq(test.t2.b, 1), lt(test.t2.d, 3))), and(eq(test.t2.a, 1), and(eq(test.t2.b, 1), gt(test.t2.d, 6)))) |
| └─IndexRangeScan_12 | 10.00 | cop[tikv] | table:t2, index:idx_a_b_c_d(a, b, c, d) | range:[1,1], keep order:true, stats:pseudo |
+---------------------------+---------+-----------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
In the case, ColumnValues
detect a, b, c
are constant columns while EqCondCount
only detect a
is a constant column. Do we need to improve it in this pr or unify ColumnValues
and EqCondCount
in another pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can improve it later, not in this PR. What's your opinion? @qw4990 @time-and-fate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's OK to improve it later.
func compareIndexBack(lhs, rhs *candidatePath) (int, bool) { | ||
result := compareBool(lhs.isSingleScan, rhs.isSingleScan) | ||
if result == 0 && !lhs.isSingleScan { | ||
// if both lhs and rhs need to access table after IndexScan, we use the set of columns that occurred in IndexFilters | ||
// to compare how many table rows will be accessed. | ||
return compareColumnSet(lhs.indexFiltersColSet, rhs.indexFiltersColSet) | ||
} | ||
return result, true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any test case that can cover this rule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. https://github.com/xuyifangreeneyes/tidb/blob/116e60b8ad38f7ebfa5ea68b09e87054723a9394/planner/core/logical_plan_test.go#L1703-L1710 In the two cases f_g
is better than f
due to this rule.
planner/core/find_best_task.go
Outdated
break | ||
} else if i >= path.EqCondCount { | ||
} | ||
if path.ConstCols == nil || !path.ConstCols[i] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding one more condition || i >= len(path.ConstCols) ||
for safety?
util/ranger/detacher.go
Outdated
return r, offset, columnValues, nil | ||
} | ||
|
||
func unionColumnValues(lhs, rhs []*valueInfo, numCols int) []*valueInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the numCols
should have become useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The logic in the util/ranger
is quite complicated though.
/merge |
This pull request has been accepted and is ready to merge. Commit hash: cf39a2c
|
What problem does this PR solve?
closes #26320
What is changed and how it works?
What's Changed & How it Works:
IndexFilters
, if the column set of path A is the subset of that of path B, then path B has less table rows to access than path A, so path B is better than path A on the factor. If there is no subset relation, then the two paths are incomparable.isMatchProp
factor,idx_a_b_c
actually matches the ordera, c
sinceb = 4
is constant. This pr takes the situation into consideration. (ps: this pr can also close #26017)Check List
Tests
Side effects
Documentation
Release note