-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: restrict index range mem usage #37754
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest lgtm
util/ranger/ranger.go
Outdated
// Estimate whether rangeMaxSize will be exceeded first before appending points to ranges. | ||
if rangeMaxSize > 0 && len(origin) > 0 && len(rangePoints) > 0 { | ||
startPoint, err := convertPoint(sctx, rangePoints[0], ft) | ||
if err != nil { | ||
return nil, false, errors.Trace(err) | ||
} | ||
endPoint, err := convertPoint(sctx, rangePoints[1], ft) | ||
if err != nil { | ||
return nil, false, errors.Trace(err) | ||
} | ||
ran := appendPoint2Range(origin[0], startPoint, endPoint, ft) | ||
if ran.MemUsage()*int64(len(origin))*int64(len(rangePoints))/2 > rangeMaxSize { | ||
return origin, true, nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the first rangePoint has large/tiny data which caused the estimate difference too big?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the way to estimate mem usage of ranges. Now we iterating origin
and rangePoints
to sum up mem usage of datums.
} | ||
colEqConstant := isColEqConstant(filter, path.IdxCols[i]) | ||
if i == eqOrInCount && colEqConstant { | ||
// If there is a col-eq-constant condition for path.IdxCols[eqOrInCount], it means that range fallback happens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't understand this. Why does a col = constant
condition mean fallback happened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If range fallback doesn't happen, DetachCondAndBuildRangeForIndex
must put col = constant
to path.AccessConds
rather than path.TableFilters
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i == eqOrInCount && colEqConstant
IF is to prevent the following case:
create table t (a int, b int, c int, index idx_a_b(a, b));
set @@tidb_opt_range_max_size=1000;
explain format='brief' select /*+ use_index(t, idx_a_b) */ * from t where a in (1, 3, 5) and b = 2;
DetachCondAndBuildRangeForIndex
puts a in (1, 3, 5)
into path.AccessConds
and puts b = 2
into path.TableFilters
. Without i == eqOrInCount && colEqConstant
IF, SplitCorColAccessCondFromFilters
adds b = 2
back into path.AccessConds
but doesn't rebuild ranges, which leads to wrong results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
As you said, I think another reason we need to do this here is that, if the additional AccessConds
added here are not caused by correlated columns, then we will not try to build ranges using the additional conditions, which will lead to incomplete range and then wrong results.
I think we can clarify it in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The IF branch ensures that there must be some col-eq-corcol
condition in access
if len(access) > 0
. Actually the first one in access
must be some col-eq-corcol
condition. I add it to comment in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a test case with expressions like (a,b) in ((1,2),(3,4)) and c = 5
to cover more considerDNF
code path of detachCNFCondAndBuildRangeForIndex()
.
} | ||
colEqConstant := isColEqConstant(filter, path.IdxCols[i]) | ||
if i == eqOrInCount && colEqConstant { | ||
// If there is a col-eq-constant condition for path.IdxCols[eqOrInCount], it means that range fallback happens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
As you said, I think another reason we need to do this here is that, if the additional AccessConds
added here are not caused by correlated columns, then we will not try to build ranges using the additional conditions, which will lead to incomplete range and then wrong results.
I think we can clarify it in the comments.
util/ranger/ranger.go
Outdated
} | ||
// Estimate whether rangeMaxSize will be exceeded first before appending ranges to point ranges. | ||
if rangeMaxSize > 0 && estimateMemUsageForAppendRanges2PointRanges(pointRanges, ranges) > rangeMaxSize { | ||
return ranges, true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we should return pointRanges
here to be consistent with other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Fixed.
util/ranger/ranger.go
Outdated
if len1 == 0 || len2 == 0 { | ||
return 0 | ||
} | ||
getDatumSize := func(rs Ranges) int64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we repeated the "adding up all memory usage of Datum
s in a Range
" logic several times (including the one in the MemUsage()
), I think we can unify them maybe some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use getRangesTotalDatumSize
and getPointsTotalDatumSize
to reuse common logic.
I Add some test cases to cover |
/run-unit-test |
/run-unit-test |
/run-unit-test |
/build |
/rebuild |
/build |
/run-build |
/run-build |
/run-unit-test |
/run-build |
1 similar comment
/run-build |
/run-mysql-test |
/run-build |
1 similar comment
/run-build |
/run-unit-test |
/run-unit-test |
/run-unit-test |
/run-build |
TiDB MergeCI notify✅ Well Done! New fixed [1] after this pr merged.
|
What problem does this PR solve?
Issue Number: ref #37176
Problem Summary:
What is changed and how it works?
Restrict index range mem usage. Part 3 of #37160.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.