*: re-implement partition pruning for better performance #14679

tiancaiamao · 2020-02-07T14:42:16Z

What problem does this PR solve?

Re-implement partition pruning for better performance.

The old partition pruning implementation uses an algorithm we call constraint propagation, which is powerful but slow.

It works like this:

For each partition:
    construct EXPR = (partition's expression AND query's expression)
    if FixPoint(constraint propagate EXPR) == AlwaysFalse
        Prune

It's powerful because it can propagate constraints such as a > b and b > 3 => a > 3, and it also includes propagation rules for functions, e.g. x > const1, f(x) < const2, f is monotonic => false.

It's possible to handle some cases like:

create table t (a int, b int) partition by (a) ...
select * from t where a > b and b > 3;

But the process is slow: constructing new expressions involves too many object allocations, and the constraint propagation itself is heavy. If there are many partitions, 2048 for example, the total time spent on partition pruning becomes significant.

What is changed and how it works?

The new partition pruning algorithm is much faster because it uses binary search and avoids constructing expressions.

The query expression is abstracted to f(col) op const, where op is one of > < = >= <=; in the code it's represented as dataForPrune.

The 'partition p0 less than xxx, partition p1 less than xxx, ...' forms an array [p0 p1 p2 ... maxvalue], represented as the lessThanData.

The new algorithm uses a binary search pruneUseBinarySearch to locate const in the array.
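
To make the idea concrete, here is a minimal sketch of locating a constant in the sorted upper-bound array with a binary search. It is not the code from this PR: the function name pruneByBinarySearch, the plain []int64 bounds slice and the string op are assumptions for illustration, and it ignores f(col), NULL and MAXVALUE handling.

```go
package main

import "sort"

// pruneByBinarySearch is a hypothetical helper (not the PR's pruneUseBinarySearch).
// bounds[i] is the "LESS THAN" value of partition i, in ascending order, so
// partition i holds values v with bounds[i-1] <= v < bounds[i].
// It returns a half-open range [start, end) of partition indexes that may
// contain rows matching `col op c`. The result may be wider than strictly
// necessary, but never narrower.
func pruneByBinarySearch(bounds []int64, op string, c int64) (start, end int) {
	n := len(bounds)
	// pos is the first partition whose upper bound is greater than c,
	// i.e. the partition that would contain the value c itself.
	pos := sort.Search(n, func(i int) bool { return bounds[i] > c })
	switch op {
	case "=":
		if pos == n {
			return 0, 0 // c is beyond the last bound: no partition matches
		}
		return pos, pos + 1
	case "<":
		// If partition pos starts exactly at c, it holds nothing below c.
		if pos > 0 && bounds[pos-1] == c {
			return 0, pos
		}
		return 0, clamp(pos+1, n)
	case "<=":
		return 0, clamp(pos+1, n)
	case ">", ">=":
		return pos, n
	}
	return 0, n // unknown operator: keep every partition
}

// clamp caps v at limit.
func clamp(v, limit int) int {
	if v > limit {
		return limit
	}
	return v
}
```

With bounds {10, 20, 30}, a predicate col = 15 keeps only partition 1, and col < 10 keeps only partition 0. The cost is one O(log n) search per predicate instead of one expression construction per partition.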

A simple benchmark:
Before:

session git:(partition-prune) ✗ go test -test.bench BenchmarkPartitionPruning -test.run Ignore
BenchmarkPartitionPruning-4   	       8	 128010294 ns/op

After:

session git:(partition-prune) ✗ go test -test.bench BenchmarkPartitionPruning -test.run Ignore
BenchmarkPartitionPruning-4   	      21	  57979203 ns/op

Some side notes:

As null values are all located in the first partition, the first partition's expression in the old implementation is val < xxx or val is null. The or val is null condition is hard to eliminate, so the first partition can't be pruned in many cases.
The new implementation doesn't consider null values, so the first partition can be pruned.
There is a relax operation during pruning, and it may lead to less accurate pruning results.
For example,

create table t (a datetime) partition by to_days(a) ...

This condition doesn't always hold:

a < const => to_days(a) < to_days(const)

A counterexample is:

2020-02-12 10:08:00 < 2020-02-12 23:59:59 => to_days(2020-02-12 10:08:00) = to_days(2020-02-12 23:59:59)

So we have to relax < to <= to handle functions.

a < const => to_days(a) <= to_days(const)
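
The counterexample can be checked with a short sketch. Everything here is illustrative rather than taken from the PR: toDays is a stand-in for SQL to_days that counts days since the Unix epoch (only monotonicity matters), parseDatetime and relaxOp are hypothetical helpers, and relaxing > to >= is my symmetric extension of the rule stated above.

```go
package main

import "time"

// parseDatetime parses "YYYY-MM-DD hh:mm:ss" as UTC and panics on bad input.
func parseDatetime(s string) time.Time {
	t, err := time.Parse("2006-01-02 15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}

// toDays is a stand-in for to_days: it maps a timestamp to a day number.
// It is monotonic but not strictly increasing, which is why the relax step
// is needed. This simple version is only meant for dates after 1970.
func toDays(t time.Time) int64 {
	return t.Unix() / 86400
}

// relaxOp widens a strict comparison so that `a op c` still implies
// `f(a) relaxOp(op) f(c)` for a monotonic, non-strict f such as toDays.
func relaxOp(op string) string {
	switch op {
	case "<":
		return "<="
	case ">":
		return ">="
	}
	return op // =, <= and >= are already preserved by a monotonic f
}
```

For a = 2020-02-12 10:08:00 and const = 2020-02-12 23:59:59, a is before const but toDays gives the same day number for both, so only the relaxed comparison to_days(a) <= to_days(const) holds.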

Check List

Tests

Unit test

Related changes

Need to cherry-pick to the release branch

Release note

The new partition pruning algorithm is not as powerful as the original one. However, it's much faster because it uses binary search and avoids constructing expressions.

tiancaiamao · 2020-02-12T02:15:06Z

/run-unit-test

PTAL @imtbkcat @zz-jason

planner/core/rule_partition_processor.go

planner/core/partition_pruning_test.go

imtbkcat

LGTM

tiancaiamao · 2020-02-14T03:57:33Z

I just ran a benchmark on my own laptop and compared the results
(latency in milliseconds, lower is better).

The old version:

Mean:  3650.8462739778993
Quantile 80:  4975.331340000001
Quantile 95:  7505.103714999999
Quantile 99:  9457.780304
Total Succ: 181

and the new version:

Mean:  2072.399426448846
Quantile 80:  3150.99298025
Quantile 95:  4735.244528499999
Quantile 99:  6287.017979
Total Succ: 303

lysu

rest LGTM

lysu · 2020-02-14T04:34:58Z

planner/core/rule_partition_processor.go

+func fullRange(end int) partitionRangeOR {
+	var reduceAllocation [3]partitionRange
+	reduceAllocation[0] = partitionRange{0, end}
+	return partitionRangeOR(reduceAllocation[:1])


Suggested change

-	return partitionRangeOR(reduceAllocation[:1])
+	return reduceAllocation[:1]
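
For readers following the suggestion, here is a self-contained sketch of why the explicit conversion is unnecessary. The type definitions are assumptions: the diff only shows partitionRange built from two positional ints, so the field names and the half-open reading are mine.

```go
package main

// partitionRange is assumed to be a half-open interval [start, end) of
// partition indexes. The field names are hypothetical.
type partitionRange struct {
	start int
	end   int
}

// partitionRangeOR is assumed to be a list of ranges combined with OR.
type partitionRangeOR []partitionRange

// fullRange returns a single range covering partitions [0, end).
func fullRange(end int) partitionRangeOR {
	// Backing array of 3 elements: the returned slice has len 1 and cap 3,
	// so appending up to two more ranges does not need a new allocation
	// (presumably the intent behind the variable name).
	var reduceAllocation [3]partitionRange
	reduceAllocation[0] = partitionRange{0, end}
	// A []partitionRange value is assignable to partitionRangeOR because
	// the two types share the same underlying type and the slice expression
	// is not of a named type, so no explicit conversion is required.
	return reduceAllocation[:1]
}
```

Both spellings compile to the same thing; the shorter one just relies on Go's assignability rules.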

lysu · 2020-02-14T07:19:53Z

planner/core/rule_partition_processor.go

+	// Let M = intersection, U = union, then
+	// a M (b U c) == (a M b) U (a M c)
+	ret := or[:0]
+	for _, r1 := range or {


Just a question: can we make sure or is sorted?

If so, maybe we can optimize further to avoid the {6, 7} U {3, 9} call in {0, 4}, {6, 7}, {8, 11} U {3, 9}, or the {3, +INF} U {6, 7} call in {3, +INF} U {0, 4}, {6, 7}?

It seems a leaf expression (e.g. a > 3 in a > 3 and b < 5) can benefit from this (although intersectionRange is fast).

Maintaining the sorted condition is more restrictive.
The or array is usually small, so I guess looping over the whole array versus maintaining a sorted array and breaking out early doesn't make a big difference.
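
The loop under discussion can be sketched as follows. This is a simplified reconstruction and not the merged code: the type layout, the intersectionRange signature and the half-open interval convention are assumptions. It applies a M (b U c) == (a M b) U (a M c) by intersecting each element with the other range and keeping the non-empty results, without requiring or to be sorted.

```go
package main

// Assumed types: half-open [start, end) ranges of partition indexes.
type partitionRange struct{ start, end int }
type partitionRangeOR []partitionRange

// intersectionRange returns the overlap of [start1, end1) and [start2, end2).
// The result is empty when the returned start is not below the returned end.
func intersectionRange(start1, end1, start2, end2 int) (int, int) {
	s, e := start1, end1
	if start2 > s {
		s = start2
	}
	if end2 < e {
		e = end2
	}
	return s, e
}

// intersection computes (r1 U r2 U ...) M rg by distributing the
// intersection over every element of or. It reuses or's backing array:
// the write index never passes the read index, so filtering in place is safe.
func (or partitionRangeOR) intersection(rg partitionRange) partitionRangeOR {
	ret := or[:0]
	for _, r1 := range or {
		s, e := intersectionRange(r1.start, r1.end, rg.start, rg.end)
		if s < e { // keep only non-empty overlaps
			ret = append(ret, partitionRange{s, e})
		}
	}
	return ret
}
```

With the ranges from the question, {0, 4}, {6, 7}, {8, 11} intersected with {3, 9} gives {3, 4}, {6, 7}, {8, 9}. A sorted or would allow breaking out early once r1.start reaches rg.end, but for a handful of elements the full loop costs about the same.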

lysu

LGTM

*: re-implement partition pruning for better performance

7086301

tiancaiamao added the status/WIP label Feb 7, 2020

tiancaiamao requested review from a team as code owners February 7, 2020 14:42

ghost requested review from eurekaka and winoros and removed request for a team February 7, 2020 14:42

tiancaiamao added the sig/planner, type/performance, and type/enhancement labels Feb 7, 2020

tiancaiamao changed the title from "*: re-implement partition pruning for better performance" to "*: re-implement partition pruning for better performance [WIP]" Feb 7, 2020

tiancaiamao added 7 commits February 10, 2020 15:24

handle OR expression

f427d4d

fix a test case

aeff030

tiny clean up

20087f4

Merge branch 'master' into partition-prune

cdbab0c

go mod tidy

8db6f9e

add the relax operation and fix 'const op col' order

4370f59

add tests

c1eb6f4

tiancaiamao removed the status/WIP label Feb 11, 2020

tiancaiamao changed the title from "*: re-implement partition pruning for better performance [WIP]" to "*: re-implement partition pruning for better performance" Feb 11, 2020

tiancaiamao added 3 commits February 12, 2020 09:02

Merge branch 'master' into partition-prune

ad5db90

make lint tools happy

e914d1a

make vet happy

21c6313

imtbkcat self-requested a review February 12, 2020 03:15

imtbkcat reviewed Feb 12, 2020

planner/core/rule_partition_processor.go

imtbkcat reviewed Feb 12, 2020

planner/core/partition_pruning_test.go

SunRunAway removed the request for review from a team February 13, 2020 08:06

tiancaiamao added 2 commits February 13, 2020 16:46

add some benchmark code

7d5586d

address comment

d5821c3

imtbkcat reviewed Feb 14, 2020

imtbkcat added the status/LGT1 label Feb 14, 2020

lysu reviewed Feb 14, 2020

tiancaiamao added 2 commits February 14, 2020 18:54

address comment

378e424

Merge branch 'master' into partition-prune

d384f6c

lysu approved these changes Feb 14, 2020

tiancaiamao merged commit 9543a0f into pingcap:master Feb 14, 2020

tiancaiamao deleted the partition-prune branch February 14, 2020 11:18

tiancaiamao mentioned this pull request Feb 18, 2020

planner,expression,table: clean up the old partition pruning code #14834

Merged

tiancaiamao mentioned this pull request Mar 6, 2020

planner/core: get back range columns pruning after last refactor #15169

Merged

tiancaiamao added a commit to tiancaiamao/tidb that referenced this pull request Mar 13, 2020

*: re-implement partition pruning for better performance (pingcap#14679)

ad569a4

tiancaiamao mentioned this pull request Mar 13, 2020

*: re-implement partition pruning for better performance (#14679) #15368

Closed

tiancaiamao added a commit to tiancaiamao/tidb that referenced this pull request Mar 24, 2020

*: re-implement partition pruning for better performance (pingcap#14679)

16f8a36

tiancaiamao mentioned this pull request Mar 24, 2020

*: re-implement partition pruning for better performance (#14679) #15628

Merged

tiancaiamao added a commit to tiancaiamao/tidb that referenced this pull request Mar 25, 2020

*: re-implement partition pruning for better performance (pingcap#14679)

dc9e828

tiancaiamao mentioned this pull request Mar 25, 2020

*: re-implement partition pruning for better performance (#14679) #15678

Merged

sre-bot pushed a commit that referenced this pull request Mar 26, 2020

*: re-implement partition pruning for better performance (#14679) (#1…

a490e0e

…5628)

sre-bot pushed a commit that referenced this pull request Mar 30, 2020

*: re-implement partition pruning for better performance (#14679) (#1…

793c731

…5678)

tiancaiamao added a commit to tiancaiamao/tidb that referenced this pull request Apr 23, 2020

*: re-implement partition pruning for better performance (pingcap#14679…

09d5ae3

…) (pingcap#15628)
