plan, partition: re-implement hash partition pruning to support in and or and some other functions #18574

Merged: 20 commits, Aug 28, 2020

@imtbkcat imtbkcat commented Jul 15, 2020

What problem does this PR solve?

Issue Number: issue

Problem Summary:

Currently, hash partition pruning uses a naive method to get the query range. This method only supports finding a single point value. For conditions like in (val1, val2, val3) or expression or expression, there is no way to prune. Moreover, this pruning method supports only a few partition expressions, such as +, -, *, / and YEAR, MONTH.

What is changed and how it works?

What's Changed:

The new pruning method uses ranger to get the query range and, for a point range, uses the partition expression to find its partition.
It supports pruning for SQL like:
select * from hash_example where key in (1, 2, 3, 4)
select * from hash_example where (key1, key2) in ((1, 2), (3, 4))
select * from hash_example where key = 1 or key = 2

How it Works:

The pseudo code is like this:

prune_hash() {
    for each interval {
        if (the interval has form "(t1.a, t1.b) = (const1, const2)") {
            calculate HASH(part_func(t1.a, t1.b));
            find which partition has records with this hash value and mark it as used;
        } else {
            mark all partitions as used;
            break;
        }
    }
}

First, extract the columns from the partition expression and construct a virtual index.
Next, use DetachCondAndBuildRangeForIndex to resolve the ranges of this virtual index.
If a range is a point, use the partition expression to evaluate its value.
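
A minimal Go sketch of the flow described above, not the PR's actual code. The keyRange type and the pruneHash and fullRange signatures are hypothetical stand-ins (the real ranges come from DetachCondAndBuildRangeForIndex), the partition expression is modeled as a plain function over a single integer column, and folding a negative remainder to a non-negative index is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// keyRange is a hypothetical stand-in for one range that the ranger produces
// over the virtual index. Low == High means the range is a single point.
type keyRange struct {
	Low, High int64
}

// fullRange lists every partition index; it is the "cannot prune" fallback.
func fullRange(numParts int64) []int {
	all := make([]int, numParts)
	for i := range all {
		all[i] = i
	}
	return all
}

// pruneHash returns the sorted indexes of the partitions that may hold rows.
// partExpr stands in for the partition expression (assumed integer-valued).
func pruneHash(ranges []keyRange, partExpr func(int64) int64, numParts int64) []int {
	used := make(map[int]struct{})
	for _, r := range ranges {
		if r.Low != r.High {
			// Not a point range: give up pruning and keep all partitions.
			return fullRange(numParts)
		}
		pos := partExpr(r.Low) % numParts
		if pos < 0 {
			pos = -pos // assumption: negative remainders fold to a non-negative index
		}
		used[int(pos)] = struct{}{}
	}
	out := make([]int, 0, len(used))
	for p := range used {
		out = append(out, p)
	}
	sort.Ints(out)
	return out
}

func main() {
	identity := func(v int64) int64 { return v }
	// where key in (1, 2, 5): three point ranges over 4 partitions.
	fmt.Println(pruneHash([]keyRange{{1, 1}, {2, 2}, {5, 5}}, identity, 4)) // [1 2]
	// where key > 1 and key < 10: not a point, so every partition is kept.
	fmt.Println(pruneHash([]keyRange{{2, 9}}, identity, 4)) // [0 1 2 3]
}
```

Collecting the used partitions in a set means that several points hashing to the same partition, such as 1 and 5 above, yield that partition only once.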

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test

Side effects

  • Breaking backward compatibility

Release note

  • Support IN and OR conditions for hash partition pruning

@imtbkcat imtbkcat added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner labels Jul 15, 2020
@imtbkcat imtbkcat requested review from a team as code owners July 15, 2020 05:26
@imtbkcat imtbkcat requested review from SunRunAway and removed request for a team July 15, 2020 05:26

codecov bot commented Jul 15, 2020

Codecov Report

Merging #18574 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18574   +/-   ##
===========================================
  Coverage   79.5832%   79.5832%           
===========================================
  Files           541        541           
  Lines        146277     146277           
===========================================
  Hits         116412     116412           
  Misses        20538      20538           
  Partials       9327       9327           

Contributor

@tiancaiamao tiancaiamao left a comment

Step 1: use the query conditions to calculate the range
Step 2: take the range's HighValue row and pass it to the partition expression to calculate pos
Step 3: pos % hashNum gives the final partition

So it just uses range calculation to handle in, or expressions, and so on.
Range calculation itself is heavy. Could the performance of this version be worse than before?

@@ -6,7 +6,6 @@
"explain select * from t3 where id = 9 and a = 1",
"explain select * from t2 where id = 9 and a = -110",
"explain select * from t1 where id = -17",
"explain select * from t2 where id = a and a = b and b = 2",
Contributor

I suggest commenting it out so that it does not run, instead of removing it.

{
"SQL": "explain select * from t6 where a = 7 or a = 6",
"Result": [
"PartitionUnion_9 60.00 root ",
Contributor

The result for this case does not look optimal?

Author

I will check this

Author

Fixed, but I need to make ranger/pruner.go much simpler; there is a lot of repeated code there.

if len(used) > 0 && used[0] == -1 {
	return s.makeUnionAllChildren(ds, pi, fullRange(len(pi.Definitions)))
}
children := make([]LogicalPlan, 0, len(pi.Definitions))
Contributor

Please avoid repeated code

@imtbkcat
Author

@tiancaiamao I think this method costs nearly the same as the original one. The cost of building the range depends on the number of elements in the condition. For a simple condition like a plain equality, the two cost the same, and for a complex condition this method avoids a full scan.

children := make([]LogicalPlan, 0, len(pi.Definitions))
for _, pos := range used {
	if len(ds.partitionNames) > 0 && !s.findByName(ds.partitionNames, pi.Definitions[pos].Name.L) {
		// For condition like `from t partition (p1) where a = 5`, but they are conflict, return TableDual directly.
Contributor

used is an array. Only if none of the used partitions are in the selection should this return TableDual, for example:

from t partition (p1) where a = 5 or a = 10 ...
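
The check suggested here can be sketched in Go as follows. This is an illustrative sketch under stated assumptions rather than the PR's code: filterBySelection is a hypothetical helper, partition definitions are modeled as a plain slice of names, and the example assumes a table with 4 hash partitions so that a = 5 and a = 10 land in p1 and p2.

```go
package main

import "fmt"

// filterBySelection keeps only the used partitions that also appear in the
// explicit PARTITION (...) list. An empty selection means no restriction.
// An empty result means the plan can be replaced by TableDual.
func filterBySelection(used []int, defNames []string, selected []string) []int {
	if len(selected) == 0 {
		return used
	}
	want := make(map[string]struct{}, len(selected))
	for _, name := range selected {
		want[name] = struct{}{}
	}
	kept := make([]int, 0, len(used))
	for _, pos := range used {
		if _, ok := want[defNames[pos]]; ok {
			kept = append(kept, pos)
		}
	}
	return kept
}

func main() {
	defs := []string{"p0", "p1", "p2", "p3"}
	// from t partition (p1) where a = 5 or a = 10: used = [1, 2], p1 survives.
	fmt.Println(filterBySelection([]int{1, 2}, defs, []string{"p1"})) // [1]
	// used = [2, 3] and selection (p1): nothing survives, so use TableDual.
	fmt.Println(len(filterBySelection([]int{2, 3}, defs, []string{"p1"})) == 0) // true
}
```

Deciding on TableDual only after the whole used slice has been filtered avoids returning it early when the first used partition is outside the selection but a later one is inside it.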

Author

Fixed. Currently I use makeUnionAllChildren and remove convertToRangeOr to convert used into partitionRangeOR.

@lysu
Contributor

lysu commented Aug 19, 2020

Ping @imtbkcat, please resolve the conflicts when you are free.

util/ranger/ranger.go Outdated
@lysu lysu requested a review from qw4990 August 26, 2020 07:15
@lysu
Contributor

lysu commented Aug 26, 2020

LGTM

@ti-srebot
Contributor

@lysu, Thanks for your review. However, LGTM is restricted to Reviewers or higher roles. See the corresponding SIG page for more information. Related SIGs: planner (slack).

@lysu
Contributor

lysu commented Aug 26, 2020

@winoros @qw4990 please help take a look, especially at how to reduce the duplication between ranger.go and pruner.go.

@lysu lysu requested a review from tiancaiamao August 28, 2020 03:20
@lysu
Contributor

lysu commented Aug 28, 2020

/run-all-tests tidb-test=pr/1079

@tiancaiamao
Contributor

LGTM

@ti-srebot
Contributor

@tiancaiamao, Thanks for your review. However, LGTM is restricted to Reviewers or higher roles. See the corresponding SIG page for more information. Related SIGs: planner (slack).

@winoros winoros added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 28, 2020
@winoros
Member

winoros commented Aug 28, 2020

/run-unit-test

4 similar comments

Labels
sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
6 participants