Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/design: add proposal for skyline pruning #9184

Merged
merged 6 commits into from
Feb 12, 2019
Merged

Conversation

alivxxx
Copy link
Contributor

@alivxxx alivxxx commented Jan 25, 2019

What problem does this PR solve?

Add proposal for skyline pruning.

What is changed and how it works?

Check List

Tests

  • No code

Code changes

  • No code

Side effects

  • None

Related changes

  • None

@alivxxx alivxxx added type/enhancement The issue or PR belongs to an enhancement. component/docs labels Jan 25, 2019
@codecov-io
Copy link

codecov-io commented Jan 25, 2019

Codecov Report

Merging #9184 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9184      +/-   ##
==========================================
- Coverage   67.24%   67.24%   -0.01%     
==========================================
  Files         371      371              
  Lines       77223    77223              
==========================================
- Hits        51932    51930       -2     
+ Misses      20652    20651       -1     
- Partials     4639     4642       +3
Impacted Files Coverage Δ
util/systimemon/systime_mon.go 80% <0%> (-20%) ⬇️
executor/join.go 77.86% <0%> (-1.05%) ⬇️
executor/distsql.go 73% <0%> (+0.46%) ⬆️
expression/schema.go 94.53% <0%> (+0.78%) ⬆️
infoschema/infoschema.go 77.63% <0%> (+1.31%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b363f1...3ce656c. Read the comment docs.


From the query and schema, we can know that the access condition of `idx1` could strictly covers `idx2`, therefore the number of rows scanned by `idx1` will be no more than `idx2`, so `idx1` will be better than `idx2` in this case.

So how can we combine these factors to prune the access paths? Consider two access paths `x` and `y`, if `x` is not worse than `y` at all factors, and there exists one factor that `x` is better than `y`, then we can prune `y` before referring to the statistics, because `x` will works better than `y` at all circumstances. This is also called skyline pruning.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add all the rules that you decided to implement in the proposed skyline pruning process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually I mean this:

how can we compare the scan row count without statistics

For now, there is only one rule:

the access condition of idx1 could strictly covers idx2, therefore the number of rows scanned by idx1 will be no more than idx2, so idx1 will be better than idx2 in this case.

Is there any other rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the three factors will be rules, but the rule of double read and required properties in quite simple so I did not mention them. Do I need to describe them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first factor the number of rows that need to be scanned, is there any other heuristic rules now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently there is only one rule for the first factor.


## Proposal

The most important factors to choose the access paths are the number of rows that need to be scanned, whether or not it matches the physical property and does it require a double scan. Among these three factors, only the scan row count depends on statistics. So how can we compare the scan row count without statistics? Let's take a look at an example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these two factors have different priorities?
single read/ double read
required properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are equal.

@shenli
Copy link
Member

shenli commented Jan 27, 2019

Is there any reference for the skyline pruning?

@alivxxx
Copy link
Contributor Author

alivxxx commented Jan 28, 2019

@shenli Yes, there is one paper that describes the skyline, but it is named skyline operator. Do we need to add it in the proposal?

docs/design/2019-01-25-skyline-pruning.md Outdated Show resolved Hide resolved
docs/design/2019-01-25-skyline-pruning.md Outdated Show resolved Hide resolved
docs/design/2019-01-25-skyline-pruning.md Outdated Show resolved Hide resolved
address comments

Co-Authored-By: lamxTyler <[email protected]>
Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason
Copy link
Member

@shenli Yes, there is one paper that describes the skyline, but it is named skyline operator. Do we need to add it in the proposal?

@lamxTyler Yes, we can.

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 11, 2019
Copy link
Contributor

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurekaka eurekaka added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 12, 2019
@alivxxx alivxxx merged commit 935a440 into pingcap:master Feb 12, 2019
@alivxxx alivxxx deleted the skyline branch February 12, 2019 03:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/docs status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants