docs/design: add proposal for skyline pruning #9184
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #9184      +/-   ##
==========================================
- Coverage   67.24%   67.24%   -0.01%
==========================================
  Files         371      371
  Lines       77223    77223
==========================================
- Hits        51932    51930       -2
+ Misses      20652    20651       -1
- Partials     4639     4642       +3
```
From the query and schema, we know that the access condition of `idx1` strictly covers that of `idx2`; therefore the number of rows scanned by `idx1` will be no more than the number scanned by `idx2`, so `idx1` will be better than `idx2` in this case.
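The coverage rule can be sketched as a set comparison. This is a minimal, hypothetical sketch (not TiDB's code) that assumes each access path is summarized by the set of columns its access condition constrains; `accessColumns` and `strictlyCovers` are illustrative names, and the `idx1`/`idx2`/`WHERE a = 1 AND b = 1` setup in `main` is an assumed example.

```go
package main

import "fmt"

// accessColumns is a hypothetical summary of an access path: the set of
// columns whose conditions from the WHERE clause are used as access
// (range) conditions on the index.
type accessColumns map[string]bool

// strictlyCovers reports whether x's access columns are a proper superset
// of y's. Both paths take their conditions from the same query, so every
// row x scans also satisfies all of y's access conditions, which means x
// scans no more rows than y.
func strictlyCovers(x, y accessColumns) bool {
	if len(x) <= len(y) {
		return false // an equal or smaller set cannot be a proper superset
	}
	for col := range y {
		if !x[col] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed example: idx1 on (a, b), idx2 on (a),
	// and a query with WHERE a = 1 AND b = 1.
	idx1 := accessColumns{"a": true, "b": true}
	idx2 := accessColumns{"a": true}
	fmt.Println(strictlyCovers(idx1, idx2)) // true
	fmt.Println(strictlyCovers(idx2, idx1)) // false
}
```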
So how can we combine these factors to prune access paths? Consider two access paths `x` and `y`: if `x` is not worse than `y` in any factor, and `x` is better than `y` in at least one factor, then we can prune `y` before consulting the statistics, because `x` will perform at least as well as `y` in all circumstances. This is called skyline pruning.
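The dominance check can be sketched in Go. This is a minimal, hypothetical sketch, not TiDB's implementation: it assumes each path is summarized by the set of columns its access condition uses (compared by the coverage rule), a `singleScan` flag meaning no double read is needed, and a `matchesProp` flag meaning the required physical property is satisfied. All type and function names are illustrative. Paths whose access column sets are not nested are treated as incomparable and both kept, since their row counts cannot be ordered without statistics; the three factors are weighted equally.

```go
package main

import "fmt"

// path is a hypothetical summary of one access path, holding only the
// three statistics-free factors.
type path struct {
	name        string
	accessCols  map[string]bool // columns used by the access condition
	singleScan  bool            // true if no double read is needed
	matchesProp bool            // true if the required physical property is met
}

// subset reports whether every key of a is also a key of b.
func subset(a, b map[string]bool) bool {
	for k := range a {
		if !b[k] {
			return false
		}
	}
	return true
}

// compareCols returns 1 if l's access columns strictly contain r's, -1 for
// the reverse, 0 if they are equal, and ok=false if neither contains the other.
func compareCols(l, r map[string]bool) (result int, ok bool) {
	lInR, rInL := subset(l, r), subset(r, l)
	switch {
	case lInR && rInL:
		return 0, true
	case rInL:
		return 1, true
	case lInR:
		return -1, true
	}
	return 0, false
}

// compareBool ranks true above false.
func compareBool(l, r bool) int {
	switch {
	case l == r:
		return 0
	case l:
		return 1
	}
	return -1
}

// dominates reports whether x is not worse than y in every factor and
// strictly better in at least one, i.e. whether y can be pruned.
func dominates(x, y path) bool {
	access, ok := compareCols(x.accessCols, y.accessCols)
	if !ok {
		return false // row counts are incomparable without statistics
	}
	scan := compareBool(x.singleScan, y.singleScan)
	prop := compareBool(x.matchesProp, y.matchesProp)
	return access >= 0 && scan >= 0 && prop >= 0 && access+scan+prop > 0
}

// skyline keeps only the paths that no other path dominates.
func skyline(paths []path) []path {
	var kept []path
	for i, p := range paths {
		dominated := false
		for j, q := range paths {
			if i != j && dominates(q, p) {
				dominated = true
				break
			}
		}
		if !dominated {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []path{
		{"idx1", map[string]bool{"a": true, "b": true}, false, false},
		{"idx2", map[string]bool{"a": true}, false, false},
		{"idx3", map[string]bool{"c": true}, true, false},
	}
	for _, p := range skyline(paths) {
		fmt.Println(p.name) // prints idx1, then idx3
	}
}
```

Here `idx2` is pruned because `idx1` covers its access columns and ties on the other two factors, while `idx3` survives because its row count cannot be compared with `idx1`'s without statistics. Because dominance is transitive, checking each path against all others (including already-dominated ones) yields the same skyline; the pairwise loop is quadratic, which seems acceptable for the small number of candidate paths per table.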
Could you add all the rules that you decided to implement in the proposed skyline pruning process?
Do you mean a detailed explanation of the three factors at https://github.com/pingcap/tidb/pull/9184/files#diff-136059bc4631ad80be06dc8299b85b6dR17?
Oh, actually I mean this:

> how can we compare the scan row count without statistics

For now, there is only one rule:

> the access condition of `idx1` could strictly covers `idx2`, therefore the number of rows scanned by `idx1` will be no more than `idx2`, so `idx1` will be better than `idx2` in this case.

Is there any other rule?
All three factors will be rules, but the rules for `double read` and `required properties` are quite simple, so I did not mention them. Do I need to describe them?
For the first factor, `the number of rows that need to be scanned`, are there any other heuristic rules now?
No, currently there is only one rule for the first factor.
## Proposal
The most important factors in choosing an access path are the number of rows that need to be scanned, whether the path matches the required physical property, and whether it requires a double scan. Among these three factors, only the scan row count depends on statistics. So how can we compare scan row counts without statistics? Let's take a look at an example:
Do these two factors have different priorities?
- single read / double read
- required properties
No, they are equal.
Is there any reference for the
address comments

Co-Authored-By: lamxTyler
LGTM
LGTM
What problem does this PR solve?
Add proposal for skyline pruning.
What is changed and how it works?
Check List
Tests
Code changes
Side effects
Related changes