consider region distribution in optimization phase #19312

Open
qw4990 opened this issue Aug 20, 2020 · 5 comments
Assignees
Labels
feature/accepted This feature request is accepted by product managers sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement. type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@qw4990
Contributor

qw4990 commented Aug 20, 2020

Feature Request

Is your feature request related to a problem? Please describe:

In the IMDB test, we found a case where two full index scans on different indexes, each under a Limit, show a huge performance gap.

image
image

In this case, we only need the first cop-task's data because of the Limit 100. Because the two indexes have different region distributions, the number of rows that must be scanned to finish the first cop-task differs between them.
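To make the effect concrete, here is a minimal Go sketch of the reasoning above. It is not TiDB code: `regionRows`, `rowsScannedForLimit`, the row counts, and the assumptions that each cop-task covers exactly one region and scans it completely before returning are all hypothetical simplifications for illustration.

```go
package main

import "fmt"

// regionRows is a hypothetical description of one index: the number of
// index entries stored in each region, listed in key order.
type regionRows []int

// rowsScannedForLimit returns how many rows are scanned, and how many
// cop-tasks are run, before `limit` matching rows have been collected.
// Assumptions (not taken from TiDB): one cop-task per region, each task
// scans its whole region, and `selectivity` is the fraction of scanned
// rows that survive any filter.
func rowsScannedForLimit(regions regionRows, limit int, selectivity float64) (scanned, tasks int) {
	matched := 0.0
	for _, rows := range regions {
		if matched >= float64(limit) {
			break
		}
		scanned += rows
		tasks++
		matched += float64(rows) * selectivity
	}
	return scanned, tasks
}

func main() {
	// Two made-up indexes over the same table with different region layouts.
	idxA := regionRows{1000, 900000, 800000}
	idxB := regionRows{950000, 1000, 800000}

	sa, ta := rowsScannedForLimit(idxA, 100, 1.0)
	sb, tb := rowsScannedForLimit(idxB, 100, 1.0)
	fmt.Printf("index A: %d rows scanned in %d cop-task(s)\n", sa, ta)
	fmt.Printf("index B: %d rows scanned in %d cop-task(s)\n", sb, tb)
}
```

Under these assumptions both plans finish after a single cop-task, but the scan on the second index reads far more rows, which is the kind of gap a cost model that ignores region distribution cannot see.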

Describe the feature you'd like:

Consider region distribution in optimization phase for this case.

NOTE: introducing this physical information (region distribution) into the optimizer seems tricky; we need more discussion about this feature.

Describe alternatives you've considered:

No

Teachability, Documentation, Adoption, Migration Strategy:

@qw4990 qw4990 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature. labels Aug 20, 2020
@qw4990 qw4990 self-assigned this Aug 20, 2020
@qw4990
Contributor Author

qw4990 commented Aug 20, 2020

What are your opinions? @zz-jason @eurekaka @winoros

@eurekaka
Contributor

That is what I mean in https://github.com/pingcap/tidb/blob/master/planner/core/find_best_task.go#L1577. Would the result change if we used stream coprocessor requests?

@zz-jason
Member

Highly recommended to support this in the query optimizer:

  • The region distribution stored in PD is nearly real-time information; we can use it to adjust the selectivity estimation where possible.
  • We can also take the number of network RPCs into account in the cost model.
  • Because of the region size limit, the regions can be regarded as one huge equal-depth histogram. If we can store more information in each region's info, such as NDV and average MVCC keys, we can make the cost model more accurate as well.
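As a rough illustration of the histogram idea, the Go sketch below treats each region as one bucket and estimates both the row count of a key range and the number of regions it touches (a proxy for RPC count). Everything here is an assumption made for the example: the `region` struct, the `estimateRange` helper, integer keys (real TiDB keys are byte strings), and uniform distribution inside a region. It does not use any PD or TiDB API.

```go
package main

import (
	"fmt"
	"sort"
)

// region is a hypothetical, simplified view of region metadata: it covers
// keys in [start, end) and holds roughly approxRows rows.
type region struct {
	start, end int
	approxRows float64
}

// estimateRange estimates the number of rows in [lo, hi) and counts the
// regions the range overlaps. Regions must be sorted by start and must not
// overlap; rows are assumed to be spread uniformly inside each region.
func estimateRange(regions []region, lo, hi int) (rows float64, rpcs int) {
	// Find the first region that ends after lo.
	i := sort.Search(len(regions), func(k int) bool { return regions[k].end > lo })
	for ; i < len(regions) && regions[i].start < hi; i++ {
		r := regions[i]
		l, h := lo, hi
		if r.start > l {
			l = r.start
		}
		if r.end < h {
			h = r.end
		}
		if h <= l {
			continue // empty overlap, e.g. when lo >= hi
		}
		rows += r.approxRows * float64(h-l) / float64(r.end-r.start)
		rpcs++
	}
	return rows, rpcs
}

func main() {
	// Equal-depth buckets: similar row counts, very different key widths.
	regions := []region{
		{start: 0, end: 100, approxRows: 1000},
		{start: 100, end: 150, approxRows: 1000},
		{start: 150, end: 1000, approxRows: 1000},
	}
	rows, rpcs := estimateRange(regions, 50, 150)
	fmt.Printf("estimated rows: %.0f, regions touched: %d\n", rows, rpcs)
}
```

Per-region statistics such as NDV could be added as extra fields on the bucket in the same way, at the cost of keeping them up to date.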

@zz-jason zz-jason added the feature/accepted This feature request is accepted by product managers label Aug 25, 2020
@qw4990
Contributor Author

qw4990 commented Sep 3, 2020

@tangwz and I will draft a proposal for this feature soon.

@tangwz
Contributor

tangwz commented Sep 3, 2020

/assign.
