Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: refactor hotspot scheduler #2099

Merged
merged 10 commits into from
Jan 20, 2020
Merged

Conversation

Luffbee
Copy link
Contributor

@Luffbee Luffbee commented Jan 16, 2020

What problem does this PR solve?

The hotspot scheduler has duplicate code and its logic is not very clear.

What is changed and how it works?

Refactored the hotScheduler. Mainly three part:

  • Rewrite pendingOpInfos related code (introduced by PR client: supports to add gRPC dial options (#2035) (#2043) #2046 and rename it to regionPendings. (the 1st commit)
  • Move duplicate code in balanceHot{Read, Write}Regions() into balanceBy{Peer, Leader}(). (the 2ed commit)
  • Use balanceSolver to replace balanceBy{Peer, Leader}(). (the 4th & 5th commit).

The reason for introducing balanceSolver is that the logic of balanceBy{Peer, Leader}() are similar, which can be described generally as balanceSolver.solve().

Check List

Tests

  • Unit test

@Luffbee Luffbee added component/schedule Scheduling logic. type/refactor The issue belongs to a refactor work. labels Jan 16, 2020
@Luffbee Luffbee added this to the v4.0.0-beta.1 milestone Jan 16, 2020
@codecov-io
Copy link

codecov-io commented Jan 16, 2020

Codecov Report

Merging #2099 into master will decrease coverage by 0.14%.
The diff coverage is 51.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2099      +/-   ##
==========================================
- Coverage   76.66%   76.51%   -0.15%     
==========================================
  Files         194      194              
  Lines       19756    19756              
==========================================
- Hits        15146    15117      -29     
- Misses       3478     3507      +29     
  Partials     1132     1132
Impacted Files Coverage Δ
server/config/config.go 87.5% <ø> (ø) ⬆️
server/statistics/store_collection.go 59.25% <100%> (ø) ⬆️
server/api/config.go 66.94% <100%> (ø) ⬆️
server/config/option.go 88.31% <42.85%> (ø) ⬆️
server/server.go 75.81% <48.63%> (-0.45%) ⬇️
pkg/dashboard/uiserver/uiserver.go 62.5% <62.5%> (ø) ⬆️
server/cluster/coordinator.go 67.3% <0%> (-11.27%) ⬇️
server/schedulers/shuffle_hot_region.go 64.58% <0%> (-5.21%) ⬇️
server/tso/tso.go 77.37% <0%> (-5.11%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c01db9...54460d9. Read the comment docs.

Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you test this version?

server/schedulers/hot_region.go Show resolved Hide resolved
}
}
if id != 0 && bs.cluster.GetStore(id) == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a metrics when cannot select src store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code do not have such metric, so I don't think this should be added in this PR.

pendingOpInfos: pendingOpInfos,
}
ret := newHotScheduler(opController)
ret.name = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why reset name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code didn't set name, so it should be empty string.

if len(candidateStoreIDs) == 0 {
continue
if pendings[movePeer] != nil ||
(pendings[transferLeader] != nil && !pendings[transferLeader].isDone()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add it in another PR.

@nolouch nolouch added the status/can-merge Indicates a PR has been approved by a committer. label Jan 20, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Jan 20, 2020

/run-all-tests

@sre-bot sre-bot merged commit 865ac28 into tikv:master Jan 20, 2020
@Luffbee Luffbee deleted the refactor-hot-region branch January 20, 2020 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/schedule Scheduling logic. status/can-merge Indicates a PR has been approved by a committer. type/refactor The issue belongs to a refactor work.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants