scheduler: region score are different between log and metrics #5570

Open
lhy1024 opened this issue Sep 30, 2022 · 4 comments · May be fixed by #8741

Comments

@lhy1024
Contributor

lhy1024 commented Sep 30, 2022

Bug Report

What did you do?

Added a new node to the cluster.

What did you expect to see?

The cluster becomes balanced.

What did you see instead?

The cluster is not balanced and the stores' region scores differ a lot, but PD doesn't create any operator.

image

However, the scores in the log differ from the scores in the metrics:
[2022/09/30 06:54:42.576 +00:00] [DEBUG] [utils.go:124] ["skip balance region"] [scheduler=balance-region-scheduler] [region-id=193894476] [source-store=115121122] [target-store=717892230] [source-size=3230518] [source-score=3590635.5235656626] [source-influence=0] [target-size=68663] [target-score=1193743611.955962] [target-influence=0] [average-region-size=73] [tolerant-resource=16028]

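This is because the score shown in metrics is computed as RegionScore(s.opt.GetRegionScoreFormulaVersion(), s.opt.GetHighSpaceRatio(), s.opt.GetLowSpaceRatio(), 0),
while the score in the log is computed as RegionScore(opts.GetRegionScoreFormulaVersion(), opts.GetHighSpaceRatio(), opts.GetLowSpaceRatio(), targetDelta). The only difference is the last argument: metrics use a delta of 0, whereas the log uses targetDelta.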

As a result, we cannot tell from metrics alone why PD doesn't create an operator.
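
To illustrate how the delta argument alone can make the two numbers diverge, here is a minimal Go sketch. It does not reproduce PD's RegionScore formula: the store type, toyRegionScore, and all the numbers are hypothetical, chosen only to mimic one property, namely that the score rises sharply as the projected free space on a store approaches zero.

```go
package main

import "fmt"

// store is a hypothetical, simplified view of a TiKV store (sizes in arbitrary units).
type store struct {
	regionSize float64 // total size of the regions on the store
	available  float64 // free space left on the store
}

// toyRegionScore is NOT PD's RegionScore. It is an assumed stand-in whose
// score grows sharply as the projected free space approaches zero.
// delta is the amount of data a pending move would add to the store.
func toyRegionScore(s store, delta float64) float64 {
	projectedSize := s.regionSize + delta
	projectedFree := s.available - delta
	if projectedFree < 1 {
		projectedFree = 1 // clamp so we never divide by zero or a negative value
	}
	return projectedSize * (1 + 1000/projectedFree)
}

func main() {
	// A small store: the pending move is almost as large as its free space.
	target := store{regionSize: 100, available: 50}
	delta := 49.0

	metricsScore := toyRegionScore(target, 0)      // delta = 0, as in the metrics path
	decisionScore := toyRegionScore(target, delta) // delta applied, as in the log path

	fmt.Printf("score with delta=0:  %.1f\n", metricsScore)
	fmt.Printf("score with delta=%v: %.1f\n", delta, decisionScore)
}
```

In this toy model the score with the delta applied is far larger than the score with a delta of 0, so a dashboard built on the zero-delta value would show a target that looks attractive while the scheduler sees one that looks nearly full.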

What version of PD are you using (pd-server -V)?

master

lhy1024 added the type/bug label Sep 30, 2022
@ChenPeng2013

/severity critical

@matchge-ca
Contributor

/assign

@lhy1024
Contributor Author

lhy1024 commented Oct 27, 2022

> /severity critical

I don't think this is a critical problem. It only means that we cannot tell from metrics alone why PD doesn't create operators. It has no effect on the scheduler itself.

A similar case occurs whenever delta is close to source or date, in other words, when the capacity of TiKV is small. We need more suitable metrics to diagnose scheduling in this case.

@mayjiang0203

/remove-severity critical
/severity moderate
