restore: use token bucket to balance download requests. #49887

Merged
merged 17 commits into from
Jan 8, 2024

Conversation

3pointer
Contributor

@3pointer 3pointer commented Dec 28, 2023

What problem does this PR solve?

Issue Number: ref #49886
Problem Summary:

  1. Introduce a token bucket (downloadV2) to balance download requests when the granularity is coarse-grained.

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None


ti-chi-bot bot commented Dec 28, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/invalid-title do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-tests-checked labels Dec 28, 2023
@sre-bot
Contributor

sre-bot commented Dec 28, 2023

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 28, 2023

tiprow bot commented Dec 28, 2023

Hi @3pointer. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@3pointer 3pointer changed the title Per store concurrency restore: use token bucket to balance download requests. Dec 28, 2023
@3pointer
Contributor Author

Wait for https://github.com/pingcap/tidb/pull/48244/files to merge first.

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 4, 2024
@3pointer 3pointer marked this pull request as ready for review January 4, 2024 08:09
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2024
@3pointer 3pointer requested a review from Leavrth January 4, 2024 08:09
@3pointer
Contributor Author

3pointer commented Jan 4, 2024

/ok-to-tests


codecov bot commented Jan 4, 2024

Codecov Report

Merging #49887 (498160f) into master (9b0fd9e) will decrease coverage by 12.0160%.
Report is 8 commits behind head on master.
The diff coverage is 13.0281%.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #49887         +/-   ##
=================================================
- Coverage   79.3229%   67.3070%   -12.0160%     
=================================================
  Files          2447       2558        +111     
  Lines        673700     840148     +166448     
=================================================
+ Hits         534399     565479      +31080     
- Misses       117932     250900     +132968     
- Partials      21369      23769       +2400     
Flag Coverage Δ
integration 36.6604% <12.6760%> (?)
unit 79.2826% <4.9295%> (-0.0404%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 73.6130% <ø> (ø)
parser ∅ <ø> (∅)
br 71.7267% <13.0281%> (+3.2506%) ⬆️

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/needs-tests-checked labels Jan 4, 2024
@3pointer
Contributor Author

3pointer commented Jan 4, 2024

/test pull-br-integration-test


ti-chi-bot bot commented Jan 4, 2024

@3pointer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test build
  • /test check-dev
  • /test check-dev2
  • /test mysql-test
  • /test pull-integration-ddl-test
  • /test pull-lightning-integration-test
  • /test pull-mysql-client-test
  • /test unit-test

The following commands are available to trigger optional jobs:

  • /test canary-notify-when-compatibility-sections-changed
  • /test pingcap/tidb/canary_ghpr_unit_test
  • /test pull-br-integration-test
  • /test pull-common-test
  • /test pull-e2e-test
  • /test pull-integration-common-test
  • /test pull-integration-copr-test
  • /test pull-integration-jdbc-test
  • /test pull-integration-mysql-test
  • /test pull-integration-nodejs-test
  • /test pull-sqllogic-test
  • /test pull-tiflash-test

Use /test all to run the following jobs that were automatically triggered:

  • pingcap/tidb/ghpr_build
  • pingcap/tidb/ghpr_check
  • pingcap/tidb/ghpr_check2
  • pingcap/tidb/ghpr_mysql_test
  • pingcap/tidb/ghpr_unit_test
  • pingcap/tidb/pull_integration_ddl_test
  • pingcap/tidb/pull_mysql_client_test

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


tiprow bot commented Jan 4, 2024

@3pointer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test tiprow_fast_test

Use /test all to run all jobs.

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@3pointer
Contributor Author

3pointer commented Jan 4, 2024

/test pull-br-integration-test


tiprow bot commented Jan 4, 2024

@3pointer: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test tiprow_fast_test

Use /test all to run all jobs.

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@YuJuncen YuJuncen left a comment

rest lgtm

Comment on lines 1049 to 1052
for _, sstMeta := range resultMetasMap {
downloadMetas = append(downloadMetas, sstMeta)
}
return downloadMetas, nil
Contributor

Suggested change
- for _, sstMeta := range resultMetasMap {
-     downloadMetas = append(downloadMetas, sstMeta)
- }
- return downloadMetas, nil
+ return maps.Values(resultMetasMap), nil
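For readers unfamiliar with the helper in this suggestion, the sketch below spells out what a slice-returning `maps.Values` does, assuming (as the suggested `return` statement implies) that it returns a slice of the map's values. `mapValues` is a hypothetical stand-in written out so the example has no dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// mapValues returns the values of m in unspecified order. It is a
// hypothetical stand-in for a slice-returning maps.Values helper.
func mapValues[K comparable, V any](m map[K]V) []V {
	out := make([]V, 0, len(m))
	for _, v := range m {
		out = append(out, v)
	}
	return out
}

func main() {
	metas := map[string]int{"a.sst": 1, "b.sst": 2}
	vals := mapValues(metas)
	sort.Ints(vals)   // map iteration order is random, so sort before printing
	fmt.Println(vals) // prints "[1 2]"
}
```

Either form yields the values in unspecified order, so callers must not rely on the ordering of the returned slice.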

}

mu.Lock()
sstMeta, ok := downloadMetasMap[file.Name]
Contributor

I would prefer to make a small closure and use defer for unlocking.

Contributor Author

I feel this is a tiny piece of logic, so there is no need to make a closure.
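For reference, the closure-with-defer pattern the reviewer describes could look like the sketch below. The types `sstMeta` and `lockedMetas` and the function `lookupMeta` are hypothetical and only stand in for the PR's `downloadMetasMap` and `mu`; the merged code may well keep the explicit `mu.Unlock()` calls instead.

```go
package main

import (
	"fmt"
	"sync"
)

// sstMeta and lockedMetas are hypothetical stand-ins for the PR's types.
type sstMeta struct{ Name string }

type lockedMetas struct {
	mu sync.Mutex
	m  map[string]sstMeta
}

// lookupMeta reads the map inside a small closure so that defer releases
// the lock on every return path, including early returns added later.
func lookupMeta(l *lockedMetas, name string) (sstMeta, bool) {
	return func() (sstMeta, bool) {
		l.mu.Lock()
		defer l.mu.Unlock()
		meta, ok := l.m[name]
		return meta, ok
	}()
}

func main() {
	l := &lockedMetas{m: map[string]sstMeta{"a.sst": {Name: "a.sst"}}}
	meta, ok := lookupMeta(l, "a.sst")
	fmt.Println(meta.Name, ok) // prints "a.sst true"
}
```

The trade-off is the one debated in this thread: the closure removes the risk of a missed unlock, at the cost of an extra level of nesting for a very short critical section.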

sstMeta, ok := downloadMetasMap[file.Name]
if !ok {
mu.Unlock()
return errors.New("not found file key for download sstMeta")
Contributor

Please also provide the file name (perhaps also the content of the map) here. (Is this code really reachable? Maybe panic here is also acceptable.)

Contributor Author

It is unreachable; this is just protective code.

}
var err error
var resp *import_sstpb.DownloadResponse
err = utils.WithRetry(ctx, func() error {
Contributor

Maybe try utils.WithRetryV2, which allows us to get the return value directly.

return errors.Annotate(berrors.ErrKVDownloadFailed, resp.GetError().GetMessage())
}
if resp.GetIsEmpty() {
return errors.Trace(berrors.ErrKVRangeIsEmpty)
Contributor

Maybe also provide the range of the file for debugging.

Contributor Author

We already have a log outside:

log.Warn("download file skipped",
	logutil.Files(files),
	logutil.Region(info.Region),
	logutil.Key("startKey", startKey),
	logutil.Key("endKey", endKey),
	logutil.Key("file-simple-start", files[0].StartKey),
	logutil.Key("file-simple-end", files[0].EndKey),

Contributor

@YuJuncen YuJuncen Jan 5, 2024

It seems the file's key range is not guaranteed to be printed? We only print the first file there. Even though in most cases there should be only one file, querying the log is usually harder than reading a direct error message.

Contributor Author

ErrKVRangeIsEmpty is always a retryable error and never breaks the restoration. It does not really indicate an error; I think it is more of a feature than an error.

br/pkg/restore/import.go Show resolved Hide resolved
br/pkg/restore/pipeline_items.go Outdated Show resolved Hide resolved
br/pkg/restore/pipeline_items.go Outdated Show resolved Hide resolved
br/tests/br_full/run.sh Outdated Show resolved Hide resolved
Contributor

@YuJuncen YuJuncen left a comment

rest lgtm

sstMeta, ok := downloadMetasMap[file.Name]
if !ok {
mu.Unlock()
return errors.Errorf("not found file key for download sstMeta", file.Name)
Contributor

It seems there isn't a printf placeholder in the format string.

Contributor Author

fixed
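The thread does not show the fixed line. A minimal sketch of what such a fix typically looks like is given below, using the standard library's `fmt.Errorf` as a stand-in for `errors.Errorf`; the function name `missingMetaError` and the exact message wording are assumptions.

```go
package main

import "fmt"

// missingMetaError builds the error with a %s placeholder so the file name
// actually appears in the message instead of being passed as an extra argument.
func missingMetaError(fileName string) error {
	return fmt.Errorf("not found file key for download sstMeta: %s", fileName)
}

func main() {
	fmt.Println(missingMetaError("a.sst"))
	// prints "not found file key for download sstMeta: a.sst"
}
```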

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 5, 2024
@3pointer
Contributor Author

3pointer commented Jan 8, 2024

/retest

1 similar comment
@3pointer
Contributor Author

3pointer commented Jan 8, 2024

/retest

Comment on lines 536 to 542
  stores, err := conn.GetAllTiKVStoresWithRetry(ctx, rc.pdClient, util.SkipTiFlash)
  if err != nil {
      log.Fatal("failed to get stores", zap.Error(err))
  }
- concurrencyPerStore := 512
+ concurrencyPerStore := rc.GetConcurrencyPerStore()
  for _, store := range stores {
-     ch := make(chan struct{}, concurrencyPerStore)
-     for i := 0; i < concurrencyPerStore; i += 1 {
-         ch <- struct{}{}
-     }
+     ch := utils.BuildWorkerTokenChannel(concurrencyPerStore)
Contributor

Actually, the storeWorkerPoolMap only needs to be initialized when useToKenBucket == true.

Contributor Author

yes, but I think it's okay to keep it here.
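To make the two options in this thread concrete, here is a sketch of a pre-filled token channel builder, modelled on the removed `make` and fill loop in the diff above, together with the reviewer's alternative of building the per-store map only when the token bucket is enabled. `buildTokenChannel` and `buildStoreTokenChannels` are hypothetical names; the body of `utils.BuildWorkerTokenChannel` is assumed rather than copied, and the author chose to keep the unconditional initialization.

```go
package main

import "fmt"

// buildTokenChannel returns a buffered channel holding n tokens. It is a
// hypothetical equivalent of the loop that the diff replaces.
func buildTokenChannel(n int) chan struct{} {
	ch := make(chan struct{}, n)
	for i := 0; i < n; i++ {
		ch <- struct{}{}
	}
	return ch
}

// buildStoreTokenChannels sketches the reviewer's alternative: skip the
// per-store map entirely when the token bucket is not in use.
func buildStoreTokenChannels(storeIDs []uint64, perStore int, useTokenBucket bool) map[uint64]chan struct{} {
	if !useTokenBucket {
		return nil
	}
	m := make(map[uint64]chan struct{}, len(storeIDs))
	for _, id := range storeIDs {
		m[id] = buildTokenChannel(perStore)
	}
	return m
}

func main() {
	pools := buildStoreTokenChannels([]uint64{1, 2, 3}, 4, true)
	fmt.Println(len(pools), len(pools[1]), cap(pools[1]))              // prints "3 4 4"
	fmt.Println(buildStoreTokenChannels([]uint64{1}, 4, false) == nil) // prints "true"
}
```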

rc.workerPool = utils.NewWorkerPool(c, "file")
rc.concurrencyPerStore = c
Contributor

does it need to be removed?

Contributor Author

yes

br/pkg/restore/import.go Outdated Show resolved Hide resolved
br/pkg/restore/import.go Outdated Show resolved Hide resolved
br/pkg/restore/import.go Outdated Show resolved Hide resolved

ti-chi-bot bot commented Jan 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 8, 2024

ti-chi-bot bot commented Jan 8, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-01-05 08:59:56.437246112 +0000 UTC m=+1786.021499800: ☑️ agreed by YuJuncen.
  • 2024-01-08 07:36:04.745049656 +0000 UTC m=+255954.329303344: ☑️ agreed by Leavrth.


tiprow bot commented Jan 8, 2024

@3pointer: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
tiprow_fast_test 498160f link true /test tiprow_fast_test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ti-chi-bot ti-chi-bot bot merged commit 5cc32b6 into pingcap:master Jan 8, 2024
22 of 23 checks passed
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
4 participants