[store/tikv] avoid too many times of backoff when we retry for batch cop #18999

Merged 6 commits on Aug 5, 2020

hanfei1991
Member

What problem does this PR solve?

Problem Summary:

When a store fails and we retry a batch cop request, we back off once for every region on that store, which makes the total wait time far too long. We should instead back off once per store failure.

What is changed and how it works?

Back off once per store failure instead of once per region.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Manual test
    • We simulated a batch cop request spanning many regions while a store crashed. After the change, TiDB switches to another store quickly.

Side effects

None

Release note

  • Avoid backing off too many times when retrying a batch cop request

codecov bot commented Aug 5, 2020

Codecov Report

Merging #18999 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18999   +/-   ##
===========================================
  Coverage   79.3846%   79.3846%           
===========================================
  Files           546        546           
  Lines        147943     147943           
===========================================
  Hits         117444     117444           
  Misses        21007      21007           
  Partials       9492       9492           

Contributor

JaySon-Huang commented Aug 5, 2020

Normally, running select count(*) from tpch_1.lineitem takes about 0.06 seconds.
Before this PR, once some TiFlash stores are down, running count(*) with tidb_allow_batch_cop=1 takes anywhere from a few seconds to tens of seconds, while with tidb_allow_batch_cop=0 it takes only about 0.1x seconds.
After this PR, there should be no significant difference whether or not super batch is enabled.

@lysu lysu requested a review from lzmhhh123 August 5, 2020 11:16
Contributor

@lysu lysu left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 5, 2020
Contributor

@lzmhhh123 lzmhhh123 left a comment

LGTM

@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 5, 2020
@lzmhhh123
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 5, 2020
@ti-srebot
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit 5dec304 into pingcap:master Aug 5, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Aug 5, 2020
@ti-srebot
Contributor

Cherry-picked to release-4.0 in PR #19017

@hanfei1991 hanfei1991 deleted the hanfei/fix-back-off branch August 6, 2020 04:54
ti-srebot added a commit that referenced this pull request Aug 6, 2020
Labels
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.

5 participants