This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

backend/local: serial import engines with range overlap #451

Closed
wants to merge 8 commits into from


glorv
Contributor

@glorv glorv commented Nov 5, 2020

What problem does this PR solve?

In the current implementation, if several engines' key ranges overlap, each of them is likely to split the same range into different regions. Thus, if these engines run import at the same time, there will be a lot of Epoch not match errors, and lightning may fail if all the retries fail.

close #436

What is changed and how it works?

Add a btree to record the kv ranges of all running engines. If a new engine starts importing but its range overlaps with one or more ranges of the running engines, this engine is blocked until all the overlapping engines finish their import phase.
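The locking idea above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it keeps the running ranges in a plain slice instead of a btree, treats ranges as half-open `[start, end)`, and the names `keyRange`, `rangeLocker`, `TryLock`, `Lock` and `Unlock` are all hypothetical.

```go
package main

import (
	"bytes"
	"sync"
)

// keyRange is a half-open key range [start, end). (Assumption: the PR may
// use different bounds semantics.)
type keyRange struct {
	start, end []byte
}

// overlaps reports whether two half-open ranges share at least one key.
func (r keyRange) overlaps(o keyRange) bool {
	return bytes.Compare(r.start, o.end) < 0 && bytes.Compare(o.start, r.end) < 0
}

// rangeLocker tracks the ranges of engines that are currently importing.
// A slice is used here for brevity; a btree would make the overlap lookup
// cheaper when many engines are running.
type rangeLocker struct {
	mu      sync.Mutex
	cond    *sync.Cond
	running []keyRange
}

func newRangeLocker() *rangeLocker {
	l := &rangeLocker{}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// conflicts must be called with l.mu held.
func (l *rangeLocker) conflicts(r keyRange) bool {
	for _, cur := range l.running {
		if cur.overlaps(r) {
			return true
		}
	}
	return false
}

// TryLock registers r and returns true if no running range overlaps it.
func (l *rangeLocker) TryLock(r keyRange) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.conflicts(r) {
		return false
	}
	l.running = append(l.running, r)
	return true
}

// Lock blocks until no running range overlaps r, then registers r.
func (l *rangeLocker) Lock(r keyRange) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for l.conflicts(r) {
		l.cond.Wait()
	}
	l.running = append(l.running, r)
}

// Unlock removes r from the running set and wakes all waiting engines so
// they can re-check for conflicts.
func (l *rangeLocker) Unlock(r keyRange) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for i, cur := range l.running {
		if bytes.Equal(cur.start, r.start) && bytes.Equal(cur.end, r.end) {
			l.running = append(l.running[:i], l.running[i+1:]...)
			break
		}
	}
	l.cond.Broadcast()
}
```

In this sketch an engine would call `Lock` with its range before the import phase and `Unlock` after it, so engines with disjoint ranges still import concurrently while overlapping ones run one after another.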

After adding a lock for conflicting ranges, EpochNotMatch errors caused by engines importing overlapping ranges concurrently should no longer occur:

> grep 'error="epoch not match: EpochNotMatch' lighting-master.log | wc -l
43
> grep 'error="epoch not match: EpochNotMatch' lightning-lock-ranges.log | wc -l
14

Now that overlapping engines are imported serially, the remaining EpochNotMatch errors are all caused by TiKV automatically splitting big regions, which is expected since the engines heavily overlap with each other.

Benchmark result:
We ran the benchmark on the following three datasets:

  • 1k-warehouse tpcc: none of the data and index engines overlap with each other, so this PR should have no effect on this dataset.
  • A dbgen-generated SQL file with a randomly generated numeric primary key (50GB). Because the primary key is completely random, all the data engines fully conflict with each other.
  • The same dataset as test 2, but with two extra secondary indexes, so the data engines conflict with each other, but the index engines don't.
| Test Suite | Data Size | Git Branch | Time Cost |
|---|---|---|---|
| tpcc | 71G | lock-ranges | 11m17s |
| tpcc | 71G | master | 11m14s |
| dbgen-1-index | 50G | lock-ranges | 8m18s |
| dbgen-1-index | 50G | master | 6m23s |
| dbgen-3-index | 50G | lock-ranges | 18m38s |
| dbgen-3-index | 50G | master | 15m12s |
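
To put the numbers above in relative terms, the slowdown of lock-ranges against master can be computed from the reported times. The helper below is only an illustrative sketch (the name `slowdownPercent` is hypothetical); by this arithmetic lock-ranges is roughly 0.4% slower on tpcc, 30% slower on dbgen-1-index and 23% slower on dbgen-3-index.

```go
package main

import "time"

// slowdownPercent returns how much longer candidate took than baseline,
// as a percentage of baseline. Both arguments use Go duration syntax,
// e.g. "6m23s". The baseline is assumed to be non-zero.
func slowdownPercent(baseline, candidate string) (float64, error) {
	b, err := time.ParseDuration(baseline)
	if err != nil {
		return 0, err
	}
	c, err := time.ParseDuration(candidate)
	if err != nil {
		return 0, err
	}
	return (c.Seconds() - b.Seconds()) / b.Seconds() * 100, nil
}
```

For example, `slowdownPercent("6m23s", "8m18s")` compares the two dbgen-1-index runs.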

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to update the tidb-ansible repository
  • Need to be included in the release note

@overvenus
Member

Please add metrics to record epoch not match error.

@glorv glorv closed this Jan 7, 2021
@glorv glorv deleted the lock-ranges branch February 20, 2021 07:54

Successfully merging this pull request may close these issues.

Ingest failed due to EpochNotMatch error