feat(dup): preserve data consistency during replica learn #355
Merged
Conversation
You said above that this will increase the number of logs to learn — does that apply to every learn process? Can we make only the learns running under duplication (hot backup) copy the extra files, and not the ones running without it?
The doc was written wrong; it is "learning with duplication". When duplication is not enabled, the learn procedure is unchanged from before.
hycdong reviewed Dec 23, 2019
hycdong reviewed Dec 26, 2019
hycdong approved these changes Dec 26, 2019
levy5307 reviewed Dec 26, 2019
levy5307 approved these changes Dec 26, 2019
We need to ensure the unduplicated mutation logs are included during replica learn (specifically, "unduplicated" means mutations that haven't been confirmed by the meta-server as duplicated), in order to prevent data inconsistency between clusters.
The design doc here might be helpful to understand the mechanism: https://pegasus-kv.github.io/2019/06/09/duplication-design.html#%E6%97%A5%E5%BF%97%E5%AE%8C%E6%95%B4%E6%80%A7
The normal procedure of learning without duplication
First, the learnee calculates `learn_start_decree`, i.e. where learning should begin, then replicates the data in [learn_start_decree, latest committed decree] to the learner. Suppose the learnee holds logs up to decree 900, and the logs in [100, 500] have already been flushed to rocksdb's SSTable, so the learnee (primary) can skip copying those logs. Assume learn_start_decree = 200: the learnee copies the rocksdb checkpoint first, then the logs in (500, 900]. Assume learn_start_decree = 600: the learnee only copies the logs in [600, 900].

How does the learnee get `learn_start_decree`? For now, learn_start_decree = learner's committed decree + 1. For example, if the learner bootstraps from scratch, learn_start_decree is 1, which means all data is replicated.
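To make the two cases above concrete, here is a minimal sketch of the learnee-side decision; `make_learn_plan` and its parameters are illustrative names under the assumptions above, not the actual replica code:

```cpp
#include <cstdint>

typedef int64_t decree;

struct learn_plan
{
    bool copy_checkpoint; // ship the rocksdb checkpoint first?
    decree log_start;     // first decree to copy from the private log
    decree log_end;       // last committed decree on the learnee
};

// last_durable: max decree already flushed into rocksdb's SSTable (500 in
// the example above); last_committed: latest committed decree (900).
learn_plan make_learn_plan(decree learn_start_decree,
                           decree last_durable,
                           decree last_committed)
{
    learn_plan plan;
    if (learn_start_decree <= last_durable) {
        // e.g. learn_start_decree = 200: [200, 500] is already covered by
        // the SSTable, so copy the checkpoint plus the logs in (500, 900].
        plan.copy_checkpoint = true;
        plan.log_start = last_durable + 1;
    } else {
        // e.g. learn_start_decree = 600: only the logs [600, 900] are needed.
        plan.copy_checkpoint = false;
        plan.log_start = learn_start_decree;
    }
    plan.log_end = last_committed;
    return plan;
}
```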
Learning procedure with duplication
When duplication is enabled, the procedure needs changes, because `learn_start_decree = learner's committed decree + 1` may skip the unduplicated logs: duplication may lag far behind 2PC, or more specifically, the `confirmed_decree` may be much smaller than the learner's `committed_decree`. To fix this problem, `learn_start_decree` should cover not only "the data (logs/rdb) which the learner doesn't have", but also "the unduplicated logs that the learner doesn't have". See `replica::get_learn_start_decree`.

Assume the learner bootstrapped from a pure rdb with decree = 500 and no private log. Originally only (500, 900] would be learned; with duplication enabled (say the confirmed decree is 300), the logs (300, 500] must also be included, so finally the logs in (300, 900] are learned.
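A rough sketch of the adjusted calculation under these assumptions (the real logic lives in `replica::get_learn_start_decree`; `min_confirmed` is an illustrative parameter standing for the duplication's confirmed decree):

```cpp
#include <algorithm>
#include <cstdint>

typedef int64_t decree;

// Without duplication: start right after what the learner has committed.
// With duplication: also cover everything after the confirmed decree,
// so that unduplicated mutations are never skipped.
decree learn_start_with_dup(decree learner_committed, decree min_confirmed)
{
    decree no_dup_start = learner_committed + 1; // e.g. 500 + 1 = 501
    // e.g. std::min(501, 300 + 1) = 301, i.e. the logs (300, 900] are learned
    return std::min(no_dup_start, min_confirmed + 1);
}
```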
get_learn_start_decree
To copy the "unduplicated logs that the learner doesn't have", the learnee needs to know which logs the learner already has. We add `max_gced_decree` into `learn_request` for this reason. If the learner's max_gced_decree <= min_confirmed_decree + 1, the learner already holds the unduplicated logs, so the learnee should perform as normal.
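A hedged sketch of how the learnee might combine this field with the calculation above (the names are illustrative; see `replica::get_learn_start_decree` for the real implementation):

```cpp
#include <algorithm>
#include <cstdint>

typedef int64_t decree;

// learner_max_gced mirrors the new learn_request::max_gced_decree field.
decree get_learn_start_decree_sketch(decree learner_committed,
                                     decree learner_max_gced,
                                     decree min_confirmed)
{
    decree no_dup_start = learner_committed + 1;
    if (learner_max_gced <= min_confirmed + 1) {
        // The learner's own log still reaches back to the unduplicated
        // decrees, so nothing extra is needed: perform as normal.
        return no_dup_start;
    }
    // Otherwise extend the learn range to re-send the unduplicated logs.
    return std::min(no_dup_start, min_confirmed + 1);
}
```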
get_max_gced_decree_for_learn
On the learner side, there are two log dirs during the learning process: 'plog/', the normal private-log path, and 'learn/', which holds the log files learned from the learnee. The `max_gced_decree` is calculated over the compound of both dirs, via `get_max_gced_decree_for_learn`.

There's a problem: how can we get the max_gced_decree under `learn/`? To solve this we introduced `first_learn_start_decree`, which is the `learn_start_decree` of the first round of learning, stored in `potential_secondary_context`. It can be used to represent "the max_gced_decree under learn/".

We do not use `previous_log_max_decrees` to determine the max_gced_decree: to ensure data safety we cannot trust the files under `learn/`, because they may be stale.
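A minimal sketch of the compound calculation, assuming `plog_max_gced` is the max gced decree of 'plog/' (negative when unknown) and `first_learn_start_decree` comes from `potential_secondary_context`; the exact combination rule below is an assumption, not a quote of the real code:

```cpp
#include <algorithm>
#include <cstdint>

typedef int64_t decree;

decree get_max_gced_decree_for_learn_sketch(decree plog_max_gced,
                                            decree first_learn_start_decree)
{
    if (first_learn_start_decree < 0) {
        return plog_max_gced; // nothing under 'learn/' yet: only 'plog/' counts
    }
    // Logs under 'learn/' start at first_learn_start_decree, as if every
    // decree before it had been gced.
    decree learn_dir_max_gced = first_learn_start_decree - 1;
    if (plog_max_gced < 0) {
        return learn_dir_max_gced;
    }
    return std::min(plog_max_gced, learn_dir_max_gced); // compound of both dirs
}
```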
Downside
The downside of this change is the increased number of logs copied during learning when duplication is enabled, which may lead to slower rebalance.
TODO: we can add an option for duplication to trade off data consistency against performance.
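For illustration only, such an option could gate the extended learn range; the flag name below is hypothetical and not part of this PR:

```cpp
#include <cstdint>

typedef int64_t decree;

// Hypothetical knob: when false, fall back to the cheaper pre-duplication
// behavior at the cost of possible cross-cluster inconsistency.
static bool learn_unconfirmed_logs = true;

decree choose_learn_start(decree no_dup_start, decree dup_start)
{
    return learn_unconfirmed_logs ? dup_start : no_dup_start;
}
```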