This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat(dup): preserve data consistency during replica learn #355

Merged · 24 commits · Dec 26, 2019

Conversation

neverchanje (Contributor) commented Dec 10, 2019

We need to ensure the unduplicated mutation logs are included during replica learn ("unduplicated" meaning mutations that haven't been confirmed by the meta-server as duplicated) in order to prevent data inconsistency between clusters.

The design docs here might be helpful to understand the mechanisms: https://pegasus-kv.github.io/2019/06/09/duplication-design.html#%E6%97%A5%E5%BF%97%E5%AE%8C%E6%95%B4%E6%80%A7

The normal procedure of learning without duplication

First, the learnee calculates learn_start_decree, i.e., where learning should begin, then replicates the data in [learn_start_decree, the latest committed decree] to the learner.

100                   900
|       | | | | | | | |
        |
        flushed=500

As logs in [100, 500] have been flushed to rocksdb's SSTable, the learnee (primary) can skip copying those logs. Assume learn_start_decree=200: the learnee copies the rocksdb checkpoint first, then copies logs in (500, 900]. Assume learn_start_decree=600: the learnee only copies logs in [600, 900].

How does the learnee get learn_start_decree?
For now, learn_start_decree = learner's committed decree + 1. For example, if the learner bootstraps from scratch, learn_start_decree is 1, which means replicating all data.
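The two rules above can be sketched as follows. This is a minimal, illustrative sketch with hypothetical names, not the actual rDSN API: `learn_start_decree_no_dup` and `first_log_decree_to_copy` are made up here to show the arithmetic.

```cpp
#include <cstdint>

using decree = int64_t;

// Without duplication, learning starts right after the learner's
// committed state (bootstrapping from scratch means committed = 0,
// so learning starts at decree 1).
decree learn_start_decree_no_dup(decree learner_committed_decree) {
    return learner_committed_decree + 1;
}

// The learnee skips logs already flushed into rocksdb's SSTable:
// if the requested start falls at or below the flushed decree, the
// checkpoint is shipped first and only logs after `flushed` are copied.
decree first_log_decree_to_copy(decree learn_start_decree, decree flushed) {
    return learn_start_decree <= flushed ? flushed + 1 : learn_start_decree;
}
```

With the diagram's numbers (flushed=500), learn_start_decree=200 yields log copying from 501 onward (after the checkpoint), while learn_start_decree=600 copies logs from 600.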

Learning procedure with duplication

When duplication is enabled, the procedure needs changes, because the original learn_start_decree = learner's committed decree + 1 may skip the unduplicated logs: duplication may lag far behind 2PC. More specifically, the confirmed_decree may be much smaller than the learner's committed_decree.

To fix this problem, learn_start_decree should cover not only "the data (logs/rdb) which the learner doesn't have" but also "the unduplicated logs that the learner doesn't have". See replica::get_learn_start_decree.

learnee:
100                   900
| | | | | | | | | | | |
    |
    confirmed_decree=300

learner:
|   rdb   |
          |
          committed_decree=500, no private log

learn_start_decree_no_dup = 501
learn_start_decree_for_dup = 301

Assume the learner bootstrapped from a pure rdb, with decree=500 and no private log. Originally only (500, 900] would be learned; with duplication enabled, logs (300, 500] must also be included. So finally logs (300, 900] are learned.
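The example above can be sketched as the following decision, a hypothetical simplification of the idea behind replica::get_learn_start_decree (the function name is real; the signature and helper names here are illustrative):

```cpp
#include <algorithm>
#include <cstdint>

using decree = int64_t;

// With duplication enabled, the learn start must also cover logs the
// meta-server has not yet confirmed as duplicated, so we take the
// smaller of the two candidate starting points.
decree get_learn_start_decree(decree learner_committed_decree,
                              decree confirmed_decree,
                              bool duplicating) {
    decree no_dup = learner_committed_decree + 1;   // normal rule
    if (!duplicating) {
        return no_dup;
    }
    decree for_dup = confirmed_decree + 1;          // cover unduplicated logs
    return std::min(no_dup, for_dup);
}
```

With the diagram's numbers (committed_decree=500, confirmed_decree=300), this yields 301 when duplicating and 501 otherwise.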

get_learn_start_decree

To copy the "unduplicated logs that the learner doesn't have", the learnee needs to know which logs the learner already has. We add max_gced_decree to learn_request for this reason.

If the learner's max_gced_decree <= min_confirmed_decree + 1, the learner still retains the unduplicated logs, so the learnee can perform the learn as normal.
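This check can be sketched as below; again a hypothetical helper, not the actual rDSN API, just restating the condition from the paragraph above:

```cpp
#include <cstdint>

using decree = int64_t;

// If the learner's max GC'd decree is still at or below the confirmed
// point (+1), the unduplicated log range has not been garbage-collected
// on the learner, so the learnee need not widen the learn range.
bool learner_retains_undup_logs(decree learner_max_gced_decree,
                                decree min_confirmed_decree) {
    return learner_max_gced_decree <= min_confirmed_decree + 1;
}
```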

get_max_gced_decree_for_learn

On the learner side, there are two log directories during the learning process: 'plog/', the normal private-log path, and 'learn/', which holds the log files learned from the learnee. The max_gced_decree is calculated over the compound of both directories, via get_max_gced_decree_for_learn.

There's a problem: how can we get the max_gced_decree under learn/? To solve this we introduce first_learn_start_decree, which is the learn_start_decree of the first round of learning, stored in potential_secondary_context. It can stand in for "the max_gced_decree under learn/".

We do not use previous_log_max_decrees to determine the max_gced_decree: to ensure data safety we cannot trust the files under learn/, because they may be stale.
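A sketch of how the two directories might be combined, under the assumption that logs in learn/ begin at first_learn_start_decree (so everything before it counts as GC'd from the learner's perspective). The function name echoes get_max_gced_decree_for_learn, but the signature and sentinel value here are illustrative, not the actual rDSN code:

```cpp
#include <algorithm>
#include <cstdint>

using decree = int64_t;

constexpr decree invalid_decree = -1;  // no learn round has started yet

// The learner's effective max_gced_decree spans both 'plog/' and 'learn/'.
// For 'learn/' we cannot trust the files themselves (they may be stale),
// so first_learn_start_decree - 1 stands in for its max GC'd decree.
decree max_gced_decree_for_learn(decree plog_max_gced_decree,
                                 decree first_learn_start_decree) {
    if (first_learn_start_decree == invalid_decree) {
        return plog_max_gced_decree;
    }
    return std::min(plog_max_gced_decree, first_learn_start_decree - 1);
}
```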

Downside

The downside of this change is the increased number of logs copied during learning with duplication enabled, which may lead to slower rebalance.
TODO: we can add an option for duplication to trade off data consistency against performance.

hycdong (Contributor) commented Dec 17, 2019

The downside of this change is the increased logs to be copied during learning, which may lead to
slower rebalance.

The increased number of learned logs described above applies to every learn process, right? Could we make only the replicas that are actively doing hot backup learn the extra files, e.g. by letting confirmed_decree be a special decree when hot backup is off?

neverchanje (Contributor, Author) replied

The downside of this change is the increased logs to be copied during learning, which may lead to
slower rebalance.

The increased number of learned logs described above applies to every learn process, right? Could we make only the replicas that are actively doing hot backup learn the extra files, e.g. by letting confirmed_decree be a special decree when hot backup is off?

The doc was wrong; it should say "learning with duplication". When hot backup (duplication) is not enabled, the learn procedure is unchanged from before.
