Raft heartbeat process in event base #438

critical27 · 2021-04-14T08:36:38Z

Our workflow for a write request looks like this:

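| leader bgWorker | leader ioThreadPool | leader threadManager | follower ioThreadPool | follower threadManager |
|---|---|---|---|---|
| | 1. requestReceived | | | |
| | | 2. processInThread | | |
| (heartbeat) | 3. replicate | | | |
| | | | 4. receive appendLog | |
| | | | | 5. process |
| | 6. collectN | | | |
| | | 7. commit | | |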

From the point of view of a storaged process, there are many partitions, and any phase can block if it takes too long to process. Phases 5 and 7 have been the main causes of blocking. Under heavy enough load, all worker threads can be busy. This causes many problems, the most notorious of which is leader change.

Block reasons:

- lock
- rocksdb write stall (while holding lock)

IMO, only one phase should be able to block: the leader commit (phase 7). To achieve that, several things need to be done:

- The second commit, "don not hold raft lock when commit", relaxes the restriction of the raft lock. (We have another lock, replicatingLogs, to prevent concurrent replication.)
- The third commit, "follower delay commit if write stall", keeps the follower from blocking when it commits logs (phase 5; the follower can commit asynchronously).
- The first commit, "heartbeat refactor", processes heartbeats in the event base, so even if all worker threads are blocked, there won't be an unexpected election.
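To illustrate the idea of not holding the raft lock during commit, here is a minimal sketch, not the code from this PR. The struct `RaftPartSketch`, the member names, and `commitToStorage` are hypothetical, and the replicatingLogs guard is modeled as a plain mutex purely for illustration. The raft lock is held only long enough to read and update the log ids, while the slow storage write runs outside it.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a raft partition; all names are illustrative.
struct RaftPartSketch {
    std::mutex raftLock;         // guards lastLogId / committedLogId
    std::mutex replicatingLock;  // assumption: models the replicatingLogs guard
    int64_t lastLogId = 0;
    int64_t committedLogId = 0;

    // Placeholder for the slow storage write (e.g. one that may hit a write stall).
    bool commitToStorage(int64_t from, int64_t to) {
        return from <= to;
    }

    bool replicateAndCommit() {
        // Only one replication may run at a time.
        std::lock_guard<std::mutex> rep(replicatingLock);

        int64_t from = 0;
        int64_t to = 0;
        {
            // Hold the raft lock just long enough to snapshot the range.
            std::lock_guard<std::mutex> g(raftLock);
            from = committedLogId + 1;
            to = lastLogId;
        }
        if (from > to) {
            return true;  // nothing to commit
        }
        assert(from <= to);

        // The slow commit runs without the raft lock, so other raft work
        // that only needs the raft lock is not blocked behind it.
        if (!commitToStorage(from, to)) {
            return false;
        }

        // Re-acquire the raft lock briefly to publish the new committed id.
        std::lock_guard<std::mutex> g(raftLock);
        committedLogId = to;
        return true;
    }
};
```

Because the replication guard is still held across the whole call, the committed range cannot be advanced concurrently by a second replication, even though the raft lock is released in the middle.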

So the heartbeat logic is:

- If a log is in flight, only send a heartbeat, which is processed entirely in the event base.
- If no log is in flight, send both a dummy log (the previous behavior) and a heartbeat.
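The two rules above can be sketched as a small decision function. This is only an illustration under assumptions: the `Message` enum, the function name `onHeartbeatTick`, and the `logInFlight` flag are hypothetical and do not appear in the PR, and the order of the two messages in the second case is arbitrary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical message kinds, for illustration only.
enum class Message { kHeartbeat, kDummyLog };

// Decide what the leader sends on a heartbeat tick.
// `logInFlight` is true while a log is still being replicated.
std::vector<Message> onHeartbeatTick(bool logInFlight) {
    std::vector<Message> out;
    if (!logInFlight) {
        // No log is being replicated: keep the previous behavior
        // and send a dummy log as well.
        out.push_back(Message::kDummyLog);
    }
    // A heartbeat is sent in both cases; it is the lightweight message
    // that the receiver can handle in its event base.
    out.push_back(Message::kHeartbeat);
    assert(!out.empty());  // every tick sends at least the heartbeat
    return out;
}
```

With this split, a follower whose worker threads are all busy still sees the heartbeat, since that message does not depend on a worker thread being free.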

depends on vesoft-inc/nebula-common#497

CLAassistant · 2021-06-17T12:34:29Z

All committers have signed the CLA.

bright-starry-sky · 2021-06-23T03:19:42Z

src/kvstore/raftex/RaftPart.cpp

-        LogID lastLogIdCanCommit = std::min(lastLogId_, req.get_committed_log_id());
-        CHECK_LE(committedLogId_ + 1, lastLogIdCanCommit);
-        if (commitLogs(wal_->iterator(committedLogId_ + 1, lastLogIdCanCommit))) {
+        auto code = commitLogs(wal_->iterator(committedLogId_ + 1, lastLogIdCanCommit), false);


One question: when wait == false, we ignore whether the logs commit successfully or not. But why do you need to set committedLogId_ at line 1671?

L1671 only sets committedLogId_ in resp; it does not update committedLogId_ itself. When the commit succeeds, we update committedLogId_ and respond with the new value (lines 1661 and 1662).
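The behavior described in this reply can be sketched roughly as follows. This is a simplified model, not the actual RaftPart code: `FollowerSketch`, `CommitResult`, `writeStalled`, and `handleAppendLog` are hypothetical names, and the real `commitLogs` takes a WAL iterator rather than a log id. The point is that the response always carries the follower's current committed id, which only advances when a commit actually succeeds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using LogID = int64_t;

// Hypothetical result type for the sketch.
enum class CommitResult { kSucceeded, kDelayed };

struct FollowerSketch {
    LogID committedLogId = 0;
    bool writeStalled = false;  // assumption: stand-in for a storage write stall

    // Try to commit everything up to `upTo`. With wait == false the call
    // returns immediately under a stall instead of blocking the caller.
    CommitResult commitLogs(LogID upTo, bool wait) {
        assert(upTo >= committedLogId);
        if (writeStalled && !wait) {
            return CommitResult::kDelayed;  // committedLogId is left untouched
        }
        // A real implementation with wait == true would block here until
        // the stall clears; the sketch simply proceeds.
        committedLogId = upTo;
        return CommitResult::kSucceeded;
    }

    // Returns the committed id that would be placed in the response.
    LogID handleAppendLog(LogID lastLogId, LogID leaderCommittedLogId) {
        LogID lastLogIdCanCommit = std::min(lastLogId, leaderCommittedLogId);
        if (lastLogIdCanCommit > committedLogId) {
            commitLogs(lastLogIdCanCommit, /*wait=*/false);
        }
        // Report whatever is committed right now: the new id after a
        // successful commit, or the old one if the commit was delayed.
        return committedLogId;
    }
};
```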

liuyu85cn · 2021-06-23T03:39:38Z

src/kvstore/raftex/Host.cpp

+                pro.setValue(std::move(t.value()));
+            }
+        });
+    return promise.getFuture();


may be moved (not sure about this)?

critical27 added Ready-to-review labels Apr 14, 2021

critical27 requested review from dutor, darionyaphet, sherman-the-tank, liuyu85cn, bright-starry-sky and panda-sheep April 14, 2021 08:36

critical27 changed the title ~~Raft heartbeat in process in event base~~ Raft heartbeat process in event base Apr 14, 2021

critical27 force-pushed the raft_hb branch from b0ce443 to ca0bf07 Compare April 19, 2021 03:33

sherman-the-tank removed depend on common labels Apr 28, 2021

critical27 requested a review from linkensphere201 May 27, 2021 11:29

critical27 force-pushed the raft_hb branch from ca0bf07 to bc80389 Compare May 28, 2021 06:34

critical27 added depend on common (PR: this PR depends on PRs in the common repo) and ready-for-testing (PR: ready for the CI test) labels May 31, 2021

critical27 force-pushed the raft_hb branch 2 times, most recently from 7719e45 to b89e773 Compare June 22, 2021 02:41

critical27 added 6 commits June 22, 2021 10:42

heartbeat refactor

31ccad9

don not hold raft lock when commit

c7b3775

follower delay commit if write stall

0a2a096

set rocksdb option in store worker

e3102f4

sort out trace_raft log

0243f06

unify canAppendLogs

b89e773

bright-starry-sky reviewed Jun 23, 2021

liuyu85cn reviewed Jun 23, 2021

address @liuyu85cn's comments

5951ddb

bright-starry-sky approved these changes Jun 23, 2021

liuyu85cn approved these changes Jun 23, 2021

Merge branch 'master' into raft_hb

c145b1d

critical27 merged commit 43bbea8 into vesoft-inc:master Jun 24, 2021

critical27 deleted the raft_hb branch June 24, 2021 03:42

This was referenced Jun 27, 2021

[test] Weekly Report 2021-06-27 vesoft-inc/nebula-community#11

Closed

Weekly Report 2021-06-27 vesoft-inc/nebula-community#12

Closed

critical27 mentioned this pull request Nov 16, 2021

Leader change often occurs after v2.0.0 vesoft-inc/nebula#2719

Closed
