
Memory lock in raft #3926

Merged
merged 5 commits into from
Apr 21, 2022
Conversation

@liuyu85cn (Contributor) commented Feb 22, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

Description:

Make add tag / edge use atomicOp again.

  1. Up to and including Nebula 2.0, we used atomicOp to handle atomic operations,
    e.g. changing a tag/edge and its index in one batch.

    It worked, but since we implemented it by sending raft logs synchronously
    (every atomic op had to be sent separately, even if they were disjoint), it was really slow.

  2. In 2.6.x we used a memory lock for concurrency control.
    We checked early (in the processor) whether a request could run.
    If it could, we wrote the get/put as a normal log, which can be handled in batch.
    If it couldn't, we returned an error.

    However, some users complained that they hit so many "Conflict error"s
    that they had to retry, and they believed this slowed down bulk inserts.
    We explained that those conflicts have to be retried either inside Nebula itself or in the client,
    but it looks like they didn't agree with us.

  3. So now we implement a hybrid mode (a sketch follows this list).
    We hold a memory lock in raft, just like in solution 2, and check every log to see whether it can be combined with the previous logs.
    If it can, we send them in a batch.
    If it can't, we treat it the atomicOp way (solution 1).
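
Below is a minimal sketch of the merge check described in point 3. The names LogEntry, lockedKeys, and tryMerge are hypothetical; only MergeAbleCode appears in the actual diff, and the real logic lives in RaftPart.

#include <algorithm>
#include <deque>
#include <string>
#include <vector>

enum class MergeAbleCode { MERGE_BOTH, SEND_ALONE };

struct LogEntry {
  std::vector<std::string> lockedKeys;  // keys this log reads or writes
};

// Disjoint key sets can be sent in one batch; overlapping ones must be
// serialized, falling back to the one-at-a-time atomicOp path.
MergeAbleCode tryMerge(const std::deque<LogEntry>& pending, const LogEntry& next) {
  for (const auto& prev : pending) {
    for (const auto& key : next.lockedKeys) {
      if (std::find(prev.lockedKeys.begin(), prev.lockedKeys.end(), key) !=
          prev.lockedKeys.end()) {
        return MergeAbleCode::SEND_ALONE;  // conflict: send separately
      }
    }
  }
  return MergeAbleCode::MERGE_BOTH;  // no shared keys: safe to batch
}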

How do you solve it?

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

As described in the "Description", conflicting concurrent tag/edge inserts will no longer report "Data conflict"; they will execute in a queue instead.

@critical27 (Contributor) left a comment:

Good job, generally LGTM. We can finally get rid of the ugly iterator...

replicatingLogs_ = false;
return;

// // Continue to process the original AppendLogsIterator if necessary
Contributor comment:

Could we check here whether the cache is empty, and continue if it isn't? Otherwise all cached logs have to wait for another round?
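
A toy model of this suggestion, with made-up names (ReplicatorState, finishRound) standing in for the RaftPart members shown above (logsLock_, logs_, replicatingLogs_); not the actual implementation.

#include <deque>
#include <mutex>
#include <string>
#include <utility>

struct ReplicatorState {
  std::mutex logsLock;                 // logsLock_ in RaftPart
  std::deque<std::string> cachedLogs;  // logs_ in RaftPart
  bool replicatingLogs = false;        // replicatingLogs_ in RaftPart
};

// Called when one replication round finishes: if more logs were cached
// while we were replicating, drain them for the next round immediately
// instead of going idle and making them wait for the next append.
std::deque<std::string> finishRound(ReplicatorState& st) {
  std::lock_guard<std::mutex> g(st.logsLock);
  if (st.cachedLogs.empty()) {
    st.replicatingLogs = false;  // nothing waiting: safe to go idle
    return {};
  }
  return std::exchange(st.cachedLogs, {});  // start the next round now
}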

if (!promiseRef.isFulfilled()) {
  promiseRef.setValue(code);
}
return MergeAbleCode::MERGE_BOTH;
@critical27 (Contributor) commented on Mar 17, 2022:

How about just dropping it instead of still sending it out? It is quite easy to do now.
I am not sure we can survive this case:

  1. atomic op failed
  2. send the log out anyway
  3. leader change
  4. new leader atomic op succeeded...
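
A minimal sketch of the drop-it approach under discussion, with hypothetical names (AtomicOp, runAtomicOp, reportError); not the PR's actual code.

#include <functional>
#include <optional>
#include <string>

// If the atomic op fails on the leader, fulfill the caller's promise with
// the error and drop the log instead of still replicating it; a replicated
// log could otherwise be replayed by a new leader and succeed after the
// old leader already reported failure.
using AtomicOp = std::function<std::optional<std::string>()>;

std::optional<std::string> runAtomicOp(const AtomicOp& op,
                                       const std::function<void(int)>& reportError) {
  auto encoded = op();    // returns the encoded log on success
  if (!encoded) {
    reportError(-1);      // e.g. fulfill the promise with an error code
    return std::nullopt;  // dropped: never enters the batch
  }
  return encoded;         // success: joins the batch as usual
}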

Author reply:

ok

src/kvstore/raftex/RaftPart.cpp (3 resolved review threads)
}
} // namespace storage
ret.batch = encodeBatchValue(batchHolder->getBatch());
Contributor comment:

I'm considering whether we could move all log encoding into raft later, because we first encode here and then decode in raft again, which also introduces some extra string copies.
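
A toy illustration of the round trip this comment describes; the types and the encoding scheme are made up and are not Nebula's actual batch format.

#include <string>
#include <vector>

struct Op {
  std::string key, value;
};

// Storage processor: structured batch -> string (copy no. 1).
std::string encodeBatch(const std::vector<Op>& ops) {
  std::string blob;
  for (const auto& op : ops) {
    blob += op.key;
    blob += '=';
    blob += op.value;
    blob += '\n';  // naive separator, fine for a demo
  }
  return blob;
}

// Raft: string -> structured batch again (copy no. 2). Moving the encoding
// into raft would let storage pass the std::vector<Op> down directly and
// delete this round trip.
std::vector<Op> decodeBatch(const std::string& blob) {
  std::vector<Op> ops;
  size_t pos = 0;
  while (pos < blob.size()) {
    size_t eq = blob.find('=', pos);
    size_t nl = blob.find('\n', eq);
    ops.push_back({blob.substr(pos, eq - pos), blob.substr(eq + 1, nl - eq - 1)});
    pos = nl + 1;
  }
  return ops;
}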

@critical27 (Contributor) left a comment:

Some insert and push_back calls could be replaced.

@@ -1961,10 +1999,10 @@ bool RaftPart::checkAppendLogResult(nebula::cpp2::ErrorCode res) {
  {
    std::lock_guard<std::mutex> lck(logsLock_);
    logs_.clear();
    cachingPromise_.setValue(res);
    cachingPromise_.reset();
    // cachingPromise_.setValue(res);
Contributor comment:

Do we need to set the promise in logs_ and sendingLogs_?
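
A hypothetical illustration of the concern, using std::promise and made-up names (PendingLog, failAll) rather than the promise type in the snippets above: every queued log carries a promise some caller is waiting on, so all of them should be fulfilled before the queues are cleared.

#include <deque>
#include <future>
#include <mutex>

struct PendingLog {
  std::promise<int> promise;  // the caller waits on the matching future
};

// On append failure, fulfill every promise in both the caching queue
// (logs_) and the in-flight queue (sendingLogs_) with the error before
// clearing; otherwise their callers would hang forever.
void failAll(std::deque<PendingLog>& logs,
             std::deque<PendingLog>& sending,
             std::mutex& lock,
             int errorCode) {
  std::lock_guard<std::mutex> g(lock);
  for (auto& l : logs) l.promise.set_value(errorCode);
  for (auto& l : sending) l.promise.set_value(errorCode);
  logs.clear();
  sending.clear();
}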

critical27 previously approved these changes Mar 28, 2022
@critical27 (Contributor) left a comment:

Good job, we finally get rid of it...

@critical27 (Contributor) commented:

@kikimo Should I merge it first or wait until testing? This is an important change, maybe risky, since a lot of code in raft has been modified.

@critical27 (Contributor) commented:

@liuyu85cn could you write something about this PR in the release notes?

@kikimo (Contributor) commented Mar 31, 2022:

@kikimo Should I merge it first or wait until testing? This is an important change, maybe risky, since a lot of code in raft has been modified.

Hold, don't merge before I do the test.

class AppendLogsIteratorFactory {
 public:
  AppendLogsIteratorFactory() = default;
  static void make(RaftPart::LogCache& cacheLogs, RaftPart::LogCache& sendLogs) {
Contributor comment:

nubility

@@ -511,6 +526,10 @@ TEST_F(RebuildIndexTest, RebuildEdgeIndexWithAppend) {
  RebuildIndexTest::env_->rebuildIndexGuard_->clear();
  writer->stop();
  sleep(1);
  for (int i = 1; i <= 5; ++i) {
    LOG(INFO) << "sleep for " << i << "s";
    sleep(1);
Contributor comment:

Why not just increase the sleep time?

Author reply:

Because when running this case manually (watching the execution), sleeping for more than 1 second at a time can be confusing.

@Sophie-Xie added this to the v3.2.0 milestone on Apr 1, 2022
@critical27 added the cherry-pick-v3.1 (PR: need cherry-pick to this version) label on Apr 1, 2022
@Sophie-Xie removed the cherry-pick-v3.1 (PR: need cherry-pick to this version) label on Apr 12, 2022
critical27 previously approved these changes Apr 21, 2022
@critical27 (Contributor) left a comment:

Good job!

src/kvstore/raftex/test/RaftexTestBase.cpp (outdated, resolved review thread)
src/kvstore/raftex/RaftPart.cpp (outdated, resolved review thread)
src/storage/mutate/AddEdgesProcessor.cpp (outdated, 2 resolved review threads)
@Sophie-Xie added the cherry-pick-v3.1 (PR: need cherry-pick to this version) label on Apr 21, 2022
@Sophie-Xie removed this from the v3.2.0 milestone on Apr 21, 2022
@liwenhui-soul (Contributor) commented:

I think it's a good job, do you agree?

@panda-sheep (Contributor) left a comment:

Good job!

@Sophie-Xie merged commit 4112c7d into vesoft-inc:master on Apr 21, 2022
Sophie-Xie added a commit that referenced this pull request Apr 21, 2022
* init upload

* type

* address comments: remove some comments

* ??

Co-authored-by: Sophie <[email protected]>
Sophie-Xie added a commit that referenced this pull request Apr 21, 2022
* init upload

* type

* address comments: remove some comments

* ??

Co-authored-by: Sophie <[email protected]>

Co-authored-by: [email protected] <[email protected]>
liuyu85cn added a commit that referenced this pull request Apr 22, 2022
Labels
cherry-pick-v3.1 (PR: need cherry-pick to this version), ready for review, ready-for-testing (PR: ready for the CI test)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stability: storage memory lock enhancement
6 participants