
Memory lock in raft #3926

Merged
merged 5 commits into from
Apr 21, 2022
Conversation

@liuyu85cn (Contributor) commented Feb 22, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

Description:

Make add tag / edge use atomicOp again.

  1. Up to and including Nebula 2.0, we used atomicOp to handle atomic operations,
    e.g. changing a tag/edge and its index in one batch.

    It worked, but since we implemented it by sending raft logs synchronously
    (every atomic op had to be sent separately, even if they were disjoint), it was really slow.

  2. In 2.6.x we used a memory lock for concurrency control.
    We checked early (in the processor) whether a request could run.
    If it could, we wrote the get/put as a normal log, which can be handled in batch.
    If it couldn't, we returned an error.

    However, some users complained that they hit so many "Conflict error"s
    that they had to retry, and they believed this slowed down bulk inserts.
    We explained that those conflicts have to be retried either inside Nebula itself or in the client,
    but it looks like they didn't agree with us.

  3. So now we implement a hybrid mode (a sketch follows this list).
    We hold a memory lock in raft, just like in solution 2, and check every log to see whether it can be combined with the previous logs.
    If it can, we send them in a batch.
    If it can't, we treat it the atomicOp way (solution 1).
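
Below is a minimal sketch of the merge check described in point 3. The names LogEntry, lockedKeys, and tryMerge are hypothetical; only MergeAbleCode appears in the actual diff, and the real logic lives in RaftPart.

#include <algorithm>
#include <deque>
#include <string>
#include <vector>

enum class MergeAbleCode { MERGE_BOTH, SEND_ALONE };

struct LogEntry {
  std::vector<std::string> lockedKeys;  // keys this log reads or writes
};

// Disjoint key sets can be sent in one batch; overlapping ones must be
// serialized, falling back to the one-at-a-time atomicOp path.
MergeAbleCode tryMerge(const std::deque<LogEntry>& pending, const LogEntry& next) {
  for (const auto& prev : pending) {
    for (const auto& key : next.lockedKeys) {
      if (std::find(prev.lockedKeys.begin(), prev.lockedKeys.end(), key) !=
          prev.lockedKeys.end()) {
        return MergeAbleCode::SEND_ALONE;  // conflict: send separately
      }
    }
  }
  return MergeAbleCode::MERGE_BOTH;  // no shared keys: safe to batch
}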

How do you solve it?

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

As described in the "Description", conflicting concurrent tag/edge inserts will no longer report "Data conflict"; they will execute in a queue instead.

@critical27 (Contributor) left a comment:

Good job, generally LGTM. We can finally get rid of the ugly iterator...

replicatingLogs_ = false;
return;

// // Continue to process the original AppendLogsIterator if necessary
Contributor comment:

Could we check here whether the cache is empty, and continue if it isn't? Otherwise all cached logs have to wait for another round?
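
A toy model of this suggestion, with made-up names (ReplicatorState, finishRound) standing in for the RaftPart members shown above (logsLock_, logs_, replicatingLogs_); not the actual implementation.

#include <deque>
#include <mutex>
#include <string>
#include <utility>

struct ReplicatorState {
  std::mutex logsLock;                 // logsLock_ in RaftPart
  std::deque<std::string> cachedLogs;  // logs_ in RaftPart
  bool replicatingLogs = false;        // replicatingLogs_ in RaftPart
};

// Called when one replication round finishes: if more logs were cached
// while we were replicating, drain them for the next round immediately
// instead of going idle and making them wait for the next append.
std::deque<std::string> finishRound(ReplicatorState& st) {
  std::lock_guard<std::mutex> g(st.logsLock);
  if (st.cachedLogs.empty()) {
    st.replicatingLogs = false;  // nothing waiting: safe to go idle
    return {};
  }
  return std::exchange(st.cachedLogs, {});  // start the next round now
}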

if (!promiseRef.isFulfilled()) {
  promiseRef.setValue(code);
}
return MergeAbleCode::MERGE_BOTH;
@critical27 (Contributor) commented on Mar 17, 2022:

How about just dropping it instead of still sending it out? It is quite easy to do now.
I am not sure we can survive this case:

  1. atomic op failed
  2. send the log out anyway
  3. leader change
  4. new leader atomic op succeeded...
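
A minimal sketch of the drop-it approach under discussion, with hypothetical names (AtomicOp, runAtomicOp, reportError); not the PR's actual code.

#include <functional>
#include <optional>
#include <string>

// If the atomic op fails on the leader, fulfill the caller's promise with
// the error and drop the log instead of still replicating it; a replicated
// log could otherwise be replayed by a new leader and succeed after the
// old leader already reported failure.
using AtomicOp = std::function<std::optional<std::string>()>;

std::optional<std::string> runAtomicOp(const AtomicOp& op,
                                       const std::function<void(int)>& reportError) {
  auto encoded = op();    // returns the encoded log on success
  if (!encoded) {
    reportError(-1);      // e.g. fulfill the promise with an error code
    return std::nullopt;  // dropped: never enters the batch
  }
  return encoded;         // success: joins the batch as usual
}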

Author reply:

ok

src/kvstore/raftex/RaftPart.cpp (3 resolved review threads)
}
} // namespace storage
ret.batch = encodeBatchValue(batchHolder->getBatch());
Contributor comment:

I'm considering whether we could move all log encoding into raft later, because we first encode here and then decode in raft again, which also introduces some extra string copies.
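
A toy illustration of the round trip this comment describes; the types and the encoding scheme are made up and are not Nebula's actual batch format.

#include <string>
#include <vector>

struct Op {
  std::string key, value;
};

// Storage processor: structured batch -> string (copy no. 1).
std::string encodeBatch(const std::vector<Op>& ops) {
  std::string blob;
  for (const auto& op : ops) {
    blob += op.key;
    blob += '=';
    blob += op.value;
    blob += '\n';  // naive separator, fine for a demo
  }
  return blob;
}

// Raft: string -> structured batch again (copy no. 2). Moving the encoding
// into raft would let storage pass the std::vector<Op> down directly and
// delete this round trip.
std::vector<Op> decodeBatch(const std::string& blob) {
  std::vector<Op> ops;
  size_t pos = 0;
  while (pos < blob.size()) {
    size_t eq = blob.find('=', pos);
    size_t nl = blob.find('\n', eq);
    ops.push_back({blob.substr(pos, eq - pos), blob.substr(eq + 1, nl - eq - 1)});
    pos = nl + 1;
  }
  return ops;
}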

@critical27 (Contributor) left a comment:

Some insert and push_back calls could be replaced.

@@ -1961,10 +1999,10 @@ bool RaftPart::checkAppendLogResult(nebula::cpp2::ErrorCode res) {
  {
    std::lock_guard<std::mutex> lck(logsLock_);
    logs_.clear();
    cachingPromise_.setValue(res);
    cachingPromise_.reset();
    // cachingPromise_.setValue(res);
Contributor comment:

Do we need to set the promise in logs_ and sendingLogs_?
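
A hypothetical illustration of the concern, using std::promise and made-up names (PendingLog, failAll) rather than the promise type in the snippets above: every queued log carries a promise some caller is waiting on, so all of them should be fulfilled before the queues are cleared.

#include <deque>
#include <future>
#include <mutex>

struct PendingLog {
  std::promise<int> promise;  // the caller waits on the matching future
};

// On append failure, fulfill every promise in both the caching queue
// (logs_) and the in-flight queue (sendingLogs_) with the error before
// clearing; otherwise their callers would hang forever.
void failAll(std::deque<PendingLog>& logs,
             std::deque<PendingLog>& sending,
             std::mutex& lock,
             int errorCode) {
  std::lock_guard<std::mutex> g(lock);
  for (auto& l : logs) l.promise.set_value(errorCode);
  for (auto& l : sending) l.promise.set_value(errorCode);
  logs.clear();
  sending.clear();
}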

critical27 previously approved these changes Mar 28, 2022
@critical27 (Contributor) left a comment:

Good job, we finally get rid of it...

@critical27 (Contributor) commented:

@kikimo Should I merge it first or wait until testing? This is an important change, maybe risky, since a lot of code in raft has been modified.

@critical27 (Contributor) commented:

@liuyu85cn could you write something about this PR in the release notes?

@kikimo (Contributor) commented Mar 31, 2022:

@kikimo Should I merge it first or wait until testing? This is an important change, maybe risky, since a lot of code in raft has been modified.

Hold, don't merge before I do the test.

class AppendLogsIteratorFactory {
 public:
  AppendLogsIteratorFactory() = default;
  static void make(RaftPart::LogCache& cacheLogs, RaftPart::LogCache& sendLogs) {
Contributor comment:

nubility

@@ -511,6 +526,10 @@ TEST_F(RebuildIndexTest, RebuildEdgeIndexWithAppend) {
  RebuildIndexTest::env_->rebuildIndexGuard_->clear();
  writer->stop();
  sleep(1);
  for (int i = 1; i <= 5; ++i) {
    LOG(INFO) << "sleep for " << i << "s";
    sleep(1);
Contributor comment:

Why not just increase the sleep time?

Author reply:

Because when running this case manually (watching the execution), sleeping for more than 1 second at a time can be confusing.

@Sophie-Xie added this to the v3.2.0 milestone on Apr 1, 2022
@critical27 added the cherry-pick-v3.1 (PR: need cherry-pick to this version) label on Apr 1, 2022
@Sophie-Xie removed the cherry-pick-v3.1 (PR: need cherry-pick to this version) label on Apr 12, 2022
critical27 previously approved these changes Apr 21, 2022
@critical27 (Contributor) left a comment:

Good job!

src/kvstore/raftex/test/RaftexTestBase.cpp (outdated, resolved review thread)
src/kvstore/raftex/RaftPart.cpp (outdated, resolved review thread)
src/storage/mutate/AddEdgesProcessor.cpp (outdated, 2 resolved review threads)
@Sophie-Xie added the cherry-pick-v3.1 (PR: need cherry-pick to this version) label on Apr 21, 2022
@Sophie-Xie removed this from the v3.2.0 milestone on Apr 21, 2022
@liwenhui-soul (Contributor) commented:

I think it's a good job, do you agree?

@panda-sheep (Contributor) left a comment:

Good job!

@Sophie-Xie merged commit 4112c7d into vesoft-inc:master on Apr 21, 2022
Sophie-Xie added a commit that referenced this pull request Apr 21, 2022
* init upload

* type

* address comments: remove some comments

* ??

Co-authored-by: Sophie <[email protected]>
Sophie-Xie added a commit that referenced this pull request Apr 21, 2022
* init upload

* type

* address comments: remove some comments

* ??

Co-authored-by: Sophie <[email protected]>

Co-authored-by: [email protected] <[email protected]>
liuyu85cn added a commit that referenced this pull request Apr 22, 2022
Labels
cherry-pick-v3.1 (PR: need cherry-pick to this version), ready for review, ready-for-testing (PR: ready for the CI test)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stability: storage memory lock enhancement
6 participants