[IOTDB-1583] Raft log failed to be committed in cluster version #3832
Conversation
@@ -1032,6 +1034,15 @@ TSStatus processPlanLocally(PhysicalPlan plan) {
    log.setCurrLogTerm(getTerm().get());
    log.setCurrLogIndex(logManager.getLastLogIndex() + 1);

    // if a single log exceeds the threshold
    // we need to return error code to the client as in server mode
    if ((int) RamUsageEstimator.sizeOf(log) + Integer.BYTES
RamUsageEstimator.sizeOf() returns a value larger than the serialized size of the log, so this check may be risky.
Yep, let me update it. But serializing here probably means that each log will be serialized many times; with the current structure of the code, there seems to be no better solution.
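One way to address both concerns (the over-estimate and the repeated serialization) would be to serialize once, measure the real byte size, and hand the buffer on for reuse. This is only a minimal sketch, assuming the cluster Log type exposes serialize() returning a ByteBuffer and that ClusterDescriptor/StatusUtils are as used in the surrounding code; the helper name checkLogSize is hypothetical and not part of this patch.

  // Sketch only: serialize exactly once, check the real byte size, reuse the buffer downstream.
  private TSStatus checkLogSize(Log log) {
    ByteBuffer serialized = log.serialize();                 // serialize exactly once
    int totalSize = serialized.remaining() + Integer.BYTES;  // payload + 4-byte length header
    if (totalSize >= ClusterDescriptor.getInstance().getConfig().getRaftLogBufferSize()) {
      logger.error(
          "Log cannot fit into buffer, please increase raft_log_buffer_size "
              + "or reduce the size of requests you send.");
      return StatusUtils.INTERNAL_ERROR; // reviewers suggest EXECUTE_STATEMENT_ERROR instead
    }
    return StatusUtils.OK;
  }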
Discussed offline; the status code of the error could be refined later.
      logger.error(
          "Log cannot fit into buffer, please increase raft_log_buffer_size;"
              + "or reduce the size of requests you send.");
      return StatusUtils.INTERNAL_ERROR;
It looks like EXECUTE_STATEMENT_ERROR would be a more appropriate status code to report to the user.
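For reference, a hedged sketch of what returning the suggested status could look like; TSStatusCode.EXECUTE_STATEMENT_ERROR is IoTDB's generic statement-execution error code, and the exact message wording here is illustrative.

      // Illustrative only: build the suggested status instead of StatusUtils.INTERNAL_ERROR.
      TSStatus status = new TSStatus(TSStatusCode.EXECUTE_STATEMENT_ERROR.getStatusCode());
      status.setMessage(
          "Log cannot fit into buffer, please increase raft_log_buffer_size "
              + "or reduce the size of requests you send.");
      return status;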
@@ -1032,6 +1034,15 @@ TSStatus processPlanLocally(PhysicalPlan plan) {
    log.setCurrLogTerm(getTerm().get());
    log.setCurrLogIndex(logManager.getLastLogIndex() + 1);

    // if a single log exceeds the threshold
    // we need to return error code to the client as in server mode
    if ((int) RamUsageEstimator.sizeOf(log) + Integer.BYTES
The buffer space consumed by the object is its byte size after serialization. I don't think the in-memory size of an object equals its serialized byte size in most cases.
LGTM
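To make the gap concrete, a tiny standalone comparison (illustrative numbers only): the on-heap size reported by Lucene's RamUsageEstimator includes the object header and alignment padding, so even for a plain byte array it is larger than the raw serialized length, and for a full Log object graph the difference is bigger still.

import org.apache.lucene.util.RamUsageEstimator;

public class SizeComparison {
  public static void main(String[] args) {
    byte[] payload = new byte[1000];                    // pretend this is a serialized log
    long heapSize = RamUsageEstimator.sizeOf(payload);  // header + data + alignment padding
    System.out.println(
        "serialized bytes: " + payload.length + ", estimated heap bytes: " + heapSize);
  }
}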
        >= ClusterDescriptor.getInstance().getConfig().getRaftLogBufferSize()) {
      logger.error(
          "Log cannot fit into buffer, please increase raft_log_buffer_size;"
              + "or reduce the size of requests you send.");
- We need to control the log size; we can't let it run into this anomaly.
- If an exception occurs, the server should catch it and then split the log, or use some other method to apply it.
- We need to control the log size; we can't let it run into this anomaly.
- If an exception occurs, the server should catch it and then split the log, or use some other method to apply it.
When this problem occurs, the service can only be restarted, and the current modification is consistent with the single-server mode. Splitting logs would involve several modules, so we may need to control the plan size, not only the log size.
We may need to spend some time on this part in the future to make it better.
I think this needs further discussion: if some error happens during raft log apply, maybe only one raft group fails; it will lose some data and leave the raft groups inconsistent with each other. How should we deal with that exception?
For now, I think it is OK to just throw the exception and not let the log be committed.
OK, thanks for submitting a new issue in JIRA for further discussion; this PR can be merged.
Thanks for all your suggestions. I will merge the PR; further discussion can be found in #3856.
The bug is caused by an oversized raft log.
When the size of a log is bigger than the buffer (default 16 MB), it cannot be persisted because of a
BufferOverflowException
. But the log has already been committed in memory, so we will try to commit the same log again. Therefore, we should check the size of the log at the entrance to prevent the leader from sending this log to the followers.
BTW, this case may also cause the next plan to fail in server mode: because we didn't clear the buffer, it still contains some data from the previous log.
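For readers unfamiliar with the failure mode, here is a self-contained sketch of what happens when an oversized log hits a fixed-size buffer; the 16 MB figure matches the default raft_log_buffer_size, everything else (class and variable names) is illustrative rather than IoTDB code.

import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class OversizedLogDemo {
  public static void main(String[] args) {
    int raftLogBufferSize = 16 * 1024 * 1024;           // default raft_log_buffer_size (16 MB)
    ByteBuffer logBuffer = ByteBuffer.allocate(raftLogBufferSize);
    byte[] serializedLog = new byte[raftLogBufferSize]; // with the 4-byte length header it no longer fits
    try {
      logBuffer.putInt(serializedLog.length);           // length header (Integer.BYTES)
      logBuffer.put(serializedLog);                     // throws BufferOverflowException
    } catch (BufferOverflowException e) {
      // The header is already in the buffer; without logBuffer.clear() those stale
      // bytes remain and can also break persistence of the next, valid log.
      System.out.println("oversized log cannot be persisted: " + e);
    }
  }
}

The check added in this PR rejects such a log at the entrance, so the client receives an error instead of the group repeatedly retrying a commit that can never be persisted.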