HDDS-11375. DN startup fails due to illegal configuration of raft.grpc.message.size.max #7128

Merged
merged 5 commits into from
Aug 28, 2024

@jojochuang jojochuang commented Aug 28, 2024

What changes were proposed in this pull request?

HDDS-11375. DN Startup fails with "RuntimeException: Can't start the HDDS datanode plugin"

Please describe your PR in detail:

  • Remove the predefined hdds.ratis.raft.grpc.message.size, because its value of 32MB is not large enough. With the setting removed, the value is calculated as hdds.container.ratis.log.appender.queue.byte-limit + 1MB.
  • Update hdds.container.ratis.log.appender.queue.byte-limit to 32MB in the integration test so that integration tests can reproduce the bug.

Note: HDDS-11320 did update hdds.ratis.raft.grpc.message.size to be hdds.container.ratis.log.appender.queue.byte-limit + 1MB, but the value was later overridden here: https://github.com/apache/ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java#L370
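
For illustration only, here is a minimal, dependency-free Java sketch of the arithmetic described above. It is not the code in XceiverServerRatis. The class name, method names, and the exact form of the legality check (message size must be at least the appender byte-limit plus 1MB) are assumptions made for this sketch.

```java
// Hypothetical sketch; names and the validation rule are assumptions,
// not the actual Ozone or Ratis implementation.
final class GrpcMessageSizeSketch {
  static final long MB = 1024L * 1024L;

  private GrpcMessageSizeSketch() {
  }

  // Derive the gRPC message size from the log appender queue byte-limit,
  // as the PR describes: byte-limit + 1MB.
  static long derivedMessageSizeMax(long appenderByteLimit) {
    return appenderByteLimit + MB;
  }

  // Assumed check: the message size must leave at least 1MB of headroom
  // above the appender byte-limit.
  static boolean isLegal(long messageSizeMax, long appenderByteLimit) {
    return messageSizeMax >= appenderByteLimit + MB;
  }

  public static void main(String[] args) {
    long appenderByteLimit = 32 * MB;

    // A fixed 32MB message size overrides the derived value and fails the check.
    long fixedOverride = 32 * MB;
    System.out.println("fixed 32MB legal: " + isLegal(fixedOverride, appenderByteLimit));

    // With the fixed setting removed, the derived value passes the check.
    long derived = derivedMessageSizeMax(appenderByteLimit);
    System.out.println("derived legal: " + isLegal(derived, appenderByteLimit));
  }
}
```

Under these assumptions, a 32MB byte-limit combined with a fixed 32MB message size leaves no headroom, which is the situation the integration test change is meant to reproduce.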

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11375

How was this patch tested?

https://github.com/jojochuang/ozone/actions/runs/10588626984
Without removing hdds.ratis.raft.grpc.message.size, TestMiniOzoneCluster run against the Ratis master branch fails with the exact same error.

https://github.com/jojochuang/ozone/actions/runs/10589356146
With the fix, TestMiniOzoneCluster run against the Ratis master branch does not fail.

Change-Id: I355dd25654864c1e917d3c13e0fb99b263d26de5
(cherry picked from commit 0127097cadc3ac318b56e548b6d59f68626faaef)
Change-Id: Ic9be1662a348c9c65c4df1140a5fb3aa353b3e50
…ult hdds.container.ratis.log.appender.queue.byte-limit in the integration test to 32MB

Change-Id: I76481ea60a0bd37e5f72007b882f741e9a1a82c6

@ashishkumar50 ashishkumar50 left a comment

@jojochuang Thanks for the fix. Change LGTM.

@adoroszlai adoroszlai changed the title HDDS-11375. DN Startup fails with "RuntimeException: Can't start the HDDS datanode plugin" HDDS-11375. DN startup fails due to illegal configuration of raft.grpc.message.size.max Aug 28, 2024

@smengcl smengcl left a comment

lgtm.

The config removal needs to be clarified in the release notes.

@jojochuang jojochuang merged commit 5659b7e into apache:master Aug 28, 2024
42 checks passed
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Sep 16, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Sep 18, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Sep 28, 2024