Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix potential case cause retention policy not working on topic level #21041

Merged
merged 3 commits into from
Aug 22, 2023

Conversation

Technoboy-
Copy link
Contributor

@Technoboy- Technoboy- commented Aug 21, 2023

Fixes #20968

Motivation

image

As described in 20968, even if the retention policy is set at the topic level, the ledger is still deleted. See the above log.

For onUpdate , there are two issues:

public void onUpdate(TopicPolicies policies) {
if (log.isDebugEnabled()) {
log.debug("[{}] update topic policy: {}", topic, policies);
}
if (policies == null) {
return;
}
updateTopicPolicy(policies);
shadowTopics = policies.getShadowTopics();
updateDispatchRateLimiter();
updateSubscriptionsDispatcherRateLimiter().thenRun(() -> {
updatePublishDispatcher();
updateSubscribeRateLimiter();
replicators.forEach((name, replicator) -> replicator.updateRateLimiter());
shadowReplicators.forEach((name, replicator) -> replicator.updateRateLimiter());
checkMessageExpiry();
checkReplicationAndRetryOnFailure();
checkDeduplicationStatus();
preCreateSubscriptionForCompactionIfNeeded();
// update managed ledger config
checkPersistencePolicies();
}).exceptionally(e -> {
Throwable t = e instanceof CompletionException ? e.getCause() : e;
log.error("[{}] update topic policy error: {}", topic, t.getMessage(), t);
return null;
});
}

  1. We print Policies updated successfully in method checkReplicationAndRetryOnFailure. it confuses users.
  2. If the topic is loaded before reader read the policies and checkPersistencePolicies throw exception, then the retention policy will not apply to ManagedLedger config. So trim ledger will still use previous polices. And here will cause the above issue.

Motivation

  1. print Policies updated successfully in the end.
  2. Use thenCompose to call async method.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@Technoboy- Technoboy- self-assigned this Aug 21, 2023
@Technoboy- Technoboy- added this to the 3.2.0 milestone Aug 21, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 21, 2023
@Technoboy- Technoboy- closed this Aug 21, 2023
@Technoboy- Technoboy- reopened this Aug 21, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 21, 2023
.thenCompose(__ -> checkReplicationAndRetryOnFailure())
.thenCompose(__ -> checkDeduplicationStatus())
.thenCompose(__ -> preCreateSubscriptionForCompactionIfNeeded())
.thenCompose(__ -> checkPersistencePolicies())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this PR fix the second issue in onUpdate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, right

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second issue said there are exceptions that happened when calling checkPersistencePolicies. Can we confirm that?

And is it possible that no exceptions happening on checkPersistencePolicies but the topic is loaded before the topic policy is loaded? And the trim task also happened before the topic policy is loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second issue said there are exceptions that happened when calling checkPersistencePolicies. Can we confirm that

The user said he didn't notice any warn log.

And is it possible that no exceptions happening on checkPersistencePolicies but the topic is loaded before the topic policy is loaded? And the trim task also happened before the topic policy is loaded.

yes, it's possible

@codelipenghui codelipenghui added the type/bug The PR fixed a bug or issue reported a bug label Aug 21, 2023
@@ -457,4 +459,27 @@ public void testGetTopicPoliciesWithCleanCache() throws Exception {

result.join();
}

@Test
public void testConfig() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can also get passed without this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

.thenCompose(__ -> checkReplicationAndRetryOnFailure())
.thenCompose(__ -> checkDeduplicationStatus())
.thenCompose(__ -> preCreateSubscriptionForCompactionIfNeeded())
.thenCompose(__ -> checkPersistencePolicies())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second issue said there are exceptions that happened when calling checkPersistencePolicies. Can we confirm that?

And is it possible that no exceptions happening on checkPersistencePolicies but the topic is loaded before the topic policy is loaded? And the trim task also happened before the topic policy is loaded.

@codelipenghui codelipenghui added the category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost label Aug 21, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #21041 (42bcbcf) into master (ee91edc) will increase coverage by 0.47%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21041      +/-   ##
============================================
+ Coverage     72.67%   73.14%   +0.47%     
+ Complexity    32268    32186      -82     
============================================
  Files          1863     1875      +12     
  Lines        139397   139587     +190     
  Branches      15336    15349      +13     
============================================
+ Hits         101308   102103     +795     
+ Misses        30032    29414     -618     
- Partials       8057     8070      +13     
Flag Coverage Δ
inttests 24.19% <87.50%> (-0.05%) ⬇️
systests 25.06% <0.00%> (?)
unittests 72.42% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...sar/broker/service/persistent/PersistentTopic.java 79.46% <100.00%> (+0.11%) ⬆️

... and 150 files with indirect coverage changes

@codelipenghui codelipenghui merged commit d3a6df3 into apache:master Aug 22, 2023
45 checks passed
poorbarcode pushed a commit that referenced this pull request Aug 29, 2023
Technoboy- added a commit that referenced this pull request Sep 5, 2023
Technoboy- added a commit that referenced this pull request Sep 11, 2023
Technoboy- added a commit that referenced this pull request Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost cherry-picked/branch-2.11 cherry-picked/branch-3.0 cherry-picked/branch-3.1 doc-not-needed Your PR changes do not impact docs ready-to-test release/2.9.5 release/2.10.7 release/2.11.3 release/3.0.2 release/3.1.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Brokers restart during read/writes causes data loss
7 participants