Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Avoid infinite bundle unloading #20822

Conversation

Demogorgon314
Copy link
Member

@Demogorgon314 Demogorgon314 commented Jul 17, 2023

Motivation

The bundle ownership assignment logic doesn't know the previous unloaded broker when unloading happens. It might assign the bundle to the same broker. In this case, it might cause an infinite bundle unloading loop.

To resolve this issue, we can check the destination broker when doing the doLoadShedding stage. When the destination broker is the same as the current owner broker, we can skip this unload, and if it is different, we set the new owner in this stage.

Modifications

  • Transfer the bundle to the new owner when needed to unload.

Verifying this change

See the ModularLoadManagerImplTest#testLoadShedding unit test.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@Demogorgon314 Demogorgon314 added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/broker labels Jul 17, 2023
@Demogorgon314 Demogorgon314 self-assigned this Jul 17, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 17, 2023
@Demogorgon314 Demogorgon314 force-pushed the Demogorgon314/Avoid-assign-bundles-to-previous-unloaded-broker branch from f953f89 to bcbc1af Compare July 17, 2023 10:04
@Demogorgon314 Demogorgon314 marked this pull request as ready for review July 18, 2023 02:04
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title should be a fix for the infinite bundle unloading.

If I understand correctly, a breaking change will be introduced to the behavior of loadBalancerDistributeBundlesEvenlyEnabled because the current solution just filters out the previous owner, then the bundle can be moved to other brokers. I think the correct solution should be to skip the bundle unloading if the next owner is the same one as the current owner.

IMO, the main point is we don't care about the next owner broker when performing the bundle unloading. But the bundle unloading and the owner selection are based on the same load data. So is it worth checking the next owner at the unloading stage? @Demogorgon314 @heesung-sn

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I noticed no one put data to LoadData.brokerData and LoadData.bundleData except for the test. Is it still useful? or maybe I missed something.

@Demogorgon314
Copy link
Member Author

BTW, I noticed no one put data to LoadData.brokerData and LoadData.bundleData except for the test. Is it still useful? or maybe I missed something.

@codelipenghui Yes, it is still in use. The ModularLoadManagerImpl#updateAllBrokerData method puts data to LoadData.brokerData, and the ModularLoadManagerImpl#updateBundleData method puts data to LoadData.bundleData

@Demogorgon314 Demogorgon314 changed the title [improve][broker] Avoid assigning bundle to the previous unloaded broker [fix][broker] Avoid infinite bundle unloading Jul 19, 2023
@Demogorgon314 Demogorgon314 marked this pull request as draft July 19, 2023 04:49
Comment on lines +678 to +679
pulsar.getAdminClient().namespaces()
.unloadNamespaceBundle(namespaceName, bundleRange, destBroker.get());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The destinationBroker parameter is added in this PR: #18663

try {
pulsar.getAdminClient().namespaces().unloadNamespaceBundle(namespaceName, bundleRange);
pulsar.getAdminClient().namespaces()
.unloadNamespaceBundle(namespaceName, bundleRange, destBroker.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder, The destBroker is only available after 2.11.0. We are not able to cherry-pick to branch-2.10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we don't need to ensure the bundle will actually go to the dest broker? We just want to avoid the infinite loop as possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have many brokers, it still has a chance to select the same broker if we don't pass the destBroker parameter. I wonder if this is acceptable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it will only sometimes go to the same broker, not always. In 2.10, we don't have this API to support a strict owner assignment at the loading stage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. After this PR is merged, I will push a PR for branch 2.10.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2.10 we don't have a good way for that. What will happen if the same broker is still selected after unloading? Will it retry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BewareMyPower It will be selected to unload again. And hope next time it will be skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see. After this PR is merged, I will push a PR for branch 2.10.

To be clear, how are you going to push this fix to 2.10? According to @codelipenghui, the destBroker option in the unload admin is only available since 2.11.

@codelipenghui codelipenghui added type/bug The PR fixed a bug or issue reported a bug and removed type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages labels Jul 19, 2023
@codelipenghui codelipenghui added this to the 3.1.0 milestone Jul 19, 2023
…ce/impl/ModularLoadManagerImpl.java

Co-authored-by: Yunze Xu <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2023

Codecov Report

Merging #20822 (531e964) into master (57fbee4) will decrease coverage by 0.34%.
The diff coverage is 77.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #20822      +/-   ##
============================================
- Coverage     72.97%   72.64%   -0.34%     
- Complexity    32157    32227      +70     
============================================
  Files          1868     1856      -12     
  Lines        139164   139139      -25     
  Branches      15314    15329      +15     
============================================
- Hits         101555   101077     -478     
- Misses        29562    29999     +437     
- Partials       8047     8063      +16     
Flag Coverage Δ
inttests 24.24% <48.00%> (?)
systests ?
unittests 72.40% <76.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...roker/loadbalance/impl/ModularLoadManagerImpl.java 82.75% <77.33%> (+0.27%) ⬆️

... and 168 files with indirect coverage changes

@codelipenghui codelipenghui merged commit 3f63768 into apache:master Jul 25, 2023
@codelipenghui
Copy link
Contributor

@Demogorgon314 Could you please help cherry-pick the PR into release branches?

Demogorgon314 added a commit to Demogorgon314/pulsar that referenced this pull request Jul 26, 2023
Demogorgon314 added a commit to Demogorgon314/pulsar that referenced this pull request Jul 26, 2023
Demogorgon314 added a commit that referenced this pull request Jul 26, 2023
@heesung-sn
Copy link
Contributor

heesung-sn commented Jul 27, 2023

Though this is another great improvement in ModularLoadManager, this is a behavior change too.
For this behavior change,

  • Do we need to update any doc?
  • Do we need to put this logic under any config, as this is a behavior change?

It is great that this logic can block unloading bundles to the same broker.
e.g) Unloading to the same broker
A->A

However, could infinite unloading still happen if unloading bounces to multiple brokers?
e.g) Unloading bounces between two brokers
A->B->A->B

@Demogorgon314
Copy link
Member Author

@heesung-sn

Do we need to update any doc?

Yes, It's good to add a doc to notify the users.

Do we need to put this logic under any config, as this is a behavior change?

I think we can leave config as is, since unloading to the same broker is not make sense.

However, could infinite unloading still happen if unloading bounces to multiple brokers?

Yes, this can happen, not sure if we have a good way to prevent it.

@shibd
Copy link
Member

shibd commented Oct 21, 2023

cherry-picked branch-2.11 by #20878

@hanmz
Copy link
Contributor

hanmz commented Feb 21, 2024

Why not call selectBrokerForAssignment directly, but add a new selectBroker method?The only difference I see is that it is not added to preallocated . What is the impact if added to preallocated ?

Comment on lines +877 to +887
final ConcurrentOpenHashMap<String, ConcurrentOpenHashSet<String>> namespaceToBundleRange =
brokerToNamespaceToBundleRange
.computeIfAbsent(broker,
k -> ConcurrentOpenHashMap.<String,
ConcurrentOpenHashSet<String>>newBuilder()
.build());
synchronized (namespaceToBundleRange) {
namespaceToBundleRange.computeIfAbsent(namespaceName,
k -> ConcurrentOpenHashSet.<String>newBuilder().build())
.add(bundleRange);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchronized block here seems meaningless?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... not sure why used synchronized block here before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants