
[fix][broker] Fix PulsarRegistrationClient and ZkRegistrationClient not aware rack info problem. #18672

Merged

Conversation

@horizonzy horizonzy commented Nov 29, 2022

Fixes #18671

Master Issue: #18671

Before #16825, the getRack method refreshed the rack info on every invocation, so it always returned the newest rack info.
After #16825, the rack info is only refreshed when Pulsar starts or when `pulsar-admin bookies set-bookie-rack` updates the ZooKeeper node /bookies. At startup, PulsarRegistrationClient#bookieServiceInfoCache is empty, so the bookieId cannot be resolved and the rack info is never updated.

Motivation

Make Pulsar and AutoRecovery aware of the rack info after a restart.

Modifications

When the set of writable bookies changes, notify BookieRackAffinityMapping so it refreshes the rack info (a rough sketch of the idea follows).
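
For illustration, here is a minimal sketch of the idea (not the actual code in this patch): register a listener for writable-bookie changes and refresh the rack mapping from its callback. The listener interface is BookKeeper's RegistrationClient.RegistrationListener; the refresh action is passed in as a plain Runnable because the concrete refresh hook inside BookieRackAffinityMapping is an implementation detail of this PR.

```java
import java.util.Set;

import org.apache.bookkeeper.discover.RegistrationClient;
import org.apache.bookkeeper.net.BookieId;
import org.apache.bookkeeper.versioning.Versioned;

// Sketch only: whenever the writable bookie set changes (e.g. a bookie
// re-registers after a restart), trigger a refresh of the cached rack info so
// the newly seen bookie IDs can be resolved to racks again.
class RackInfoRefresher implements RegistrationClient.RegistrationListener {

    private final Runnable refreshRackInfo; // e.g. a callback into BookieRackAffinityMapping

    RackInfoRefresher(Runnable refreshRackInfo) {
        this.refreshRackInfo = refreshRackInfo;
    }

    @Override
    public void onBookiesChanged(Versioned<Set<BookieId>> writableBookies) {
        refreshRackInfo.run();
    }
}
```

Wiring it up would look roughly like `registrationClient.watchWritableBookies(new RackInfoRefresher(mapping::updateRacks))`, where `updateRacks` stands in for whatever refresh method the mapping exposes.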

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions

@horizonzy Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added the doc-not-needed label and removed the doc-label-missing label Nov 29, 2022
@horizonzy horizonzy changed the title Fix autoRecovery not aware rack info problem. [fix][broker]Fix autoRecovery not aware rack info problem. Nov 29, 2022
@Technoboy- Technoboy- added this to the 2.12.0 milestone Nov 29, 2022
@Technoboy- Technoboy- closed this Nov 29, 2022
@Technoboy- Technoboy- changed the title [fix][broker]Fix autoRecovery not aware rack info problem. [fix][broker] Fix autoRecovery not aware rack info problem. Nov 29, 2022
@Technoboy- Technoboy- reopened this Nov 29, 2022
@codecov-commenter

codecov-commenter commented Nov 29, 2022

Codecov Report

Merging #18672 (24cff81) into master (bc1e0cb) will decrease coverage by 11.19%.
The diff coverage is 76.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #18672       +/-   ##
=============================================
- Coverage     47.50%   36.31%   -11.19%     
- Complexity    10505    13890     +3385     
=============================================
  Files           698     1665      +967     
  Lines         67984   141229    +73245     
  Branches       7272    16809     +9537     
=============================================
+ Hits          32297    51294    +18997     
- Misses        32112    83994    +51882     
- Partials       3575     5941     +2366     
Flag Coverage Δ
inttests 24.13% <76.00%> (?)
unittests 33.81% <68.00%> (-13.70%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...ookie/rackawareness/BookieRackAffinityMapping.java 70.00% <72.22%> (ø)
.../metadata/bookkeeper/PulsarRegistrationClient.java 64.60% <85.71%> (ø)
.../transaction/buffer/metadata/AbortTxnMetadata.java 28.57% <0.00%> (-57.15%) ⬇️
...a/org/apache/bookkeeper/mledger/ManagedCursor.java 37.50% <0.00%> (-48.22%) ⬇️
...java/org/apache/pulsar/proxy/stats/TopicStats.java 58.82% <0.00%> (-41.18%) ⬇️
...oker/loadbalance/LoadResourceQuotaUpdaterTask.java 44.44% <0.00%> (-33.34%) ⬇️
.../client/impl/schema/generic/GenericJsonWriter.java 36.36% <0.00%> (-30.31%) ⬇️
...pache/pulsar/broker/admin/v2/PersistentTopics.java 46.82% <0.00%> (-27.29%) ⬇️
...ent/impl/schema/generic/JsonRecordBuilderImpl.java 36.11% <0.00%> (-25.80%) ⬇️
.../apache/pulsar/broker/admin/impl/ClustersBase.java 51.77% <0.00%> (-24.89%) ⬇️
... and 1192 more

@hangc0276
Contributor

@horizonzy Thanks for your contribution. We have changed the resolver; would you please help add a test that covers both resolvers?

@horizonzy
Member Author

@horizonzy Thanks for your contribution. We have changed the resolver; would you please help add a test that covers both resolvers?

Fine

@horizonzy horizonzy changed the title [fix][broker] Fix autoRecovery not aware rack info problem. [fix][broker] Fix PulsarRegistrationClient and ZkRegistrationClient not aware rack info problem. Dec 2, 2022
@yangl
Contributor

yangl commented Dec 12, 2022

Great work, this fixes #18834.

Contributor

@yangl yangl left a comment


LGTM

@yangl
Contributor

yangl commented Dec 13, 2022

By the way, the MetadataStore built in the else branch should use the broker's metadataStoreUrl, because when we set the rack info with `pulsar-admin bookies set-bookie-rack`, the configured JSON is stored under the broker's metadata store URL.

```diff
- String metadataServiceUri = ConfigurationStringUtil.castToString(conf.getProperty("metadataServiceUri"));
+ String metadataServiceUri = ConfigurationStringUtil.castToString(conf.getProperty("metadataStoreUrl"));
```

```java
String url;
String metadataServiceUri = ConfigurationStringUtil.castToString(conf.getProperty("metadataServiceUri"));
if (StringUtils.isNotBlank(metadataServiceUri)) {
    try {
        url = metadataServiceUri.replaceFirst(METADATA_STORE_SCHEME + ":", "")
                .replace(";", ",");
    } catch (Exception e) {
        throw new MetadataException(Code.METADATA_SERVICE_ERROR, e);
    }
} else {
    String zkServers = ConfigurationStringUtil.castToString(conf.getProperty("zkServers"));
    if (StringUtils.isBlank(zkServers)) {
        String errorMsg = String.format("Neither %s configuration set in the BK client configuration nor "
                + "metadataServiceUri/zkServers set in bk server configuration", METADATA_STORE_INSTANCE);
        throw new RuntimeException(errorMsg);
    }
    url = zkServers;
}
try {
    int zkTimeout = Integer.parseInt((String) conf.getProperty("zkTimeout"));
    store = MetadataStoreExtended.create(url,
            MetadataStoreConfig.builder()
                    .metadataStoreName(MetadataStoreConfig.METADATA_STORE)
                    .sessionTimeoutMillis(zkTimeout)
                    .build());
} catch (MetadataStoreException e) {
    throw new MetadataException(Code.METADATA_SERVICE_ERROR, e);
}
}
return store;
```

Run AutoRecovery standalone:

```shell
./bin/pulsar autorecovery
```

```properties
###### bookkeeper client url config #####
metadataServiceUri=zk+hierarchical://10.206.128.154:2181;10.206.128.155:2181;10.206.128.156:2181;10.206.128.157:2181;10.206.128.158:2181/bk1

###### bookkeeper client rack config #####
ensemblePlacementPolicy=org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy
reppDnsResolverClass=org.apache.pulsar.bookie.rackawareness.BookieRackAffinityMapping
minNumRacksPerWriteQuorum=2
enforceMinNumRacksPerWriteQuorum=true

##### BookieRackAffinityMapping config #####
metadataStoreUrl=zk:10.206.128.154:2181,10.206.128.155:2181,10.206.128.156:2181,10.206.128.157:2181,10.206.128.158:2181/sf-pulsar-1
zkTimeout=3000
```
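
In other words, the suggestion amounts to roughly the following lookup order (illustration only; the class and method names are made up, and a plain Map stands in for the BookKeeper client configuration):

```java
import java.util.Map;

// Sketch of the suggested lookup order: prefer the broker's metadataStoreUrl,
// which is the store that `pulsar-admin bookies set-bookie-rack` writes the
// rack JSON to, and fall back to the BookKeeper-side metadataServiceUri only
// when it is not configured.
final class RackStoreUrlChooser {

    static String chooseStoreUrl(Map<String, String> conf) {
        String brokerStoreUrl = conf.get("metadataStoreUrl");
        if (brokerStoreUrl != null && !brokerStoreUrl.isBlank()) {
            return brokerStoreUrl;
        }
        return conf.get("metadataServiceUri");
    }

    private RackStoreUrlChooser() {
    }
}
```

With that order, the standalone AutoRecovery config above would read rack info from the sf-pulsar-1 store (where set-bookie-rack writes) rather than from the bk1 BookKeeper metadata store.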

Contributor

@hangc0276 hangc0276 left a comment


Nice catch!

@codelipenghui
Contributor

@hangc0276 Do we need to cherry-pick to branch-2.9? I noticed you have added the label release/2.9.5

```java
BookieAddressResolver bookieAddressResolver = getBookieAddressResolver();
if (bookieAddressResolver instanceof DefaultBookieAddressResolver) {
    try {
        Field field = DefaultBookieAddressResolver.class.getDeclaredField("registrationClient");
```
Contributor

Can you please send a PR to BookKeeper in order to make this field accessible with a getter?

Member Author

That's fine.

Member Author

@horizonzy horizonzy Jan 3, 2023

apache/bookkeeper#3724.

Does this Pulsar PR need to wait for the next BookKeeper release?

Contributor

We can use reflection
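
For reference, the reflection workaround would look roughly like this (a sketch; the helper class is made up, the field name comes from the diff above, and the import paths assume BookKeeper's client module):

```java
import java.lang.reflect.Field;

import org.apache.bookkeeper.client.DefaultBookieAddressResolver;
import org.apache.bookkeeper.discover.RegistrationClient;
import org.apache.bookkeeper.proto.BookieAddressResolver;

// Interim workaround: pull the private registrationClient out of
// DefaultBookieAddressResolver via reflection until a public getter is available.
final class RegistrationClientAccessor {

    static RegistrationClient extract(BookieAddressResolver resolver) throws ReflectiveOperationException {
        if (!(resolver instanceof DefaultBookieAddressResolver)) {
            return null;
        }
        Field field = DefaultBookieAddressResolver.class.getDeclaredField("registrationClient");
        field.setAccessible(true); // private field, no getter exposed yet
        return (RegistrationClient) field.get(resolver);
    }

    private RegistrationClientAccessor() {
    }
}
```

Once apache/bookkeeper#3724 is available in a released BookKeeper version, the reflective access can be replaced with the getter.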

@hangc0276
Contributor

@hangc0276 Do we need to cherry-pick to branch-2.9? I noticed you have added the label release/2.9.5

@codelipenghui This PR doesn't need to be cherry-picked to branch-2.9. I removed the label.

Technoboy- pushed a commit that referenced this pull request Feb 8, 2023
liangyepianzhou pushed a commit that referenced this pull request Feb 9, 2023
…ot aware rack info problem. (#18672)

(cherry picked from commit 43335fb)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Feb 28, 2023
…ot aware rack info problem. (apache#18672)

(cherry picked from commit 43335fb)
(cherry picked from commit 003c186)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] The AutoRecovery didn't aware rack info after restart.
9 participants