[fix][broker] Ignore and remove the replicator cursor when the remote cluster is absent #23

Open
wants to merge 2 commits into
base: master

Conversation

BewareMyPower
Owner

Motivation

Sometimes when a remote cluster is deleted, the replication cursor might still exist for some topics. In this case, creating producers or consumers on these topics will fail.

Here is a log observed in a production environment:

WARN org.apache.pulsar.broker.service.BrokerService - Replication or
dedup check failed. Removing topic from topics list
persistent://public/__kafka/__consumer_offsets-partition-40,
java.util.concurrent.CompletionException: java.lang.RuntimeException:
org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException:
kop

When this happens, neither unloading the topic nor restarting the broker helps. The cursor has to be removed manually.

Modifications

When initializing a PersistentTopic, if a replicator cursor exists but the corresponding cluster does not, ignore the exception from addReplicationCluster and then remove this "zombie" cursor.
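For illustration only, the behavior described above can be modeled in a small self-contained sketch. This is not the actual patch: `TopicSketch`, `ClusterNotFoundException`, `REPL_PREFIX`, and the in-memory sets are hypothetical stand-ins for `PersistentTopic`, `MetadataStoreException.NotFoundException`, the replicator cursor naming, and the managed ledger state. It only shows the control flow: a "cluster not found" failure during initialization is swallowed and the matching cursor is dropped, while any other failure still propagates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class ZombieCursorSketch {
    // Assumed cursor naming for this sketch: "<prefix><remote cluster name>".
    static final String REPL_PREFIX = "pulsar.repl.";

    /** Hypothetical stand-in for the metadata store's "not found" exception. */
    static class ClusterNotFoundException extends RuntimeException {
        ClusterNotFoundException(String cluster) {
            super(cluster);
        }
    }

    /** Hypothetical, heavily simplified model of a topic with cursors. */
    static class TopicSketch {
        final Set<String> knownClusters;
        final Set<String> cursors;
        final Set<String> replicators = new HashSet<>();

        TopicSketch(Set<String> knownClusters, Set<String> cursors) {
            this.knownClusters = new HashSet<>(knownClusters);
            this.cursors = new HashSet<>(cursors);
        }

        // Fails when the remote cluster no longer exists in the metadata.
        CompletableFuture<Void> addReplicationCluster(String cluster) {
            if (!knownClusters.contains(cluster)) {
                return CompletableFuture.failedFuture(new ClusterNotFoundException(cluster));
            }
            replicators.add(cluster);
            return CompletableFuture.completedFuture(null);
        }

        // For each replicator cursor, try to add the replicator. If the
        // cluster is absent, ignore the failure and remove the zombie cursor.
        CompletableFuture<Void> initialize() {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (String cursor : new ArrayList<>(cursors)) {
                if (!cursor.startsWith(REPL_PREFIX)) {
                    continue; // an ordinary subscription cursor
                }
                String cluster = cursor.substring(REPL_PREFIX.length());
                futures.add(addReplicationCluster(cluster).exceptionally(e -> {
                    Throwable cause = (e instanceof CompletionException && e.getCause() != null)
                            ? e.getCause() : e;
                    if (cause instanceof ClusterNotFoundException) {
                        cursors.remove(cursor); // drop the zombie cursor
                        return null;
                    }
                    throw new CompletionException(cause); // other errors still fail
                }));
            }
            return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        }
    }

    public static void main(String[] args) {
        TopicSketch topic = new TopicSketch(
                Set.of("remote-a"),
                Set.of("pulsar.repl.remote-a", "pulsar.repl.kop", "my-subscription"));
        topic.initialize().join(); // completes even though cluster "kop" is absent
        System.out.println("cursors: " + topic.cursors);
        System.out.println("replicators: " + topic.replicators);
    }
}
```

In this model, the cursor for the absent cluster `kop` is removed during initialization, the cursor for the existing cluster and the ordinary subscription cursor are left untouched, and initialization completes normally.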

Verifications

PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor is added to verify that PersistentTopic#initialize succeeds and that the zombie replicator cursor is removed.

@BewareMyPower BewareMyPower force-pushed the bewaremypower/replicator-zombie-cursors branch 2 times, most recently from c57d1ac to 1469b85 Compare March 30, 2023 17:11
@BewareMyPower BewareMyPower changed the title [fix][broker] Skip creating the replicator when the remote cluster is absent [fix][broker] Ignore and remove the replicator cursor when the remote cluster is absent Mar 30, 2023
… cluster is absent

### Motivation

Sometimes when a remote cluster is deleted, the replication cursor might
still exist for some topics. In this case, creating producers or
consumers on these topics will fail.

Here is a log observed in a production environment:

> WARN  org.apache.pulsar.broker.service.BrokerService - Replication or
> dedup check failed. Removing topic from topics list
> persistent://public/__kafka/__consumer_offsets-partition-40,
> java.util.concurrent.CompletionException: java.lang.RuntimeException:
> org.apache.pulsar.metadata.api.MetadataStoreException$NotFoundException:
> kop

When this happens, neither unloading the topic nor restarting the broker
helps. The cursor has to be removed manually.

### Modifications

In `addReplicationCluster`, before getting the replication client, check
the namespace policy and topic policy first. If the remote cluster does
not exist, skip adding the replication client and remove the cursor.

### Verifications

`PersistentTopicTest#testCreateTopicWithZombieReplicatorCursor` is added
to verify that `PersistentTopic#initialize` succeeds and that the zombie
replicator cursor is removed.
@BewareMyPower BewareMyPower force-pushed the bewaremypower/replicator-zombie-cursors branch from 1469b85 to 8aeb37d Compare March 31, 2023 09:30
@github-actions

github-actions bot commented May 3, 2023

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label May 3, 2023
1 participant