
[improve][broker] Do-not-merge #251

Closed
wants to merge 81 commits into 3.1_ds from 3.1_ds_cherrypicks_from_3.0-fix

Conversation

mukesh-ctds
Collaborator

Motivation

This PR syncs all commits from apache/branch-3.0 into 3.1_ds that are not already present there.

Modifications


  • Cherry-picked commits from branch-3.0 that are not present on 3.1_ds

Verifying this change

  • Make sure that the change passes the CI checks.


This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

Technoboy- and others added 30 commits April 12, 2024 11:15
### Motivation

- The task `trim ledgers` runs in the thread `BkMainThreadPool.choose(ledgerName)`
- The task `write entries to BK` runs in the thread `BkMainThreadPool.choose(ledgerId)`

So the two tasks above may run concurrently.

The task `trim ledgers` works as follows:
- Find the ledgers that no longer need to be read; the result is `{Ledgers before the slowest read}`.
- Check whether the `{Ledgers before the slowest read}` fall outside the retention policy; the result is `{Ledgers to be deleted}`.
  - If the creation time of the ledger is earlier than the earliest retention time, mark it for deletion.
  - If, after deleting this ledger, the remaining ledgers still exceed the retention size, mark it for deletion.
- Delete the `{Ledgers to be deleted}`.

**(Highlight)** There is a scenario in which the `trim ledgers` task deletes ledgers discontinuously, so consumers receive a discontinuous sequence of messages:
- context:
  - ledgers: `[{id=1, size=100}, {id=2,size=100}]`
  - retention size: 150
  - no cursor there
- Check `ledger 1`: it is skipped by the retention check `(200 - 100) < 150`.
- One in-flight write finishes, so `calculateTotalSizeWrited()` now returns `300`.
- Check `ledger 2`: the retention check `(300 - 100) > 150` marks `ledger 2` for deletion.
- Delete `ledger 2`.
- Create a new consumer. It will receive messages from `[ledger-1, ledger-3]`, but `ledger-2` is skipped.

### Modifications

Once the retention constraint has been met, break the loop.
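
A minimal sketch of the fixed selection loop, in plain Java; `LedgerStat` and `selectLedgersToDelete` are hypothetical names, not the actual managed-ledger code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; LedgerStat and selectLedgersToDelete are hypothetical,
// not the actual managed-ledger trim code.
final class TrimSketch {
    record LedgerStat(long id, long size) {}

    static List<LedgerStat> selectLedgersToDelete(List<LedgerStat> candidates,
                                                  long totalSize,
                                                  long retentionSizeBytes) {
        List<LedgerStat> toDelete = new ArrayList<>();
        for (LedgerStat ledger : candidates) {
            if (totalSize - ledger.size() <= retentionSizeBytes) {
                // Fix: the retention constraint is met, so stop scanning.
                // Continuing would re-check later ledgers against a total that
                // concurrent writes may have grown, which is how ledger-2 got
                // deleted while ledger-1 was kept.
                break;
            }
            toDelete.add(ledger);
            totalSize -= ledger.size();
        }
        return toDelete;
    }
}
```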

(cherry picked from commit 782e91f)
(cherry picked from commit b87c0fb)
…pache#21035)

Motivation: After [PIP-118: reconnect broker when ZooKeeper session expires](apache#13341), the broker no longer shuts down after losing the connection to the local metadata store in the default configuration. However, before the ZK client reconnects, the BK online/offline events are lost, leaving incorrect BK info in memory. You can reproduce the issue with the test `BkEnsemblesChaosTest.testBookieInfoIsCorrectEvenIfLostNotificationDueToZKClientReconnect` (it reproduces with roughly 90% probability; run it again if the issue does not occur).

Modifications: Refresh BK info in memory after the ZK client is reconnected.
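
A hedged sketch of the shape of the fix; `SessionEvent` and the registration/refresh hooks below are illustrative stand-ins, not the exact Pulsar or BookKeeper API:

```java
import java.util.function.Consumer;

// Hedged sketch: SessionEvent and the hooks are illustrative stand-ins,
// not the exact Pulsar metadata-store API.
enum SessionEvent { ConnectionLost, Reconnected, SessionLost, SessionReestablished }

final class BookieInfoRefresher {
    static void install(Consumer<Consumer<SessionEvent>> registerSessionListener,
                        Runnable refreshBookieInfoFromMetadataStore) {
        registerSessionListener.accept(event -> {
            // Bookie online/offline notifications are dropped while the ZK
            // client is disconnected, so re-read the full bookie list on
            // every reconnect instead of trusting the stale in-memory view.
            if (event == SessionEvent.Reconnected
                    || event == SessionEvent.SessionReestablished) {
                refreshBookieInfoFromMetadataStore.run();
            }
        });
    }
}
```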
(cherry picked from commit db20035)
(cherry picked from commit 5f99925)
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
(cherry picked from commit f68589e)
…ata when doing lookup (apache#21063)

Motivation: If we set `allowAutoTopicCreationType` to `PARTITIONED`, the topic-creation flow works as follows:
1. `Client-side`: Lookup topic to get partitioned topic metadata to create a producer.
1. `Broker-side`: Create partitioned topic metadata.
1. `Broker-side`: Respond with `{"partitions":3}`.
1. `Client-side`: Create separate connections for each partition of the topic.
1. `Broker-side`: Receive 3 connect requests and create 3 partition-topics.

In `step 2` above, the flow works as follows:
1. Check the topic auto-creation policy (the policy is `{allowAutoTopicCreationType=PARTITIONED, defaultNumPartitions=3}`).
1. Check whether the partitioned topic metadata already exists.
1. Try to create the partitioned topic metadata if it does not exist.
1. If creation failed because the partitioned topic metadata already exists (perhaps another broker is creating it concurrently), read the partitioned topic metadata from the metadata store and respond to the client.

There is a race condition that makes the client get non-partitioned metadata of the topic:
| time | `broker-1` | `broker-2` |
| --- | --- | --- |
| 1 | get policy: `PARTITIONED, 3` | get policy: `PARTITIONED, 3` |
| 2 | check the partitioned topic metadata already exists | Check the partitioned topic metadata already exists |
| 3 | Partitioned topic metadata does not exist, the metadata cache will cache an empty optional for the path | Partitioned topic metadata does not exist, the metadata cache will cache an empty optional for the path |
| 4 |  | succeed create the partitioned topic metadata |
| 5 | Receive a ZK node-changed event that invalidates the cached partitioned topic metadata | |
| 6 | Creating the metadata fails because it already exists | |
| 7 | Read the partitioned topic metadata again | |

If `step-5` is executed later than `step-7`, `broker-1` will get an empty optional from the cache of the partitioned topic metadata and respond non-partitioned metadata to the client.

**What would make `step-5` execute later than `step-7`?**
One scenario: the issue fixed by PR apache#20303, which made `zk operations` and `zk node changed notifications` execute on different threads: the main thread of the ZK client and the metadata store thread.

Therefore, the mechanism for looking up partitioned topic metadata is fragile, and we need to optimize it.

Modifications: Before reading the partitioned topic metadata again, refresh the cache first.
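
A hedged sketch of the modified fallback path; `MetadataCacheFacade` and its methods are simplified stand-ins for the broker's metadata cache, not its exact API:

```java
import java.util.concurrent.CompletableFuture;

// All names here (MetadataCacheFacade, PartitionedTopicMetadata) are
// simplified stand-ins for the broker's types.
final class LookupSketch {
    record PartitionedTopicMetadata(int partitions) {}

    interface MetadataCacheFacade {
        CompletableFuture<Void> refresh(String path); // invalidate, then re-read from the store
        CompletableFuture<PartitionedTopicMetadata> get(String path);
        CompletableFuture<Void> create(String path, PartitionedTopicMetadata md);
    }

    static CompletableFuture<PartitionedTopicMetadata> createOrGet(
            MetadataCacheFacade cache, String path, PartitionedTopicMetadata md) {
        return cache.create(path, md)
                .thenApply(ignore -> md)
                .exceptionallyCompose(ex -> {
                    // Another broker won the race (the real code would check
                    // for an already-exists error here). The local cache may
                    // still hold the empty optional cached at step 3 of the
                    // table above, so refresh before reading it back.
                    return cache.refresh(path)
                            .thenCompose(ignore -> cache.get(path));
                });
    }
}
```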
(cherry picked from commit d099ac4)
(cherry picked from commit 2e534d2)
… many requests (apache#21216)

Motivation: The Pulsar client will close the socket if it receives a `ServiceNotReady` error when doing a lookup. The Broker will respond to the client with a `TooManyRequests` error if there are too many lookup requests in progress, but the Pulsar Proxy responds to the client with a `ServiceNotReady` error in the same scenario.

Modifications: Make Pulsar Proxy respond to the client with a `TooManyRequests` error if there are too many lookup requests in progress.
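
A minimal sketch of the behavior change, with hypothetical names (`ProxyLookupThrottle`, the permit count); the real check lives in the proxy's lookup handling and releases the permit asynchronously when the broker responds:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Hypothetical sketch; the error strings stand in for the protocol's
// ServerError values.
final class ProxyLookupThrottle {
    private final Semaphore lookupPermits = new Semaphore(50_000);

    void handleLookup(Runnable forwardToBroker, Consumer<String> sendError) {
        if (!lookupPermits.tryAcquire()) {
            // Before: ServiceNotReady, which makes the client close the whole
            // connection. After: TooManyRequests, which the client treats as
            // retryable back-pressure and keeps the connection open.
            sendError.accept("TooManyRequests");
            return;
        }
        try {
            forwardToBroker.run();
        } finally {
            lookupPermits.release();
        }
    }
}
```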
(cherry picked from commit d6c3fa4)
(cherry picked from commit 9a7c4bb)
…er id when switch ledger (apache#21201)

### Modifications
- Print a warning log on SSL handshake errors
- Print the ledger ID when switching ledgers

(cherry picked from commit 8485d68)
(cherry picked from commit f243925)
…ing queue is not empty (apache#21259)

Reproduce steps:
- Create a reader.
- The reader pulls messages into its `incoming queue`; do not call `reader.readNext` yet.
- The trim-ledgers task deletes the ledgers, leaving no messages in the topic.
- Now `reader.readNext` still returns messages, but the method `reader.hasMessageAvailable` returns `false`.

Note: a similar issue in `MultiTopicsConsumerImpl` was fixed by apache#13332; the current PR only fixes the issue in `ConsumerImpl`.

Make `reader.hasMessageAvailable` return `true` when `incoming queue` is not empty.
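
A simplified sketch of that change; the real `ConsumerImpl` method also consults the last dequeued message ID and the broker's last message ID, which `checkWithBroker()` stands in for here:

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Simplified sketch; `incomingMessages` mirrors the consumer's local queue,
// and checkWithBroker() stands in for the existing last-message-id check.
abstract class HasMessageAvailableSketch<T> {
    protected Queue<T> incomingMessages;

    CompletableFuture<Boolean> hasMessageAvailableAsync() {
        if (!incomingMessages.isEmpty()) {
            // Messages already prefetched locally remain readable even if
            // the broker trimmed the backing ledgers afterwards.
            return CompletableFuture.completedFuture(true);
        }
        return checkWithBroker();
    }

    abstract CompletableFuture<Boolean> checkWithBroker();
}
```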

(cherry picked from commit 6d82b09)
(cherry picked from commit 38c3f0c)
### Motivation

After trimming ledgers, the variable `lastConfirmedEntry` of the managed ledger might refer to a deleted ledger (the latest ledger that contained data).

There is a bug that lets users set the start read position to a nonexistent or deleted ledger when creating a subscription. This makes the `backlog` and `markDeletedPosition` wrong.

### Modifications

Fix the bug: validate the start read position when creating a subscription, so it cannot point to a nonexistent or deleted ledger.
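
A hedged illustration of the validation idea, with simplified stand-ins for the managed ledger's position and ledger set (not the actual patch):

```java
import java.util.NavigableSet;

// Illustrative only: Position and the ledger-id set are simplified
// stand-ins, not the managed-ledger types.
final class StartPositionSketch {
    record Position(long ledgerId, long entryId) implements Comparable<Position> {
        public int compareTo(Position o) {
            int c = Long.compare(ledgerId, o.ledgerId);
            return c != 0 ? c : Long.compare(entryId, o.entryId);
        }
    }

    static Position sanitize(Position requested, NavigableSet<Long> existingLedgerIds,
                             Position lastConfirmed) {
        if (!existingLedgerIds.contains(requested.ledgerId())
                || requested.compareTo(lastConfirmed) > 0) {
            // The requested ledger was trimmed or never existed; starting
            // there would corrupt backlog and markDeletedPosition, so fall
            // back to the latest confirmed entry.
            return lastConfirmed;
        }
        return requested;
    }
}
```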

(cherry picked from commit 4ee5cd7)
(cherry picked from commit dd28bb4)
…gers (apache#21250)

### Background
- After trimming ledgers, `ml.lastConfirmedPosition` may rely on a deleted ledger when the current ledger of the ML is empty.
- The cursor prevents setting `markDeletedPosition` to a value larger than `ml.lastConfirmedPosition`, even if there are no entries to read<sup>[1]</sup>.
- The code comment in the method `advanceCursors` says: do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition`<sup>[2]</sup>.

### Issue
If there is no durable cursor, the `markDeletedPosition` might be set to `{current_ledger, -1}`, and `async mark delete` will be prevented by `rule-2` above. So the `backlog`, `readPosition`, and `markDeletedPosition` of the cursor will be in an incorrect position after trimming the ledgers. You can reproduce it with the test `testTrimLedgerIfNoDurableCursor`.

### Modifications
Do not make `cursor.markDeletedPosition` larger than `ml.lastConfirmedPosition` when advancing non-durable cursors.
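
A minimal generic sketch of that cap; position types are abstracted to `Comparable`, so this shows the shape of the rule rather than the broker's code:

```java
// Minimal generic sketch of the cap applied when advancing non-durable
// cursors after a trim.
final class AdvanceCursorSketch {
    static <P extends Comparable<P>> P capMarkDelete(P proposed, P lastConfirmedEntry) {
        // Entries beyond lastConfirmedEntry can never be read, so a
        // mark-delete position past it (e.g. {current_ledger, -1} right
        // after a trim) would leave backlog and readPosition wrong forever.
        return proposed.compareTo(lastConfirmedEntry) > 0 ? lastConfirmedEntry : proposed;
    }
}
```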

(cherry picked from commit ca77982)
(cherry picked from commit 6895919)
…ne faster (apache#21183)

There is an issue similar to the one fixed by apache#21155.

The client assumed the connection was inactive, but the broker assumed the connection was fine. The client tried to use a new connection to reconnect an exclusive consumer, then got the error `Exclusive consumer is already connected`.

- Check whether the old consumer's connection is still alive when a new one tries to subscribe
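
A hedged sketch of the idea; `ConsumerHandle` and `isConnectionAlive()` are illustrative stand-ins for the broker's consumer and connection-probe abstractions:

```java
import java.util.concurrent.CompletableFuture;

// Hedged sketch only; ConsumerHandle and its methods are illustrative
// stand-ins for the broker's consumer abstraction.
interface ConsumerHandle {
    CompletableFuture<Boolean> isConnectionAlive(); // e.g. a probe on the cnx
    void close();
}

final class ExclusiveSubscribeSketch {
    static CompletableFuture<Void> subscribe(ConsumerHandle existing, Runnable attachNew) {
        if (existing == null) {
            attachNew.run();
            return CompletableFuture.completedFuture(null);
        }
        return existing.isConnectionAlive().thenCompose(alive -> {
            if (alive) {
                return CompletableFuture.failedFuture(
                        new IllegalStateException("Exclusive consumer is already connected"));
            }
            // The old connection is half-dead (the client already abandoned
            // it); evict it so the reconnecting consumer takes over faster.
            existing.close();
            attachNew.run();
            return CompletableFuture.completedFuture(null);
        });
    }
}
```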

(cherry picked from commit 29db8f8)
(cherry picked from commit b796f56)
…ptions caused by wrong topicName (apache#21997)

Similar to: apache#20131

The master branch fixed the issue via apache#19841. Since that change would prevent users from receiving messages created by mistake, we did not cherry-pick apache#19841 into other branches (see apache#19841 for details).

It works like this:
1. createSubscription(`tp1`)
2. Is it a partitioned topic?
   `no`: return the subscriptions
   `yes`: createSubscription(`tp1-partition-0`) ... createSubscription(`tp1-partition-n`)

---

```java
String partitionedTopic = "tp1-partition-0-DLQ";

// Highlight: partition0.toString() returns "tp1-partition-0-DLQ", which is
// wrong; the correct value is "tp1-partition-0-DLQ-partition-0".
TopicName partition0 = TopicName.get(partitionedTopic).getPartition(0);
```

Therefore, if there is a partitioned topic named `tp1-partition-0-DLQ`, the method `PersistentTopics.createSubscription` works like this:
1. Call the admin API `PersistentTopics.createSubscription("tp1-partition-0-DLQ")`.
2. Is it a partitioned topic?
3. Yes; call `TopicName.getPartition(0)` to get partition 0, which returns `tp1-partition-0-DLQ` again, then loop back to step 1.

The resulting infinite loop of HTTP calls to `PersistentTopics.createSubscription` crashes the broker.

If the bug that makes the topic name wrong is hit, do not loop back to step 1 (see the sketch below).

PR apache#19841 fixes the bug that makes the topic name wrong, but it introduces an unfriendly compatibility break; PIP 263 (apache#20033) will make the compatibility story good.
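
A hedged sketch of the guard; `TopicName.getPartition` is Pulsar's real API, but the fan-out shape and the `createOnPartition` callback are simplified stand-ins:

```java
import java.util.function.Consumer;
import org.apache.pulsar.common.naming.TopicName;

// Hedged sketch; the per-partition admin call is abstracted to a callback.
final class CreateSubscriptionGuard {
    static void fanOut(TopicName topic, int numPartitions,
                       Consumer<TopicName> createOnPartition) {
        for (int i = 0; i < numPartitions; i++) {
            TopicName partition = topic.getPartition(i);
            if (partition.toString().equals(topic.toString())) {
                // Base names that already end in "-partition-N" (e.g.
                // "tp1-partition-0-DLQ") derive the same name back, which
                // previously looped to step 1 forever; fail fast instead.
                throw new IllegalStateException("Invalid topic name: " + topic);
            }
            createOnPartition.accept(partition); // no further fan-out
        }
    }
}
```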

(cherry picked from commit 4386401)
… reads (apache#22295)

(cherry picked from commit 2803ba2)

(cherry picked from commit fde7c49)
…o same topics (apache#22255)

(cherry picked from commit c616b35)
(cherry picked from commit 5ab0e93)
BewareMyPower and others added 13 commits April 16, 2024 10:51
(cherry picked from commit 1c46877)
(cherry picked from commit 4a5400f)
(cherry picked from commit 5ba3e57)
…he#21274)

Co-authored-by: Baodi Shi <[email protected]>
(cherry picked from commit 5d18ff7)
(cherry picked from commit 000ee66)
…c loading wouldn't timeout (apache#22479)

(cherry picked from commit 837f8bc)
(cherry picked from commit 0fbcbb2)
(cherry picked from commit 7a4e16a)

(cherry picked from commit 93664d7)
…ks (apache#22463)

(cherry picked from commit f3d14a6)
(cherry picked from commit 976399c)
…paths or disabling it (apache#22370)

(cherry picked from commit 15ed659)
(cherry picked from commit cdc5af1)
…d fix race conditions in metricsBufferResponse mode (apache#22494)

(cherry picked from commit 7009071)
(cherry picked from commit 5f9d7c5)
…xception and testIncorrectClientClock (apache#22489)

(cherry picked from commit d9a43dd)
(cherry picked from commit c590198)
Co-authored-by: hoguni <[email protected]>
(cherry picked from commit 20915d1)

# Conflicts:
#	pom.xml
(cherry picked from commit ef9b28f)
(cherry picked from commit bbff29d)
(cherry picked from commit a0120d0)
@mukesh-ctds mukesh-ctds reopened this Apr 17, 2024
@mukesh-ctds mukesh-ctds force-pushed the 3.1_ds_cherrypicks_from_3.0-fix branch 8 times, most recently from 2e65ead to 6070b4d on April 19, 2024 11:08
@mukesh-ctds mukesh-ctds force-pushed the 3.1_ds_cherrypicks_from_3.0-fix branch from c0b175c to dde20bf on April 19, 2024 13:40
@mukesh-ctds mukesh-ctds deleted the 3.1_ds_cherrypicks_from_3.0-fix branch April 23, 2024 10:14