This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

Bump pulsar version to 2.10.4-SNAPSHOT #5860

Closed
wants to merge 1,009 commits into from

Conversation

streamnativebot

This is a PR created by snbot to trigger the check suite in each repository.

poorbarcode and others added 30 commits January 28, 2023 10:54
…apache#19275)

### Motivation

#### 1. `readPosition` point to a deleted ledger

When `trim ledgers` and `create new cursor` run concurrently, the cursor's `readPosition` can end up pointing to a deleted ledger.

| time | `trim ledgers` | `create new cursor` |
| --- | --- | --- |
| 1 | | set read position and mark deleted position |
| 2 | delete ledger | |
| 3 | | add the cursor to `ManagedLedger.cursors` |
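The interleaving in the table above can be sketched as follows. This is a minimal simulation with illustrative names (`TrimRaceSketch`, `simulateRace` are not Pulsar code); it only shows why a cursor that picks its read position before being registered can end up pointing at a ledger the trimmer already deleted.

```java
import java.util.TreeSet;

public class TrimRaceSketch {
    // Returns true when the newly-created cursor's read position ends up
    // pointing at a ledger the trimmer already deleted.
    static boolean simulateRace() {
        TreeSet<Long> ledgers = new TreeSet<>();
        ledgers.add(1L);
        ledgers.add(2L);

        // t1: the new cursor computes its read position (ledger 1), but it is
        //     not yet visible in ManagedLedger.cursors
        long cursorReadLedger = ledgers.first();

        // t2: trim-ledgers runs before the cursor is registered, sees no
        //     reader on ledger 1, and deletes it
        ledgers.remove(1L);

        // t3: the cursor is registered; its read position is a deleted ledger
        return !ledgers.contains(cursorReadLedger);
    }

    public static void main(String[] args) {
        System.out.println(simulateRace()); // prints true
    }
}
```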

----

#### 2. Backlog wrong caused by `readPosition` wrong
<strong>(Highlight)</strong> Since the cursor's read position points at a deleted ledger, the deleted messages can never be consumed or acknowledged. Because the backlog in the `topics stats` API response is calculated as `managedLedger.entriesAddedCounter - cursor.messagesConsumedCounter`, the result is that `topics stats` shows a `msgBacklog` even though in reality there is no backlog.
- `managedLedger.entriesAddedCounter`: set to `0` when the managed ledger is created, and incremented every time an entry is added.
- `cursor.messagesConsumedCounter`: set to `0` when the cursor is created, and incremented every time a message is acknowledged.

For example:
- write entries to the managed ledger: `{1:0~1:9}...{5:0~5:9}`
  - `managedLedger.entriesAddedCounter` is `50` now
- create a new cursor, and set the read position to `1:0`
  -  `cursor.messagesConsumedCounter` is `0` now
- delete ledgers `1~4`
- consume all messages
  - only the messages `{5:0~5:9}` can be consumed, so `cursor.messagesConsumedCounter` is `10` now
- the `backlog` in the `topics stats` response is `50 - 10 = 40`, but in reality there is no backlog
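The arithmetic in the example above can be stated directly. A tiny sketch (illustrative names, not the actual Pulsar fields) of how the two counters combine into the reported backlog:

```java
public class BacklogSketch {
    // msgBacklog as derived from the two counters described above
    static long computeBacklog(long entriesAddedCounter, long messagesConsumedCounter) {
        return entriesAddedCounter - messagesConsumedCounter;
    }

    public static void main(String[] args) {
        // 5 ledgers x 10 entries were added; only {5:0~5:9} could be acked,
        // so the stats report a backlog of 40 although none really exists.
        System.out.println(computeBacklog(50, 10)); // prints 40
    }
}
```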

----

#### 3. Reproduce issue
Sorry, I spent 4 hours trying to write a non-invasive test, but failed. <strong>(Highlight)</strong>You can reproduce by `testBacklogIfCursorCreateConcurrentWithTrimLedger` in the PR apache#19274

https://github.com/apache/pulsar/blob/a2cdc759fc2710e4dd913eb0485d23ebcaa076a4/pulsar-broker/src/test/java/org/apache/bookkeeper/mledger/impl/StatsBackLogTest.java#L163

### Modifications

Avoid the race condition between `cursor.initializeCursorPosition` and `internalTrimLedgers`.

(cherry picked from commit 4139fef)
…dCursor(single subscription check) upon subscription (apache#19343)
Relates to: apache#17831 (comment)

### Motivation

When the `ProxyConnection` handles a `Connect` command, that is the time to go to `Connecting` state. There is no other time that makes sense to switch to connecting. The current logic will go to connecting in certain re-authentication scenarios, but those are incorrect. By moving the state change to earlier in the logic, we make the state transition clearer and prevent corrupted state.

### Modifications

* Remove `state = State.Connecting` from the `doAuthentication` method, which is called multiple times for various reasons
* Add `state = State.Connecting` to the start of the `handleConnect` method.

### Verifying this change

The existing tests will verify this change, and reading through the code makes it clear this is a correct change.

### Does this pull request potentially affect one of the following parts:

Not a breaking change.

### Documentation

- [x] `doc-not-needed`

It would be nice to map out the state transitions for our connection classes. That is out of the scope of this small improvement.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#21

(cherry picked from commit c8650ce)
…pache#19423)

Co-authored-by: Michael Marshall <[email protected]>

### Motivation

Cherry-pick apache#17123

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads (10MB)*
  - *Extended integration test for recovery after broker failure*

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: <!-- ENTER URL HERE -->

<!--
After opening this PR, the build in apache/pulsar will fail and instructions will
be provided for opening a PR in the PR author's forked repository.

apache/pulsar pull requests should be first tested in your own fork since the 
apache/pulsar CI based on GitHub Actions has constrained resources and quota.
GitHub Actions provides separate quota for pull requests that are executed in 
a forked repository.

The tests will be run in the forked repository until all PR review comments have
been handled, the tests pass and the PR is approved by a reviewer.
-->
Co-authored-by: fengwenzhi <[email protected]>
(cherry picked from commit 41edd2e)
…8975)

### Motivation

fix multi invocation for ledger `createComplete`

### Modifications

Only invoke `createComplete` on ledger-creation timeout if the ledger was not created normally.
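The single-invocation guarantee described above can be sketched with a compare-and-set guard. This is an illustrative pattern, not the actual Pulsar code: whichever path (normal completion or timeout) arrives first wins, and the other is ignored.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CreateCompleteGuard {
    private final AtomicBoolean completed = new AtomicBoolean(false);

    // Returns true only for the first caller; later invocations are no-ops.
    boolean createComplete() {
        return completed.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        CreateCompleteGuard guard = new CreateCompleteGuard();
        System.out.println(guard.createComplete()); // normal path: true
        System.out.println(guard.createComplete()); // timeout path: false, ignored
    }
}
```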

(cherry picked from commit 8f1c1b1)
…essage if enabled read compacted (apache#18877)

### Motivation

The method `consumer.getLastMessageId` returns the latest message that can be received.
- If `read compacted` is disabled, it returns the last confirmed position of the `ManagedLedger`.
- If `read compacted` is enabled, it returns the latest message id that can be read from the compacted topic.

If we send a batch message like this:

```java
producer.newMessage().key("k1").value("v0").sendAsync(); // message-id is [3:1,-1:0]
producer.newMessage().key("k1").value("v1").sendAsync(); // message-id is [3:1,-1:1]
producer.newMessage().key("k1").value("v2").sendAsync(); // message-id is [3:1,-1:2]
producer.newMessage().key("k2").value("v0").sendAsync(); // message-id is [3:1,-1:3]
producer.newMessage().key("k2").value("v1").sendAsync(); // message-id is [3:1,-1:4]
producer.newMessage().key("k2").value(null).sendAsync(); // message-id is [3:1,-1:5]
producer.flush();
```

After the compaction task is done, the messages with key `k2` will have been deleted. The latest message that can be received is then `[3:1,-1:2]`.

---
When we call `consumer.getLastMessageId`, the expected result is:

```
[3:1,-1:2]
```

---
But the actual result is:

```
[3:1,-1:5]
```

### Modifications
If `read compacted` is enabled and the latest entry of the compacted topic is a batched message, extract the entry, examine all of its internal messages, and return the latest message that is not marked `compacted out`.
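The lookup described above can be sketched as a backwards scan over the last batched entry. Types and names here (`Msg`, `lastValidBatchIndex`) are hypothetical stand-ins for the broker's internal structures:

```java
import java.util.List;

public class LastMessageIdSketch {
    // A single message inside a batched entry, with its compaction flag
    record Msg(int batchIndex, boolean compactedOut) {}

    // Walk the batch from the end and return the highest batch index that
    // was not compacted out, or -1 if the whole entry was compacted out.
    static int lastValidBatchIndex(List<Msg> batch) {
        for (int i = batch.size() - 1; i >= 0; i--) {
            if (!batch.get(i).compactedOut()) {
                return batch.get(i).batchIndex();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Batch [3:1] with indexes 0..5; the k2 messages (3..5) were
        // compacted out, so the last valid id is [3:1,-1:2].
        List<Msg> batch = List.of(
            new Msg(0, false), new Msg(1, false), new Msg(2, false),
            new Msg(3, true),  new Msg(4, true),  new Msg(5, true));
        System.out.println(lastValidBatchIndex(batch)); // prints 2
    }
}
```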

(cherry picked from commit 83993ae)
…ache#12615)" (apache#19439)

This reverts commit 62e2547.

### Motivation

The motivation for apache#12615 relies on an incorrect understanding of Netty's threading model. The `ctx.executor()` is the context's event loop thread, the same thread used to process messages. The `waitingForPingResponse` variable is only ever updated and read from the context's event loop, so there is no need to make the variable `volatile`.
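The threading argument above can be sketched with a plain single-threaded executor standing in for a Netty channel's event loop: a field that is only read and written from that one thread is thread-confined, so `volatile` adds nothing. Names here are illustrative, not Pulsar's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventLoopConfinement {
    private boolean waitingForPingResponse; // no volatile needed

    static boolean pingFlagSeenOnEventLoop() {
        EventLoopConfinement c = new EventLoopConfinement();
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            // Write and read both run on the same thread, in submission order,
            // so the read is guaranteed to observe the write.
            eventLoop.submit(() -> { c.waitingForPingResponse = true; }).get();
            return eventLoop.submit(() -> c.waitingForPingResponse).get();
        } catch (Exception e) {
            return false;
        } finally {
            eventLoop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(pingFlagSeenOnEventLoop()); // prints true
    }
}
```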

### Modifications

* Remove `volatile` keyword for `waitingForPingResponse`

### Verifying this change

Read through all references to the variable.

### Documentation

- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: Skipping for this trivial PR.

(cherry picked from commit fb28d83)
…ot aware rack info problem. (apache#18672)

(cherry picked from commit 43335fb)
liangyepianzhou and others added 28 commits July 8, 2023 09:51
Missed the required /** in the license header
…eExpiryMonitor

### Motivation

apache#20781 adds a new constructor to
`PersistentMessageExpiryMonitor` and initializes the old constructor with

```java
this.topic = subscription.topic;
```

An NPE will happen when `subscription` is null. However, passing a null
`subscription` is allowed in tests because methods like
`findEntryFailed` don't depend on the `topic` field.

### Modifications

Add the null check and mark the old constructor as deprecated.
…apache#20568)

- Since `cnx.address + consumerId` identifies one consumer, add the `consumer-id` to the log when subscribing.
- Add a test to confirm that even if an error occurs when sending messages to the client, consumption still works.
- Print a debug log if an ack command was discarded because the `ConsumerFuture` is not complete.
- Print a debug log if sending a message to the client fails.

(cherry picked from commit a41ac49)
…pache#20819)

Motivation: If the producer name is generated by the broker, the producer will update its `producerName` variable after connecting, but will not update the same variable in the batch message container.

Modifications: Update the batch message container's producer name after connecting as well.
(cherry picked from commit aba50f2)
…egistered if there has no message was sent (apache#20888)

Motivation: In the replication scenario, we want to produce messages on the local cluster and consume them on the remote cluster; the producer and consumer both use the same schema, but the consumer cannot be registered if no messages have been written to the topic yet. The root cause is that on the remote cluster a producer has been registered with the `AUTO_PRODUCE_BYTES` schema, so there is no schema against which to check compatibility.

Modifications: If there is no schema and only the replicator producer was registered, skip the compatibility check.
(cherry picked from commit 9be0b52)
- The task `trim ledgers` runs in the thread `BkMainThreadPool.choose(ledgerName)`
- The task `write entries to BK` runs in the thread `BkMainThreadPool.choose(ledgerId)`

So the two tasks above may run concurrently.

The task `trim ledgers` works as follows:
- find the ledgers that no longer need to be read; the result is `{Ledgers before the slowest read}`.
- check whether `{Ledgers before the slowest read}` fall outside the retention policy; the result is `{Ledgers to be deleted}`.
  - if the ledger's create time is earlier than the earliest retention time, mark it for deletion
  - if, after deleting this ledger, the remaining ledgers are still larger than the retention size, mark it for deletion
- delete the `{Ledgers to be deleted}`

**(Highlight)** There is a scenario in which the task `trim ledgers` deletes a discontinuous set of ledgers, making message consumption discontinuous:
- context:
  - ledgers: `[{id=1, size=100}, {id=2,size=100}]`
  - retention size: 150
  - no cursor there
- Check `ledger 1`, skip by retention check `(200 - 100) < 150`
- One in-flight write finishes, so `calculateTotalSizeWrited()` now returns `300`.
- Check `ledger 2`: retention check `(300 - 100) > 150`, so `ledger-2` is marked for deletion.
- Delete `ledger 2`.
- Create a new consumer. It will receive messages from `[ledger-1, ledger-3]`, but `ledger-2` will be skipped.

Once the retention constraint has been met, break the loop.
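The corrected loop can be sketched as follows. Names (`Ledger`, `ledgersToDelete`) are illustrative, not the actual broker code: once the retention constraint is satisfied, the scan stops instead of re-evaluating later ledgers against a total size that concurrent writes may have grown, so the deleted prefix is always contiguous.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionTrimSketch {
    record Ledger(long id, long size) {}

    // Scan ledgers oldest-first; delete while retention allows, then break.
    static List<Long> ledgersToDelete(List<Ledger> ledgers, long totalSize, long retentionSize) {
        List<Long> toDelete = new ArrayList<>();
        for (Ledger l : ledgers) {
            if (totalSize - l.size() < retentionSize) {
                break; // retention met: never skip a ledger and delete a later one
            }
            toDelete.add(l.id());
            totalSize -= l.size();
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Ledger> ledgers = List.of(new Ledger(1, 100), new Ledger(2, 100));
        // With total size 200 and retention 150, nothing is deleted; even if an
        // in-flight write grows the total to 300 before the scan, the break
        // guarantees ledger-2 can only go after ledger-1.
        System.out.println(ledgersToDelete(ledgers, 200, 150)); // prints []
        System.out.println(ledgersToDelete(ledgers, 300, 150)); // prints [1]
    }
}
```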

(cherry picked from commit 782e91f)
The C++ and Python clients are not maintained in the main repo now.
@streamnativebot streamnativebot deleted the branch-2.10.4-SNAPSHOT branch August 11, 2023 09:10