[fix][broker] Create new ledger after the current ledger is closed #22034
This sounds like a big behaviour change.
1. transaction test: deleteNamespace after test
2. badVersionErrorDuringTruncateLedger: avoid the ledger being closed after rollover
3. testBacklogQuotaWithReader: no backlog should be retained
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
waiting for review.
Hi, @liangyepianzhou could you please point out which code? I saw we will create a new ledger here. pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Line 1815 in 0dabc97
Oh, I saw one place that will cause this behaviour: pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Line 859 in 0dabc97
Oh, I see. This is a bug, not a behavior change. The rollover in OpAddEntry does not create a new ledger if there are no pending write ops. pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java Lines 244 to 249 in 0dabc97
The callback finally reaches here. pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java Lines 1757 to 1760 in 0dabc97
Is this related to #21893?
@liangyepianzhou It would be good to improve the description of the PR.
This doesn't make sense.
This makes more sense. Is this the actual goal? Obviously the trimming could result in deleting the ledger.
Would it make sense to rename "Delete current ledger when it is closed" to "Trim current ledger when it is closed" if that's the expectation?
It seems that the problem might be very different. It would be necessary to explain the use case. One rare case where Pulsar currently seems to ignore retention policies is when topics aren't actively loaded on any broker in the cluster. A topic gets loaded when a consumer or producer connects to the topic.
There's a scheduled job (retentionCheckIntervalInSeconds): pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java Lines 653 to 659 in 5df97b4
That iterates all active topics and their managed ledgers: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java Lines 2041 to 2051 in 5df97b4
This works as long as the topic is active (loaded) in a broker in the cluster.
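To illustrate the point about loaded topics, here is a minimal sketch. It is not Pulsar's actual BrokerService code: RetentionCheckSketch, loadedTopics and checkRetention are hypothetical names, and the counter stands in for a real trim request. The periodic job only visits topics that are present in the broker's in-memory map, so a topic that is not loaded is never checked.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a broker's periodic retention check.
class RetentionCheckSketch {
    // Loaded topic name -> number of times a trim was requested (assumption:
    // a counter replaces the real managed-ledger trim call).
    final Map<String, AtomicInteger> loadedTopics = new ConcurrentHashMap<>();

    // One pass of the check: only topics currently in the map are visited.
    void checkRetention() {
        loadedTopics.values().forEach(AtomicInteger::incrementAndGet);
    }

    // Schedules the check at a fixed interval; the caller shuts the scheduler down.
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::checkRetention,
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Calling checkRetention() directly performs a single pass, which is convenient for experimenting without waiting for the scheduler.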
@lhotari Thanks for your review. It helps a lot. I have updated the description, which has greatly changed from my original idea. In the initial plan, I did not consider the following two points:
I hope the original description did not cause you too much trouble, and I look forward to your reply.
…ntriesInStorage - change apache#22034 is missing from branch-3.0 (cherry picked from commit e3531e8)
…pache#22034) (cherry picked from commit d0ca983) (cherry picked from commit 54042df)
…losed (apache#22034)" This reverts commit eed3d17.
…pache#22034) (cherry picked from commit d0ca983) (cherry picked from commit 54042df) (cherry picked from commit eed3d17)
In branch-3.0 this change is making ClusterMigrationTest fail consistently with NPE.
example stacktrace of the NPE
When cherry-picking, it's important to also pick #22552
This PR introduced a flaky test, #23164. @liangyepianzhou do you have a chance to fix it? Thanks.
Motivation
Fix1: Create new ledger after the current ledger is closed
Background
maximumRolloverTimeMs configures the maximum time a ledger stays open before it is closed.
The ClosedLedger state of the managed ledger indicates that the current ledger has been rolled over and can no longer be written to.
In the current logic, if a write finds that the current ledger should be closed, the ledger is closed after that entry is written. However, if there is no pending add-entry operation, no new ledger is created. As a result, the closed ledger remains the current ledger and cannot be deleted, so the retention policy is not applied as the user expects. This is the problem to solve; the details follow.
A new ledger needs to be opened once the current ledger is rolled over. Pulsar already has a task that periodically checks whether the ledger is full, and that task creates a new ledger immediately after closing the old one. The new ledger then becomes the current ledger, and the previous one can be deleted.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 4445 to 4447 in fc2e314
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 1794 to 1816 in fc2e314
However, a ledger can also be closed right after an entry is added. In that path, no new ledger is created unless an add-entry operation is pending. The closed ledger therefore stays current and cannot be deleted, because no new ledger takes its place.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java
Line 276 in fc2e314
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 1757 to 1760 in fc2e314
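The difference between the two paths can be shown with a toy model. This is a sketch under assumptions, not the ManagedLedgerImpl code: ToyManagedLedger, createLedgerWhenIdle, ledgerClosed and canTrim are hypothetical names, and ledger creation is reduced to incrementing an id.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the rollover path; names are illustrative, not Pulsar's API.
class ToyManagedLedger {
    enum State { LEDGER_OPENED, CLOSED_LEDGER }

    final Queue<String> pendingAddEntries = new ArrayDeque<>();
    // false models the behaviour described above, true models the proposed fix.
    final boolean createLedgerWhenIdle;
    State state = State.LEDGER_OPENED;
    long currentLedgerId = 1;

    ToyManagedLedger(boolean createLedgerWhenIdle) {
        this.createLedgerWhenIdle = createLedgerWhenIdle;
    }

    // Called once the current ledger has been closed by a rollover.
    void ledgerClosed() {
        state = State.CLOSED_LEDGER;
        if (createLedgerWhenIdle || !pendingAddEntries.isEmpty()) {
            currentLedgerId++;            // stands in for creating a new ledger
            state = State.LEDGER_OPENED;
        }
    }

    // A ledger can be trimmed only after a newer ledger has become current.
    boolean canTrim(long ledgerId) {
        return ledgerId < currentLedgerId;
    }
}
```

With createLedgerWhenIdle set to false and no pending writes, ledgerClosed() leaves the model in CLOSED_LEDGER and canTrim(1) stays false. With the flag set to true, ledger 1 becomes trimmable immediately.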
Fix2: slowestReaderPosition should be the position after the mark-delete position, not the mark-delete position itself
Currently, slowestReaderPosition is the slowest mark-delete position. As a result, the last full ledger cannot be trimmed even after every entry in it has been acked.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Line 1369 in fc2e314
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 2038 to 2070 in fc2e314
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorContainer.java
Lines 232 to 261 in 73f62c5
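A minimal sketch of the idea, assuming positions are (ledgerId, entryId) pairs and each ledger's entry count is known. ToyPosition, SlowestReaderSketch and nextAfter are hypothetical names, not the ManagedCursorImpl API, and empty ledgers are not handled.

```java
import java.util.NavigableMap;

// Illustrative position type; not Pulsar's real position class.
final class ToyPosition {
    final long ledgerId;
    final long entryId;

    ToyPosition(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }
}

class SlowestReaderSketch {
    // Returns the first position the reader still needs, i.e. the one after markDelete.
    // entriesPerLedger maps ledger id -> number of entries in that ledger.
    static ToyPosition nextAfter(ToyPosition markDelete,
                                 NavigableMap<Long, Long> entriesPerLedger) {
        Long count = entriesPerLedger.get(markDelete.ledgerId);
        boolean lastEntryOfLedger = count != null && markDelete.entryId + 1 >= count;
        Long nextLedger = entriesPerLedger.higherKey(markDelete.ledgerId);
        if (lastEntryOfLedger && nextLedger != null) {
            // Every entry of this ledger is acked: move to the start of the next one.
            return new ToyPosition(nextLedger, 0);
        }
        return new ToyPosition(markDelete.ledgerId, markDelete.entryId + 1);
    }
}
```

For ledgers 1 (10 entries) and 2 (5 entries), a mark-delete position of (1, 9) maps to (2, 0), so the slowest reader no longer points into ledger 1.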
Modifications
Fix1: Create new ledger after the current ledger is closed
We have two ways to solve this problem:
In most cases, users care more about write-read latency than about the storage cost of an empty ledger. So I suggest removing the !pendingAddEntries.isEmpty() condition and always creating a new ledger after the current ledger is closed.
Fix2: slowestReaderPosition should be the position after the mark-delete position, not the mark-delete position itself
When trimming, skip past any ledger that has already been read completely, so that it becomes eligible for deletion.
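One way to picture the trimming rule is the sketch below. It is an illustration under assumptions rather than the actual trim logic: TrimSketch and trimmableLedgers are hypothetical names, and retention time and size limits are ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

class TrimSketch {
    // Returns ledger ids that are fully consumed, oldest first.
    // entriesPerLedger maps ledger id -> number of entries in that ledger.
    static List<Long> trimmableLedgers(NavigableMap<Long, Long> entriesPerLedger,
                                       long markDeleteLedgerId,
                                       long markDeleteEntryId,
                                       long currentLedgerId) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Long> e : entriesPerLedger.entrySet()) {
            long id = e.getKey();
            if (id >= currentLedgerId) {
                break; // never trim the ledger that is still open for writes
            }
            boolean beforeMarkDelete = id < markDeleteLedgerId;
            // The mark-delete position sits on the last entry: the ledger is fully read.
            boolean fullyAcked = id == markDeleteLedgerId
                    && markDeleteEntryId + 1 >= e.getValue();
            if (beforeMarkDelete || fullyAcked) {
                result.add(id);
            } else {
                break; // later ledgers still hold unread entries
            }
        }
        return result;
    }
}
```

The fullyAcked check is the part that corresponds to this fix: without it, a ledger whose last entry is the mark-delete position would be kept.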