[improve][client] PIP-393: Improve performance of Negative Acknowledgement #23600
I will implement a space efficient map structure for |
@thetumbled Fastutil could be a better source of space efficient map data structures. I believe that there's a templating solution where it's possible to generate code for efficient implementations. In this case, is there a need for the data structure to be concurrent? Following a single writer principle could result in simpler and more performant designs. One way to address message passing from other threads to a single writer thread is to use message passing queues from JCTools which we already use in Pulsar. Just some food for thought. |
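A minimal sketch of the single writer idea mentioned above, assuming a hypothetical `SingleWriterNackState` class: any thread may publish events to a thread-safe inbox, while only one thread drains the inbox and touches the plain, non-concurrent map. The JDK's `ConcurrentLinkedQueue` stands in for a JCTools queue here, and the class, method, and field names are illustrative, not taken from Pulsar.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch of the single writer principle; not Pulsar code. */
class SingleWriterNackState {
    /** Event published by any thread; field names are illustrative. */
    static final class NackEvent {
        final long ledgerId;
        final long entryId;
        final long redeliverAtMillis;

        NackEvent(long ledgerId, long entryId, long redeliverAtMillis) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.redeliverAtMillis = redeliverAtMillis;
        }
    }

    // Thread-safe inbox. Assumption: a JCTools multi-producer single-consumer
    // queue could be swapped in here; the JDK queue keeps the sketch self-contained.
    private final Queue<NackEvent> inbox = new ConcurrentLinkedQueue<>();

    // Plain, non-concurrent state: only the writer thread reads or writes it.
    private final Map<String, Long> state = new HashMap<>();

    /** Safe to call from any thread. */
    void publish(long ledgerId, long entryId, long redeliverAtMillis) {
        inbox.add(new NackEvent(ledgerId, entryId, redeliverAtMillis));
    }

    /** Must be called only from the single writer thread. Returns the number of events applied. */
    int drain() {
        int applied = 0;
        NackEvent e;
        while ((e = inbox.poll()) != null) {
            state.put(e.ledgerId + ":" + e.entryId, e.redeliverAtMillis);
            applied++;
        }
        return applied;
    }

    /** Number of distinct (ledgerId, entryId) keys; writer thread only. */
    int size() {
        return state.size();
    }
}
```

The trade-off in this design is that readers on other threads cannot query the map directly; they would also have to go through the writer thread.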
...rc/main/java/org/apache/pulsar/common/util/collections/ConcurrentTripleLong2LongHashMap.java
If any solution from |
The main purpose of Fastutil is performance and space efficiency. pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/PendingAcksMap.java Line 101 in 3d0625b
There's a huge benefit when the keys are primitives and you can only achieve that with a hierarchical data structure. In the long run, I'd rather get rid of our custom collection implementations instead of adding more of them. It's not trivial to create bug free collections. Using existing libraries for that purpose is strongly preferred. @thetumbled Have you considered a hierarchical data structure? (such as map of maps) |
That is a good point; I will test it.
@thetumbled In certain cases when tracking existence (true/false), it's worth considering to use space efficient bit maps. In Pulsar, we use the RoaringBitmap library. |
I guess it's not applicable in this case. @thetumbled I checked the NegativeAcksTracker class and it seems that the actual key is (ledgerId, entryId). It's easy to implement (ledgerId, entryId) as map of maps. |
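As a rough illustration of the map-of-maps suggestion, the sketch below stores `(ledgerId, entryId) -> timestamp` as an outer map keyed by `ledgerId` whose values are inner maps keyed by `entryId`. It uses boxed JDK `HashMap`s only to stay self-contained; the space benefit discussed in this thread would come from primitive-keyed maps (for example from Fastutil), which are not used here. The class name `NestedNackMap` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: (ledgerId, entryId) -> timestamp stored as a map of maps. */
class NestedNackMap {
    // Outer key: ledgerId. Inner key: entryId. Value: redelivery timestamp.
    private final Map<Long, Map<Long, Long>> byLedger = new HashMap<>();

    void put(long ledgerId, long entryId, long timestamp) {
        byLedger.computeIfAbsent(ledgerId, k -> new HashMap<>()).put(entryId, timestamp);
    }

    /** Returns the stored timestamp, or null if the entry is not tracked. */
    Long get(long ledgerId, long entryId) {
        Map<Long, Long> entries = byLedger.get(ledgerId);
        return entries == null ? null : entries.get(entryId);
    }

    void remove(long ledgerId, long entryId) {
        Map<Long, Long> entries = byLedger.get(ledgerId);
        if (entries != null) {
            entries.remove(entryId);
            if (entries.isEmpty()) {
                byLedger.remove(ledgerId); // drop empty inner maps so ledgers do not leak
            }
        }
    }

    int ledgerCount() {
        return byLedger.size();
    }
}
```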
This is a very poor solution in the current implementation: pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/NegativeAcksTracker.java Lines 80 to 88 in 22cfa54
There should be a separate datastructure (a list or queue) which contains the entries in timestamp order. The benefit of that is that iterating could stop after the timestamp condition no longer holds. |
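The early-stop behavior described above can be sketched with a JDK `PriorityQueue` ordered by redelivery time. This is only an illustration under assumed names (`TimestampOrderedNacks`, `Nack`, `pollDue`); it is not the structure used by `NegativeAcksTracker`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: nacked entries kept in timestamp order. */
class TimestampOrderedNacks {
    /** One nacked entry; field names are illustrative. */
    static final class Nack {
        final long ledgerId;
        final long entryId;
        final long redeliverAtMillis;

        Nack(long ledgerId, long entryId, long redeliverAtMillis) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.redeliverAtMillis = redeliverAtMillis;
        }
    }

    // Min-heap on redelivery time: the head is always the earliest entry.
    private final PriorityQueue<Nack> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.redeliverAtMillis, b.redeliverAtMillis));

    void add(long ledgerId, long entryId, long redeliverAtMillis) {
        queue.add(new Nack(ledgerId, entryId, redeliverAtMillis));
    }

    /** Removes and returns due entries, stopping at the first entry that is not yet due. */
    List<Nack> pollDue(long nowMillis) {
        List<Nack> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().redeliverAtMillis <= nowMillis) {
            due.add(queue.poll());
        }
        return due;
    }

    int pending() {
        return queue.size();
    }
}
```

With this layout each timer tick costs time proportional to the number of due entries (times a logarithmic heap factor) rather than to the total number of tracked entries.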
@thetumbled It looks like there's no need for a map data structure in the first place. That's completely unnecessary for implementing NegativeAcksTracker |
You are right. We need to improve the code execution efficiency too, with some kind of structure sorted by timestamp.
Fastutil contains multiple PriorityQueue implementations: https://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/PriorityQueue.html. |
I propose a PIP to fix several issues with the nack tracker, with a new data structure:
This PR becomes the implementation PR for PIP-393: #23601.
Motivation
There are many issues with the current implementation of Negative Acknowledgement in Pulsar:
All of these problems are severe and need to be solved.
Memory occupation is high
After the improvement in #23582, we reduced the memory occupation of `NegativeAcksTracker` by more than half by replacing `HashMap` with `ConcurrentLongLongPairHashMap`. With 1 million entries, memory occupation decreases from 178MB to 64MB. With 10 million entries, it decreases from 1132MB to 512MB. The average memory occupation per entry decreases from 1132MB/10000000=118 bytes to 512MB/10000000=53 bytes.
But that is not enough. Assuming we negatively acknowledge 10,000 messages per second with a 1h redelivery delay for each message, the memory occupation of `NegativeAcksTracker` will be `3600*10000*53/1024/1024/1024=1.77GB`. If the delay is 5h, the required memory is `3600*10000*53/1024/1024/1024*5=8.88GB`, which grows too fast.
Code execution efficiency is low
Currently, each time the timer task is triggered, it iterates over all the entries in `NegativeAcksTracker.nackedMessages`, which is unnecessary. We can sort entries by timestamp and iterate only over the entries that need to be redelivered.
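One way to picture "sort by timestamp and touch only what is due" is a sorted map from redelivery timestamp to the entries that share it. The sketch below uses the JDK `TreeMap` and its `headMap` view; the class name `SortedNackBuckets` and its methods are assumptions made for illustration and are not claimed to be the structure this PR introduces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: entries bucketed by redelivery timestamp in a sorted map. */
class SortedNackBuckets {
    // Redelivery timestamp -> {ledgerId, entryId} pairs due at that timestamp.
    private final NavigableMap<Long, List<long[]>> buckets = new TreeMap<>();

    void add(long ledgerId, long entryId, long redeliverAtMillis) {
        buckets.computeIfAbsent(redeliverAtMillis, k -> new ArrayList<>())
                .add(new long[] {ledgerId, entryId});
    }

    /** Removes and returns entries with timestamp <= nowMillis; later buckets are never visited. */
    List<long[]> removeDue(long nowMillis) {
        List<long[]> due = new ArrayList<>();
        // headMap(..., true) is a live view of all keys <= nowMillis.
        Map<Long, List<long[]>> head = buckets.headMap(nowMillis, true);
        for (List<long[]> bucket : head.values()) {
            due.addAll(bucket);
        }
        head.clear(); // clearing the view removes those keys from the backing map
        return due;
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

Grouping entries that share a timestamp into one bucket also means the sorted structure holds one key per distinct timestamp rather than one per entry.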
Redelivery time is not accurate
Currently, the redelivery time is controlled by `timerIntervalNanos`, which is 1/3 of the `negativeAckRedeliveryDelay`. That means that if `negativeAckRedeliveryDelay` is 1h, the timer fires only every 20min, so the actual redelivery time can deviate by up to 20min, which is unacceptable.
Multiple negative acks for messages in the same entry (batch) will interfere with each other
Currently, `NegativeAcksTracker#nackedMessages` maps `(ledgerId, entryId)` to `timestamp`, which means multiple nacks for messages in the same batch share a single timestamp. If we ask for msg1 to be redelivered 10s later and then for msg2 to be redelivered 20s later, the two messages are redelivered together 20s later. msg1 is not redelivered after 10s because the timestamp recorded in `NegativeAcksTracker#nackedMessages` is overwritten by the second nack call.
We can reproduce this problem with the test code below:
We expect the second message to be redelivered 10s later than the first message, since it was nacked 10s later than the first one.
However, the two messages are received together.
You can also reproduce this problem with the test code in this PR: org.apache.pulsar.client.impl.NegativeAcksTest#testNegativeAcksWithBatch
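The overwrite can also be modeled without a broker. The sketch below is not the PR's test; it is a hypothetical, self-contained model in which one map is keyed by `(ledgerId, entryId)`, as described above, and a second map is keyed by `(ledgerId, entryId, batchIndex)` purely for contrast. The class `BatchNackOverwrite` and the per-message key are assumptions, not necessarily the fix adopted in this PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the timestamp overwrite for messages in one batch. */
class BatchNackOverwrite {
    // Keyed by (ledgerId, entryId): all messages of a batch share one slot.
    private final Map<List<Long>, Long> byEntry = new HashMap<>();
    // Keyed by (ledgerId, entryId, batchIndex): one slot per message (for contrast only).
    private final Map<List<Long>, Long> byMessage = new HashMap<>();

    void nack(long ledgerId, long entryId, long batchIndex, long redeliverAtMillis) {
        // A later nack for the same entry silently replaces the earlier timestamp.
        byEntry.put(Arrays.asList(ledgerId, entryId), redeliverAtMillis);
        byMessage.put(Arrays.asList(ledgerId, entryId, batchIndex), redeliverAtMillis);
    }

    /** Timestamp stored for the whole entry, or null if absent. */
    Long entryTimestamp(long ledgerId, long entryId) {
        return byEntry.get(Arrays.asList(ledgerId, entryId));
    }

    /** Timestamp stored for one message of the batch, or null if absent. */
    Long messageTimestamp(long ledgerId, long entryId, long batchIndex) {
        return byMessage.get(Arrays.asList(ledgerId, entryId, batchIndex));
    }
}
```

Nacking batch index 0 with one timestamp and batch index 1 with a later one leaves only the later timestamp in the entry-keyed map, which matches the behavior reported here.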
Modifications
Refactor the `NegativeAcksTracker` to solve the above problems.
Verifying this change
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: thetumbled#64