Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][client] Support multi-topic messageId deserialization to ack messages #19944

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rdhabalia
Copy link
Contributor

@rdhabalia rdhabalia commented Mar 28, 2023

Motivation

Right now, when user tries to serialize messageId and deserialize messageId for a partitioned-topic and lacks the message then it fails with the below exception.

org.apache.pulsar.client.api.PulsarClientException$NotAllowedException: Only TopicMessageId is allowed to acknowledge for a multi-topics consumer, while messageId is org.apache.pulsar.client.impl.BatchMessageIdImpl
	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:1083)
	at org.apache.pulsar.client.impl.ConsumerBase.acknowledge(ConsumerBase.java:418)
	at org.apache.pulsar.client.api.MultiTopicsConsumerTest.testMultiTopicAckWithByteMessageId(MultiTopicsConsumerTest.java:409)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

It happens because MessageId is not able deserialize into TopicMessageIdImpl and because of that MultiTopicsConsumerImpl fails to ack the message.

Modifications

Support serializing and deserialzing for messageId(TopicMessageIdImpl) of MultiTopicsConsumer.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@rdhabalia rdhabalia added this to the 3.0.0 milestone Mar 28, 2023
@rdhabalia rdhabalia self-assigned this Mar 28, 2023
@rdhabalia rdhabalia added doc-not-needed Your PR changes do not impact docs ready-to-test labels Mar 28, 2023
@rdhabalia rdhabalia changed the title [improve][client] Pulsar client supports multi-topic messageId deserialization to ack messages [improve][client] Support multi-topic messageId deserialization to ack messages Mar 28, 2023
// For the chunk message id, we need to specify the first chunk message id.
optional MessageIdData first_chunk_message_id = 7;
optional string topicName = 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a PIP to add new fields to the Pulsar API. In PIP-224 I proposed to add an extra class TopicMessageIdSerDes for the serialization of the TopicMessageId, but the related PR is not present yet. If we can introduce such a new field, I think we don't need to implement TopicMessageIdSerDes any more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I described here, the limit of PIP-224 is caused by the lack of the topic name field in MessageIdData. I think it would be better to add this field before the 3.0.0 release. But I'd like to hear more voices about this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BewareMyPower I replied on the PR serializing and deserializing is expensive and on top of that having different APIs for different use cases is creating a really bad experience for users, and I strongly feel we should avoid such APIs and complexity if things can be solved with a simple straight forward change with the same API and without creating the bad user experience.
I think we should consider this simple change without costing performance and API incompatibility and confusing usage to the users. So, I would like to avoid this change: BewareMyPower#20

Copy link
Contributor

@BewareMyPower BewareMyPower Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm +1 to your PR and I think the API proposed in PIP-224 could be replaced with this PR.

few committers block all the PRs without any reason

However, the reason I blocked this PR is the current requirement of the PIP:

Any change to the wire protocol APIs

This PR brings a change to the wire protocol API, and I think it should not be passed without any proposal. /cc @merlimat @eolivelli @codelipenghui

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think it's better to add a description to clarify the new field will not apply to RPC calls. It is only used by the client side to carry the topic name.

@BewareMyPower
Copy link
Contributor

Could you resolve the conflicts with master?

// For the chunk message id, we need to specify the first chunk message id.
optional MessageIdData first_chunk_message_id = 7;
optional string topicName = 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, i come from Kafka. and i think when client enable group commit offset. the field can be group by topicName and we can have much smaller RPC message to send on the wire. see kafka OffsetCommit RPC https://kafka.apache.org/protocol.html#The_Messages_OffsetCommit : - )

@poorbarcode
Copy link
Contributor

Since we will start the RC version of 3.0.0 on 2023-04-11, I will change the label/milestone of PR who have not been merged.

  • The PR of type feature is deferred to 3.1.0
  • The PR of type fix is deferred to 3.0.1

So drag this PR to 3.1.0

@rdhabalia
Copy link
Contributor Author

@BewareMyPower anyways, I have resolved the conflict to move this PR forward.

@codecov-commenter
Copy link

Codecov Report

Merging #19944 (04f52fa) into master (dc1cdac) will increase coverage by 48.66%.
The diff coverage is 61.11%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #19944       +/-   ##
=============================================
+ Coverage     24.26%   72.92%   +48.66%     
- Complexity      294    31857    +31563     
=============================================
  Files          1609     1865      +256     
  Lines        125669   138191    +12522     
  Branches      13707    15204     +1497     
=============================================
+ Hits          30490   100779    +70289     
+ Misses        90689    29393    -61296     
- Partials       4490     8019     +3529     
Flag Coverage Δ
inttests 24.19% <18.05%> (-0.07%) ⬇️
systests 25.00% <29.16%> (?)
unittests 72.18% <61.11%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...sar/broker/authorization/AuthorizationService.java 58.00% <ø> (+52.79%) ⬆️
.../pulsar/broker/service/BrokerServiceException.java 86.84% <0.00%> (+65.93%) ⬆️
.../apache/pulsar/client/impl/BatchMessageIdImpl.java 82.75% <0.00%> (+18.47%) ⬆️
...org/apache/pulsar/proxy/server/ProxyClientCnx.java 79.41% <ø> (+79.41%) ⬆️
...rg/apache/pulsar/proxy/server/ProxyConnection.java 58.76% <33.33%> (+22.30%) ⬆️
...sar/broker/service/persistent/PersistentTopic.java 79.34% <40.00%> (+33.23%) ⬆️
...n/java/org/apache/pulsar/broker/PulsarService.java 81.63% <50.00%> (+27.98%) ⬆️
.../apache/pulsar/client/impl/TopicMessageIdImpl.java 58.33% <50.00%> (-4.17%) ⬇️
...a/org/apache/pulsar/client/impl/MessageIdImpl.java 89.87% <87.50%> (+53.76%) ⬆️
...va/org/apache/pulsar/broker/service/ServerCnx.java 71.37% <90.00%> (+40.70%) ⬆️
... and 3 more

... and 1574 files with indirect coverage changes

@BewareMyPower BewareMyPower dismissed their stale review April 12, 2023 02:28

Changes are done

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look to me to introduce wire protocol changes that without PIP.

@codelipenghui
Copy link
Contributor

where did we discuss this? do you want me to show examples where breaking changes and critical impact PRs were merged which should not be merged. Sorry, I can't keep spending time in rebasing PR.

@rdhabalia I mean it should start with a PIP because the PR introduced changes to the wire protocol. If the PR fixes a breaking change, why the PR title is using [improve], it confused me. Starting with a proposal will help us understand the whole context of the issue and why this solution is the best solution for the problem.

@rdhabalia
Copy link
Contributor Author

@codelipenghui
I have created PIP: #20221

@BewareMyPower
Copy link
Contributor

You need to send a discussion to [email protected], otherwise others might not know this PIP. For example, I see the PIP number of @liangyepianzhou has conflict with this PIP. #20225

Reference:

@liangyepianzhou
Copy link
Contributor

You need to send a discussion to [email protected], otherwise others might not know this PIP. For example, I see the PIP number of @liangyepianzhou has conflict with this PIP. #20225

I recall it was mentioned before that we should select the next PIP number based on the PIP numbers in the emails. Before sending the vote email, I searched and found that PIP-266 had not been used.

@rdhabalia
Copy link
Contributor Author

@liangyepianzhou @BewareMyPower
It seems this PIP was created before #20225 . But anyways, I will rename my PIP to 267 to resolve the conflict.

@github-actions
Copy link

github-actions bot commented Jun 9, 2023

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jun 9, 2023
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@eolivelli
Copy link
Contributor

I had missed this work.
I support it

@github-actions github-actions bot removed the Stale label Jun 10, 2023
@codelipenghui codelipenghui dismissed their stale review June 21, 2023 03:36

Dismiss the review

@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jul 22, 2023
@Technoboy- Technoboy- modified the milestones: 3.1.0, 3.2.0 Jul 31, 2023
@Technoboy- Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.