Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP-229: Add a common interface to get fields of the MessageIdData #18950

Closed
BewareMyPower opened this issue Dec 16, 2022 · 4 comments
Closed

Comments

@BewareMyPower
Copy link
Contributor

BewareMyPower commented Dec 16, 2022

Motivation

MessageIdData is defined in PulsarApi.proto:

message MessageIdData {
required uint64 ledgerId = 1;
required uint64 entryId = 2;
optional int32 partition = 3 [default = -1];
optional int32 batch_index = 4 [default = -1];
repeated int64 ack_set = 5;
optional int32 batch_size = 6;
// For the chunk message id, we need to specify the first chunk message id.
optional MessageIdData first_chunk_message_id = 7;
}

However, there is no common interface to get the specific field like ledger id and entry id. These details might be not much useful to application users, but they are important to developers of Pulsar and its ecosystems. Currently, we can only access the specific implementations directly. So there are a lot of unnecessary type assumptions checks like

if (messageId instanceof BatchMessageIdImpl) {
BatchMessageIdImpl batchMessageId = (BatchMessageIdImpl) messageId;
this.ledgerId = batchMessageId.getLedgerId();
this.entryId = batchMessageId.getEntryId();
this.batchIndex = batchMessageId.getBatchIndex();
this.partitionIndex = batchMessageId.partitionIndex;
} else if (messageId instanceof MessageIdImpl) {
MessageIdImpl messageIdImpl = (MessageIdImpl) messageId;
this.ledgerId = messageIdImpl.getLedgerId();
this.entryId = messageIdImpl.getEntryId();
this.partitionIndex = messageIdImpl.partitionIndex;
} else if (messageId instanceof TopicMessageIdImpl) {

And for TopicMessageIdImpl, we have to check the MessageId is a TopicMessageIdImpl and then call the getInnerMessageId() method:

checkArgument(messageId instanceof TopicMessageIdImpl);

MessageId innerId = topicMessageId.getInnerMessageId();

Another problem is that when users want to create a MessageId used in seek or acknowledge, they have to create instances of these implementations defined in pulsar-client module and the impl package. Any API change to these implementations could bring a breaking change. However, it should be allowed to make some modifications on the impl package, otherwise differing api and impl would be meaningless.

Goal

All the problems are all caused by the lack of abstraction of MessageIdData. There is no interface to represent the MessageIdData. This proposal aims at adding a common interface to access the fields of MessageIdData so that all usages of msgId instanceof XXXImpl could be simplified to something like (MessageIdAdv) msgId

API Changes

Introduce a new interface to represent a MessageIdData.

package org.apache.pulsar.client.api;

/**
 * The {@link MessageId} interface provided for advanced users.
 * <p>
 * It supports retrieving any field of {@link org.apache.pulsar.common.api.proto.MessageIdData}, which is generated
 * from `PulsarApi.proto`.
 */
public interface MessageIdAdv extends MessageId {

    long getLedgerId();

    long getEntryId();

    default int getPartition() {
        return -1;
    }

    default int getBatchIndex() {
        return -1;
    }

    default @Nullable BitSet getAckSet() {
        return null;
    }

    default int getBatchSize() {
        return 0;
    }

    default @Nullable MessageIdAdv getFirstChunkMessageId() {
        return null;
    }

    default boolean isBatch() {
        return getBatchIndex() >= 0 && getBatchSize() > 0;
    }
}

Since the aimed developers are Pulsar core developers, it's added in the pulsar-common module (PulsarApi.proto is also in this module), not the pulsar-client-api module.

To avoid naming conflicts with proto.MessageIdData, the interface name just adds the PulsarApi prefix to represent it's a representation of the message Id defined in PulsarApi.proto.

Only getLedgerId and getEntryId are required because when seeking to a specific position, the partition field is not needed. For example, if users want to create its own implementation for seek or acknowledge, they can create an implementation like:

    @AllArgsConstructor
    private static class NonBatchedMessageId implements MessageIdAdv {
        // For non-batched message id in a single topic, only ledger id and entry id are required

        private final long ledgerId;
        private final long entryId;

        @Override
        public byte[] toByteArray() {
            return new byte[0]; // dummy implementation
        }

        @Override
        public long getLedgerId() {
            return ledgerId;
        }

        @Override
        public long getEntryId() {
            return entryId;
        }
    }

Implementation

Most modifications are replacing the msgId instanceof XXXImpl with (MessageIdAdv) msgId. And some methods like TopicMessageIdImpl#getInnerMessageId will be marked as deprecated. They might need to be retained for one or more major releases for compatibility.

There is a point that since we use a BitSet to represent the ack set, which is a long array defined in PulsarApi.proto.

repeated int64 ack_set = 5;

We have to deprecated the BatchMessageAcker, which is just a wrapper of a BitSet and the batch size. After that, we no longer needs to acknowledge one message of the batch like:

if (msgId instanceof BatchMessageIdImpl) {
    ((BatchMessageIdImpl) msgId).getAcker().ackIndividual();
}

Use the getAckSet() API and clear the specific bits of the BitSet instead.

Alternatives

Add the getters to the MessageId directly. This idea was denied from the discussion here: https://lists.apache.org/thread/rdkqnkohbmkjjs61hvoqplhhngr0b0sd

Anything else?

No response

@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jan 19, 2023
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Jan 28, 2023
…e MessageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Deprecate `BatchMessageAcker` by adding the ACK bit set field and the
  `PreviousMessageAcknowledger` interface to `BatchMessageIdImpl`.
- Deprecate `TopicMessageIdImpl#getInnerMessageId` by passing the
  `TopicMessageIdImpl` directly.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Jan 28, 2023
…e MessageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Deprecate `BatchMessageAcker` by adding the ACK bit set field and the
  `PreviousMessageAcknowledger` interface to `BatchMessageIdImpl`.
- Deprecate `TopicMessageIdImpl#getInnerMessageId` by passing the
  `TopicMessageIdImpl` directly.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Jan 29, 2023
…e MessageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Deprecate `BatchMessageAcker` by adding the ACK bit set field and the
  `PreviousMessageAcknowledger` interface to `BatchMessageIdImpl`.
- Deprecate `TopicMessageIdImpl#getInnerMessageId` by passing the
  `TopicMessageIdImpl` directly.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Jan 30, 2023
…e MessageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Feb 1, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Feb 3, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Feb 8, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Feb 14, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Feb 17, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Feb 21, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Mar 6, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Mar 15, 2023
…ssageIdData

Master issue: apache#18950

### Motivation

We need a common interface to get fields of the MessageIdData. After
that, we won't need to assert a MessageId implementation is an instance
of a specific class. And we can pass our customized MessageId
implementation to APIs like `acknowledge` and `seek`.

### Modifications

- Add `MessageIdAdv` to get fields of `MessageIdData`, make all
  MessageId implementations inherit it (except `MultiMessageIdImpl`).
- Add `MessageIdAdvUtils` for the most common used methods.
- Replace `BatchMessageAcker` with the `BitSet` for ACK.
- Remove `TopicMessageIdImpl#getInnerMessageId` since a
  `TopicMessageIdImpl` can be treated as its underlying `MessageId`
  implementation now.
- Remove `instanceof BatchMessageIdImpl` checks in `pulsar-client`
  module by casting to `MessageIdAdv`.

After this refactoring, the 3rd party library will no longer need to
cast a `MessageId` to a specific implementation. It only needs to cast
`MessageId` to `MessageIdAdv`. Users can also implement their own util
class so the methods of `MessageIdAdvUtils` are all not public.

### Verifications

Add `CustomMessageIdTest` to verify a simple MessageIdAdv implementation
that only has the (ledger id, entry id, batch idx, batch size) fields
also works for seek and acknowledgment.
@rdhabalia
Copy link
Contributor

as mentioned into PR: #19414
we should not let users use `ledgerId/entryId. what if we replace bookkeeper with other some other storage? in that case, such codebase will be broken and exposing internal data and encouraging users to use it not a good design decision. I didn't review this PIP before but such discussion happened multiple times in past and Pulsar community had rejected this proposal in past. As we have not done any release yet, so, it's better to revert this PR soon.

@codelipenghui
Copy link
Contributor

@BewareMyPower

As described in the proposal

Since the aimed developers are Pulsar core developers, it's added in the pulsar-common module (PulsarApi.proto is also in this module), not the pulsar-client-api module.

The MessageIdAdv should not be an API for users, right? Is it possible to move the pulsar-common or client implementation?

@rdhabalia I agree with the MessageIdAdv interface is not reasonable for users, and it is also not the motivation to expose it to users. Do you think we are good if the MessageIdAdv interface is moved to the internals?

@BewareMyPower
Copy link
Contributor Author

Is it possible to move the pulsar-common or client implementation?

Yes. I think it's reasonable.

BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Apr 19, 2023
### Motivation

apache#19414 does not follow the design
of apache#18950

> Since the aimed developers are Pulsar core developers, it's added in
> the pulsar-common module (PulsarApi.proto is also in this module), not
> the pulsar-client-api module.

The reason is that `TopicMessageId#create` now cannot be a
`MessageIdAdv` if `MessageIdAdv` is not in the `pulsar-client-api`
module.

### Modifications

- Move the `MessageIdAdv` class to the `pulsar-common` module.
- To handle the instance created by `TopicMessageId` well, add
  `MessageIdAdvUtils#convert` to add an extra deserialization and
  serialization of `MessageId`.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Apr 19, 2023
### Motivation

apache#19414 does not follow the design
of apache#18950

> Since the aimed developers are Pulsar core developers, it's added in
> the pulsar-common module (PulsarApi.proto is also in this module), not
> the pulsar-client-api module.

The reason is that `TopicMessageId#create` now cannot be a
`MessageIdAdv` if `MessageIdAdv` is not in the `pulsar-client-api`
module.

### Modifications

- Move the `MessageIdAdv` class to the `pulsar-common` module.
- Create a `TopicMessageIdImpl` instance for `TopicMessageId#create` via
  the `DefaultImplementation` class with the overhead of reflection.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue Apr 19, 2023
### Motivation

apache#19414 does not follow the design
of apache#18950

> Since the aimed developers are Pulsar core developers, it's added in
> the pulsar-common module (PulsarApi.proto is also in this module), not
> the pulsar-client-api module.

The reason is that `TopicMessageId#create` now cannot be a
`MessageIdAdv` if `MessageIdAdv` is not in the `pulsar-client-api`
module.

### Modifications

- Move the `MessageIdAdv` class to the `pulsar-common` module.
- Implement the `MessageIdAdv` interface in `TopicMessageIdImpl` instead
  of `TopicMessageId.Impl`.
- Create a `TopicMessageIdImpl` instance for `TopicMessageId#create` via
  the `DefaultImplementation` class with the overhead of reflection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants