PIP-145: Improve performance of regex subscriptions #14505
Discussion thread: https://lists.apache.org/thread/51y8s6tfw1p1h5dcb9fgzkzd8tw2bqpz
Motivation
Pulsar allows consumers to subscribe to multiple topics using a pattern. With this feature, consumers poll brokers for the list of all topics in a namespace and filter that list on the client side against the pattern. This causes unnecessary network load, since usually only a small fraction of the returned topics match the pattern. In addition, polling delays the processing of messages produced to a newly created topic until the next poll discovers that topic.
Goal
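This PIP proposes three changes to improve performance and decrease network utilization:
1. Filter topics on the broker side, so that only topics matching the pattern are returned.
2. Attach a hash of the matching topic list to responses, so that the list is not resent when it has not changed.
3. Let clients register topic list watchers, so that brokers notify them of new and deleted topics instead of the clients having to poll.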
To keep new clients compatible with older brokers, a new feature flag will be introduced for this feature. Brokers will return FeatureFlags as part of the CommandConnected message to let clients know which features they support.
Initially, the feature will be implemented in the broker and the Java client; other clients can adopt the capability later.
API Changes
Protocol Changes
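New fields will be added to the existing commands CommandGetTopicsOfNamespace and CommandGetTopicsOfNamespaceResponse: the request carries the topicsPattern and topicsHash, and the response carries the topics hash and a changed flag (see Implementation).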
Clients can register as topic list observers by sending a CommandWatchTopicList command.
Brokers will respond with a CommandWatchTopicListSuccess message containing the watcher ID and the initial list of topics.
When matching topics are created or deleted, the broker sends an update listing the changed topics together with a hash computed over the whole list of matching topics (not just the topics listed in the update).
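Because the hash covers the whole matching list, a client that keeps a local copy can apply each update and check that its copy still agrees with the broker. The sketch below assumes a hash algorithm (CRC32C over the sorted topic names joined by newlines); the PIP does not specify one, and the class `TopicListHash` and its methods are hypothetical. Falling back to a full refresh on a mismatch is likewise an assumption about how a client could use the hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.CRC32C;

/** Hypothetical helper: the hash algorithm here is an assumption, not the PIP's. */
final class TopicListHash {
    private TopicListHash() {}

    static String compute(Collection<String> topics) {
        CRC32C crc = new CRC32C();
        // Sorting makes the hash independent of the order in which topics are listed.
        String joined = String.join("\n", new TreeSet<>(topics));
        crc.update(joined.getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(crc.getValue());
    }

    /**
     * Applies a watcher update to the client's local copy of the matching topics.
     * Returns true if the local copy now matches the hash sent by the broker;
     * false suggests a missed update (assumed remedy: re-fetch the full list).
     */
    static boolean applyUpdate(Set<String> localTopics, List<String> newTopics,
                               List<String> deletedTopics, String brokerHash) {
        localTopics.addAll(newTopics);
        localTopics.removeAll(deletedTopics);
        return compute(localTopics).equals(brokerHash);
    }
}
```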
Clients can remove the watcher by sending a CommandUnwatchTopicList message; the broker responds with a CommandWatchTopicListSuccess that contains no topics.
When a client connects to a broker, the FeatureFlags in CommandConnected tell it whether the broker supports topic list watchers. If not, the client does not send CommandWatchTopicList and continues to rely on polling.
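The client-side decision can be sketched as follows. The record `FeatureFlags` with a `supportsTopicWatchers` field, the enum `DiscoveryMode`, and the class `TopicDiscovery` are hypothetical stand-ins for the real protocol types; a missing FeatureFlags (an older broker) is modeled as `null`.

```java
/** Hypothetical model of the flags a broker advertises in CommandConnected. */
record FeatureFlags(boolean supportsTopicWatchers) {}

enum DiscoveryMode { WATCH, POLL }

final class TopicDiscovery {
    private TopicDiscovery() {}

    /** Decides how a pattern consumer discovers topics once it is connected. */
    static DiscoveryMode choose(FeatureFlags flags) {
        // Older brokers send no FeatureFlags (null here); a broker with broker-side
        // topic filtering switched off does not advertise watcher support either.
        if (flags != null && flags.supportsTopicWatchers()) {
            return DiscoveryMode.WATCH; // send CommandWatchTopicList
        }
        return DiscoveryMode.POLL;      // keep polling with CommandGetTopicsOfNamespace
    }
}
```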
HTTP clients will not be changed and will continue using the current polling behaviour.
Configuration Changes
A new broker configuration property enableBrokerSideTopicFiltering will be added with default value true. Setting this to false will disable the feature.
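A minimal sketch of how the flag's default behaves when read from broker.conf-style `key=value` text, assuming standard `java.util.Properties` parsing; Pulsar's actual configuration loading is not shown, and `TopicFilteringConfig` is a hypothetical class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for the enableBrokerSideTopicFiltering flag. */
final class TopicFilteringConfig {
    static final String KEY = "enableBrokerSideTopicFiltering";

    private TopicFilteringConfig() {}

    /** Returns the flag value, falling back to the documented default of true. */
    static boolean isEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "true").trim());
    }

    /** Parses broker.conf-style text and reads the flag. */
    static boolean parse(String confText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            // Reading from an in-memory StringReader is not expected to fail.
            throw new UncheckedIOException(e);
        }
        return isEnabled(props);
    }
}
```

Only an explicit `enableBrokerSideTopicFiltering=false` disables the feature; an absent property leaves it on.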
Implementation
Polling with pattern
When polling for topics, Pulsar clients will include the pattern in the command. Initially the client has no topicsHash; once the broker has responded, the client retains the hash and sends it with the next command. If the response contains no hash, the client performs client-side filtering. Otherwise, it treats the returned list as already filtered, and if the changed flag is false it keeps its current list.
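The client-side polling rules above can be sketched as a small state holder. The record `GetTopicsResponse` and the class `PatternPoller` are hypothetical; the sketch matches the pattern against the full topic name, which is a simplification.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical shape of the fields a poll response may carry. */
record GetTopicsResponse(List<String> topics, String topicsHash, boolean changed) {}

final class PatternPoller {
    private final Pattern pattern;
    private String topicsHash;               // null until a filtering broker returns one
    private List<String> topics = List.of();

    PatternPoller(Pattern pattern) {
        this.pattern = pattern;
    }

    /** Value to send as topicsHash with the next poll; null on the first poll. */
    String hashForNextRequest() {
        return topicsHash;
    }

    List<String> currentTopics() {
        return topics;
    }

    void onResponse(GetTopicsResponse response) {
        if (response.topicsHash() == null) {
            // No hash: the broker did not filter, so filter on the client side as before.
            topicsHash = null;
            topics = response.topics().stream()
                    .filter(t -> pattern.matcher(t).matches())
                    .collect(Collectors.toList());
            return;
        }
        topicsHash = response.topicsHash();
        if (response.changed()) {
            topics = List.copyOf(response.topics()); // already filtered by the broker
        }
        // changed == false: the broker omitted the list, so keep the current one.
    }
}
```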
The Pulsar broker will check the command for topicsPattern. If there is no pattern in the message, the broker responds with all topics of the namespace. If a pattern is present, the list of topics is filtered and a hash is computed from the filtered list. If the request contains a topicsHash equal to the current hash, the response omits the list of topics and only sets the changed flag to false. The pattern in topicsPattern will be evaluated using java.util.regex.Pattern.
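The broker-side handling can be sketched as a pure function over the namespace's topics. `TopicsResponse`, `BrokerTopicFilter`, and the CRC32C-based hash are assumptions made for illustration; only the use of `java.util.regex.Pattern` comes from the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.zip.CRC32C;

/** Hypothetical response shape: topics is empty when changed is false. */
record TopicsResponse(List<String> topics, String topicsHash, boolean changed) {}

final class BrokerTopicFilter {
    private BrokerTopicFilter() {}

    static TopicsResponse handle(List<String> namespaceTopics,
                                 String topicsPattern, String requestHash) {
        if (topicsPattern == null) {
            // Old client: no pattern, so return every topic and no hash.
            return new TopicsResponse(namespaceTopics, null, true);
        }
        // An invalid pattern throws PatternSyntaxException; a broker would reply with an error.
        Pattern pattern = Pattern.compile(topicsPattern);
        List<String> matching = namespaceTopics.stream()
                .filter(t -> pattern.matcher(t).matches())
                .sorted()
                .collect(Collectors.toList());
        String hash = hashOf(matching);
        if (hash.equals(requestHash)) {
            // The client already has this list: omit it and report changed = false.
            return new TopicsResponse(List.of(), hash, false);
        }
        return new TopicsResponse(matching, hash, true);
    }

    /** Assumed hash: CRC32C over the sorted topic names joined by newlines. */
    static String hashOf(List<String> sortedTopics) {
        CRC32C crc = new CRC32C();
        crc.update(String.join("\n", sortedTopics).getBytes(StandardCharsets.UTF_8));
        return Long.toHexString(crc.getValue());
    }
}
```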
Notifications
If the broker supports topic list watchers, the client creates a watcher by sending CommandWatchTopicList. A new class, org.apache.pulsar.TopicsListService, will keep track of watchers and listen for changes in the metadata. Whenever a topic is created or deleted, the service checks which watchers should be notified and sends each of them an update through its ServerCnx. To prevent memory leaks, all watchers of a connection will be removed from the TopicsListService when that ServerCnx's channel becomes inactive.
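A simplified model of the watcher bookkeeping described above. `TopicsListServiceSketch`, its methods, and the `UpdateSender` callback (standing in for sending through the ServerCnx) are hypothetical; connections are identified by a `long` instead of a ServerCnx, and the hash that accompanies each update is omitted for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of TopicsListService's watcher registry. */
final class TopicsListServiceSketch {
    record Watcher(long connectionId, long watcherId, Pattern pattern) {}

    /** Stands in for "send an update through the ServerCnx". */
    interface UpdateSender {
        void send(Watcher watcher, List<String> newTopics, List<String> deletedTopics);
    }

    private final Map<Long, Map<Long, Watcher>> watchersByConnection = new ConcurrentHashMap<>();
    private final UpdateSender sender;

    TopicsListServiceSketch(UpdateSender sender) {
        this.sender = sender;
    }

    void watch(long connectionId, long watcherId, String topicsPattern) {
        watchersByConnection
                .computeIfAbsent(connectionId, id -> new ConcurrentHashMap<>())
                .put(watcherId, new Watcher(connectionId, watcherId, Pattern.compile(topicsPattern)));
    }

    void unwatch(long connectionId, long watcherId) {
        Map<Long, Watcher> watchers = watchersByConnection.get(connectionId);
        if (watchers != null) {
            watchers.remove(watcherId);
        }
    }

    /** Called from a metadata listener when a topic is created or deleted. */
    void onTopicChanged(String topic, boolean created) {
        List<String> changed = List.of(topic);
        for (Map<Long, Watcher> watchers : watchersByConnection.values()) {
            for (Watcher w : watchers.values()) {
                if (w.pattern().matcher(topic).matches()) {
                    sender.send(w, created ? changed : List.of(), created ? List.of() : changed);
                }
            }
        }
    }

    /** Called when the connection's channel becomes inactive, so watchers do not leak. */
    void connectionClosed(long connectionId) {
        watchersByConnection.remove(connectionId);
    }

    int watcherCount() {
        return watchersByConnection.values().stream().mapToInt(Map::size).sum();
    }
}
```

Keying watchers by connection makes the cleanup on channel inactivity a single map removal.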
Compatibility
Old clients with new servers
When the server receives a command without the new fields, it does not filter the topics and sends the whole list. Old clients ignore the new fields in the response.
New clients with old servers
New fields in the client request will be ignored by the protobuf parser. The server will send the unfiltered list and omit the new fields (as it does not know about them). The client will check the response for the new fields. If the hash is not present, the client filters the result as it does now.
The introduction of FeatureFlags in CommandConnected will prevent the client from sending CommandWatchTopicList messages to brokers that don't support them, including brokers on which broker-side topic filtering is switched off.
Rejected Alternatives
An alternative raised during the discussion of this PIP was a system topic to which brokers publish topic names and to which consumers subscribe to be notified. It was agreed that this would be very hard to make reliable.