SuperStream doesn't elect the single active consumer #7743
Comments
acogoluegnes
added a commit
that referenced
this issue
Mar 30, 2023
A group of consumers on a super stream can end up blocked without an active consumer. This can happen with consumer churn: one consumer gets removed, which makes the active consumer passive, but the former active consumer never gets to know because it has been removed itself. This commit changes the structure of the messages the SAC coordinator sends to consumer connections, to embed enough information to look up the group and to instruct it to choose a new active consumer when the race condition mentioned above comes up. Because of the changes in the structure of messages, a feature flag is required to make sure the SAC coordinator starts sending the new messages only when all the nodes have been upgraded. References #7743
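The race described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the SAC coordinator's actual code: the class `ToyGroup`, the function `deliver`, the message shape, the group name `app-1`, and the election rule (partition index modulo the number of consumers) are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

GroupKey = Tuple[str, str]  # (partition stream, group name), hypothetical shape


class ToyGroup:
    """Toy single-active-consumer group for one partition (not RabbitMQ code)."""

    def __init__(self, partition_index: int, consumers: List[str]) -> None:
        self.partition_index = partition_index
        self.consumers = list(consumers)
        self.active: Optional[str] = self._expected()
        self.waiting_for: Optional[str] = None  # consumer asked to step down

    def _expected(self) -> Optional[str]:
        # Assumed election rule: partition index modulo number of consumers.
        if not self.consumers:
            return None
        return self.consumers[self.partition_index % len(self.consumers)]

    def remove(self, consumer: str) -> Optional[str]:
        """Remove a consumer; return the consumer that must step down, if any."""
        self.consumers.remove(consumer)
        if consumer == self.active:
            self.active = None
            if self.waiting_for != consumer:
                self.active = self._expected()
            # Otherwise the group keeps waiting for a reply that never comes.
            return None
        if self.active is not None and self._expected() != self.active:
            self.waiting_for = self.active
            return self.active
        return None

    def elect(self) -> None:
        """Choose a new active consumer among the remaining ones."""
        self.waiting_for = None
        self.active = self._expected()


def deliver(groups: Dict[GroupKey, ToyGroup], message: dict,
            connection: Dict[str, GroupKey]) -> None:
    """Connection side: handle a step-down message from the coordinator."""
    consumer = message["consumer"]
    if consumer in connection:
        # Normal path: the consumer steps down and the group moves on.
        groups[connection[consumer]].elect()
    elif "group_key" in message:
        # Consumer already gone: the embedded key lets us trigger an election.
        groups[message["group_key"]].elect()
    # Without a group key the message is dropped and the group stays blocked.


def run(embed_group_key: bool) -> Optional[str]:
    key: GroupKey = ("invoices-1", "app-1")
    group = ToyGroup(partition_index=1, consumers=["A", "B", "C"])
    groups = {key: group}
    connection = {"B": key}             # what B's connection knows about
    stepping_down = group.remove("A")   # B was active, C is now expected
    message = {"consumer": stepping_down}
    if embed_group_key:
        message["group_key"] = key
    del connection["B"]                 # race: B goes away before delivery
    group.remove("B")
    deliver(groups, message, connection)
    return group.active


print(run(embed_group_key=False))  # group left without an active consumer
print(run(embed_group_key=True))   # remaining consumer gets elected
```

In this model, the message without a group key leaves the group with no active consumer, while the message carrying the key lets the remaining consumer take over.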
acogoluegnes
added a commit
that referenced
this issue
Apr 3, 2023
The stream plugin can send frames to a client connection and expect a response from it. This is currently used for the consumer_update frame (single active consumer feature). There was no timeout mechanism so far, so a slow or blocked application could prevent a group of consumers from moving on. This commit introduces a timeout mechanism: if the expected response takes too long to arrive, the server assumes the connection is blocked and closes it. The default timeout is 60 seconds, but it can be changed by setting the request_timeout parameter of the rabbitmq_stream application. Note that the mechanism does not enforce the exact duration of the timeout, as a timer is set for the first request and re-used for other requests. With bad timing, a request can time out after up to twice the configured timeout. References #7743
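The "twice as long" remark can be checked with a little arithmetic. The sketch below is one possible reading of the note, not the plugin's implementation: it assumes a periodic timer that starts with the first request, fires every `timeout` seconds, and expires any request that is at least `timeout` old at that tick. The function name `detection_delay` is made up for this example.

```python
def detection_delay(first_request_at: int, sent_at: int, timeout: int) -> int:
    """Seconds between sending an unanswered request and its expiry.

    Assumed model: ticks fire at first_request_at + k * timeout (k = 1, 2, ...)
    and a request expires at the first tick where it is >= timeout old.
    """
    if timeout <= 0 or sent_at < first_request_at:
        raise ValueError("timeout must be positive and sent_at >= first_request_at")
    elapsed = sent_at - first_request_at
    # Ceiling division: index of the first tick that is late enough.
    k = -(-(elapsed + timeout) // timeout)
    return first_request_at + k * timeout - sent_at


# The first request is checked exactly one timeout later.
print(detection_delay(first_request_at=0, sent_at=0, timeout=60))  # 60
# A request sent one second after the timer started just misses the first tick.
print(detection_delay(first_request_at=0, sent_at=1, timeout=60))  # 119
```

Under this assumption the delay always falls between `timeout` (inclusive) and twice `timeout` (exclusive), which matches the upper bound given in the commit message.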
mergify bot
pushed a commit
that referenced
this issue
Apr 4, 2023
Same change as the SAC coordinator commit of Mar 30, 2023 above. References #7743 (cherry picked from commit 70538c5)
mergify bot
pushed a commit
that referenced
this issue
Apr 4, 2023
Same change as the request timeout commit of Apr 3, 2023 above. References #7743 (cherry picked from commit 763acc2)
mergify bot
pushed a commit
that referenced
this issue
Apr 4, 2023
Same change as the SAC coordinator commit of Mar 30, 2023 above. References #7743 (cherry picked from commit 70538c5) (cherry picked from commit 221f10d)
# Conflicts:
# deps/rabbit/src/rabbit_core_ff.erl
# deps/rabbit/src/rabbit_stream_sac_coordinator.erl
# deps/rabbitmq_stream/src/rabbit_stream_reader.erl
mergify bot
pushed a commit
that referenced
this issue
Apr 4, 2023
Same change as the request timeout commit of Apr 3, 2023 above. References #7743 (cherry picked from commit 763acc2) (cherry picked from commit 62d016d)
# Conflicts:
# deps/rabbitmq_stream/src/rabbit_stream_reader.erl
# deps/rabbitmq_stream/test/rabbit_stream_SUITE.erl
acogoluegnes
added a commit
that referenced
this issue
Apr 5, 2023
They can be useful and are not on hot paths, but they are replicated on all nodes as part of the state machine replication, so we are better off removing them to avoid noise. References #7743
acogoluegnes
added a commit
to rabbitmq/rabbitmq-stream-java-client
that referenced
this issue
Apr 5, 2023
github-actions bot
pushed a commit
to rabbitmq/rabbitmq-stream-java-client
that referenced
this issue
Apr 5, 2023
@fabiorosa-sn have you enabled
Yes, it's enabled. We are using the Java stream client version 0.15.0.
Here is the result of the command, as requested by @Gsantomaggio
Describe the bug
The super stream doesn't elect the single active consumer when the consumers are restarted.
The consumers stop consuming and always remain in the waiting status (see the image).
Reproduction steps
rabbitmq-streams add_super_stream invoices --partitions 10
Expected behavior
One consumer should be elected as the single active consumer.
Additional context
I noticed that invoices-1 is usually the first partition to have problems. The invoices-0 partition usually works. The other partitions, at some point, will have the same issue.
RabbitMQ 3.11.11
Java RabbitMQ Stream / Java 0.10.0-SNAPSHOT