CMQ: Propagate requeue on promotion to master
When a node goes down, a slave gets promoted to master. When this
happens, the new master requeues all messages with pending acks. If
x-max-length is defined and the queue length after the requeue goes
over the limit, the new master will start dropping messages
immediately.

This causes issues for the other slaves because they do not requeue
their messages automatically; instead, they wait for the new
master to tell them what to do. This eventually triggers a failed
assertion because the queue lengths are unexpectedly out of sync
when the first drop message is propagated to the cluster.

This issue must have been present for a very long time,
probably since e352608.

The fix is to make the new master propagate the requeues when
it gets promoted.
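A toy model may make the failure mode easier to see. The module below is a hypothetical sketch, not RabbitMQ code: the module name cmq_model, the counts-only replica state and the "lengths must agree before a drop" check are all simplifying assumptions standing in for the real backing queue and the slave-side assertion. Propagate = true models the effect of broadcasting {requeue, MsgIds} on promotion.

```erlang
%% Hypothetical toy model, NOT RabbitMQ code. Each replica is reduced
%% to two counts: Ready messages and messages Pending acks.
-module(cmq_model).
-export([failover/4]).

%% Simulate the master dying while the promoted slave and one other
%% slave both hold Ready ready messages and Pending unacked messages.
%% Propagate = true models the fix (broadcasting the requeue).
failover(Ready, Pending, MaxLen, Propagate) ->
    %% The promoted slave requeues every pending ack locally.
    MasterLen = Ready + Pending,
    %% The other slave only requeues when the new master tells it to.
    SlaveLen = case Propagate of
                   true  -> Ready + Pending;
                   false -> Ready
               end,
    %% x-max-length: the new master immediately drops the overflow.
    ToDrop = max(0, MasterLen - MaxLen),
    if
        %% No drop, so nothing is propagated and nothing is checked.
        ToDrop =:= 0 -> {ok, 0};
        %% Simplified stand-in for the slave-side assertion: before
        %% applying a propagated drop, the lengths must agree.
        MasterLen =:= SlaveLen -> {ok, ToDrop};
        true -> {out_of_sync, MasterLen, SlaveLen}
    end.
```

With 10000 ready and 1000 pending messages and x-max-length=10000, failover(10000, 1000, 10000, false) returns {out_of_sync, 11000, 10000}, while failover(10000, 1000, 10000, true) returns {ok, 1000}: once the requeue is propagated, every replica has the same length when the first drop arrives.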

To reproduce, start a cluster, set ha-mode: all via a policy, and
start perf-test with the following arguments:

perf-test -x 1 -y 1 -r 10000 -R 50 -c 500 -s 1000 -u v2 \
    -qa x-queue-version=2,x-max-length=10000 -ad false -f persistent

Wait for the queue to reach 10000+ ready messages (ready, not
total; the total will be higher), then kill the master node
(usually the first pid that 'ps -aux | grep beam' gives you).
The crashes will be logged on the slave node that was not
promoted (node 2 in my case).
lhoguin committed Mar 23, 2023
1 parent 9d72450 commit 8da5acf
Showing 1 changed file with 2 additions and 1 deletion:
deps/rabbit/src/rabbit_mirror_queue_master.erl
@@ -513,7 +513,8 @@ zip_msgs_and_acks(Msgs, AckTags, Accumulator,
         master_state().
 
 promote_backing_queue_state(QName, CPid, BQ, BQS, GM, AckTags, Seen, KS) ->
-    {_MsgIds, BQS1} = BQ:requeue(AckTags, BQS),
+    {MsgIds, BQS1} = BQ:requeue(AckTags, BQS),
+    ok = gm:broadcast(GM, {requeue, MsgIds}),
     Len = BQ:len(BQS1),
     Depth = BQ:depth(BQS1),
     true = Len == Depth, %% ASSERTION: everything must have been requeued