Commit
CMQ: Propagate requeue on promotion to master
When a node goes down, a slave gets promoted to master. When this happens, the new master requeues all messages pending acks. If x-max-length is defined and the queue length after requeue goes over the limit, the new master will start dropping messages immediately. This causes issues for the other slaves because they do not requeue their messages automatically; instead, they wait for the new master to tell them what to do. This eventually triggers an assert because the queue lengths are unexpectedly out of sync when the first drop message is propagated to the cluster.

This issue must have been present for a very long time, probably since e352608.

The fix is to make the new master propagate the requeues when it gets promoted.

To reproduce, a cluster must be started, ha-mode: all set via policies, and perf-test started with the following arguments:

    perf-test -x 1 -y 1 -r 10000 -R 50 -c 500 -s 1000 -u v2 \
        -qa x-queue-version=2,x-max-length=10000 -ad false -f persistent

Wait a little bit for the queue to have 10000+ ready messages (not total, total will be more) and then kill the master node (usually the first pid that 'ps -aux | grep beam' gives you). The crashes will be logged in the slave node that was not promoted (node 2 in my case).
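
The failure mode described in this message can be illustrated with a small toy model. This is only a sketch, not RabbitMQ code: the ToyQueue class, the make_mirror and promote helpers, and the final length comparison are all hypothetical stand-ins for the real mirror state and the real assert, and the numbers are scaled down from the reproduction above.

```python
from collections import deque


class ToyQueue:
    """Toy model of one node's view of a mirrored queue (hypothetical)."""

    def __init__(self, max_length):
        self.max_length = max_length
        self.ready = deque()         # messages ready for delivery
        self.pending_acks = deque()  # delivered, not yet acked

    def publish(self, msg):
        self.ready.append(msg)

    def deliver(self):
        self.pending_acks.append(self.ready.popleft())

    def requeue_all(self):
        # Put every unacked message back at the head, preserving order.
        while self.pending_acks:
            self.ready.appendleft(self.pending_acks.pop())

    def drop_overflow(self):
        # Drop from the head until the x-max-length limit is respected.
        dropped = 0
        while len(self.ready) > self.max_length:
            self.ready.popleft()
            dropped += 1
        return dropped


def make_mirror(max_length=10, ready=8, unacked=5):
    """Build a node holding `ready` ready and `unacked` unacked messages."""
    q = ToyQueue(max_length)
    for i in range(ready + unacked):
        q.publish(i)
    for _ in range(unacked):
        q.deliver()
    return q


def promote(new_master, mirror, propagate_requeue):
    """Simulate promotion; return True if both nodes end up in sync."""
    new_master.requeue_all()
    if propagate_requeue:
        # The fix: the remaining mirror is told about the requeues.
        mirror.requeue_all()
    dropped = new_master.drop_overflow()
    # The drop is propagated; the mirror applies the same number of drops.
    for _ in range(dropped):
        mirror.ready.popleft()
    # Stand-in for the assert: lengths must match after the drop.
    return len(mirror.ready) == len(new_master.ready)


print(promote(make_mirror(), make_mirror(), propagate_requeue=False))  # False
print(promote(make_mirror(), make_mirror(), propagate_requeue=True))   # True
```

Without propagation, the new master ends at 10 ready messages while the other mirror ends at 5, which is the kind of mismatch that trips the assert. With propagation, both requeue before the drop and end at 10.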