Periodically check for unapplied policies on QQs #12412
base: main
Instead of checking the values of the current configuration (represented in `rabbit_quorum_queue:handle_tick` by the `Overview` variable) against the effective policy, just regenerate the configuration from the policy and compare it with the current configuration.
(some of this is just reverting to the original format to reduce the diff against main)
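The regenerate-and-compare idea described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the module name `policy_check_sketch`, the function `maybe_reapply/3`, and the `ApplyFun` callback are all hypothetical, and the configurations are assumed to be plain maps.

```erlang
-module(policy_check_sketch).
-export([maybe_reapply/3]).

%% Hypothetical helper, not the code from this PR.
%% CurrentConfig: the configuration the queue is currently running with.
%% NewConfig:     the configuration regenerated from the effective policy.
%% ApplyFun:      assumed callback that submits the new configuration.
-spec maybe_reapply(map(), map(), fun((map()) -> term())) ->
          unchanged | {updated, term()}.
maybe_reapply(CurrentConfig, NewConfig, ApplyFun) ->
    %% =/= is exact term inequality, so 1 and 1.0 count as different values.
    case NewConfig =/= CurrentConfig of
        true  -> {updated, ApplyFun(NewConfig)};
        false -> unchanged
    end.
```

Comparing whole terms avoids listing every policy key by hand, at the cost of re-applying whenever any field differs, including fields that are not policy-driven if they end up in the compared map.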
    ShouldUpdate = NewPolicyConfig =/= CurrentPolicyConfig,
    case ShouldUpdate of
        true ->
            rabbit_log:debug("Re-applying policies to ~p", [amqqueue:get_name(Q)]),
Log messages should use `rabbit_misc:rs/2`.
The "maybe log" changes the internal API in ways that are difficult to justify.
@@ -1528,35 +1555,35 @@ reclaim_memory(Vhost, QueueName) ->
     ra_log_wal:force_roll_over({?RA_WAL_NAME, Node}).

 %%----------------------------------------------------------------------------
-dead_letter_handler(Q, Overflow) ->
+dead_letter_handler(Q, Overflow, ShouldLog) ->
I strongly dislike this extra argument and how it changes existing functions. If an invalid overflow strategy is used, I don't see a problem with logging that periodically.
Yes, I agree. I'm not sure what value they add to this PR.
                  Servers),

    % Wait for the queue to be available again.
    lists:foreach(fun(Srv) ->
This is reinventing `rabbit_ct_helpers:await_condition/2`.
    end,
    Consume([]).

ensure_qq_proc_dead(Config, Server, RaName) ->
If the target process recovers in fewer than 500ms, this function will loop forever.
The ra supervisor has a maximum restart intensity of 2 restarts per 5 seconds (https://github.com/rabbitmq/ra/blob/main/src/ra_server_sup.erl#L36-L37), so the supervisor will give up eventually.
On the other hand, if the process restart takes more than 500ms, this loop would stop before the process is completely dead. But I think this is highly unlikely for a test queue.
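One way to sidestep both concerns in this thread (looping forever, or stopping too early) is to poll with an explicit retry budget. The sketch below is a generic, self-contained illustration: the module `await_sketch` and the function `await/3` are hypothetical names, and it is not the implementation of `rabbit_ct_helpers:await_condition/2`.

```erlang
-module(await_sketch).
-export([await/3]).

%% Hypothetical helper: polls CondFun every IntervalMs milliseconds,
%% at most Retries times, instead of looping without a bound.
-spec await(fun(() -> boolean()), non_neg_integer(), pos_integer()) ->
          ok | {error, timeout}.
await(_CondFun, 0, _IntervalMs) ->
    %% Retry budget exhausted: report the failure instead of spinning.
    {error, timeout};
await(CondFun, Retries, IntervalMs) when Retries > 0 ->
    case CondFun() of
        true ->
            ok;
        false ->
            timer:sleep(IntervalMs),
            await(CondFun, Retries - 1, IntervalMs)
    end.
```

For a locally registered process, a call could look like `await_sketch:await(fun() -> erlang:whereis(RaName) =:= undefined end, 10, 500)`. A test that checks a process on another node would need to run the check there (for example via `rpc`), which this sketch leaves out.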
-            rabbit_log:info("~ts: delivery_limit not set, defaulting to ~b",
-                            [rabbit_misc:rs(QName), ?DEFAULT_DELIVERY_LIMIT]),
+            maybe_log(ShouldLog, info,
+                      "~ts: delivery_limit not set, defaulting to ~b",
Dear Core Team, should this message be logged unconditionally as well? This is not a misconfiguration: if a user is happy with the default value and does not set an explicit delivery-limit, this will be logged all the time for all quorum queues.
It's there to be clear that there is a default set. We can perhaps lower it to debug in 4.1 or remove it completely but I think making users aware of this potentially breaking change doesn't harm.
We can move this to a function that is not called periodically and remove it from this function.
In fact, we don't necessarily have to do it in `rabbit_quorum_queue` if there's a more suitable alternative where the delivery limit is known.
Removes the `ShouldLog` parameter from several functions and logs the warning about `delivery_limit` not being set only at queue declaration.
Proposed Changes
As documented in #7863:
If a quorum queue is unavailable when a policy is changed it may never apply the resulting configuration command and thus be out of sync with the matching policy.
This PR provides a function in `rabbit_quorum_queue.erl` that checks whether the current Ra machine configuration for a queue matches the configuration expected from the defined policies. That function is called by each queue process on tick (`handle_tick`).