Faster node start with many classic queues v2 #7676

Merged
merged 2 commits into from
Mar 21, 2023

mkuratczyk
Contributor

Faster CQv2 startup.

On my test box, with 100k CQs (with classic_queue.default_version = 2), on main:
import: ~70 seconds
start_app: ~55-65 seconds

With this PR:
import: ~48 seconds
start_app: ~29 seconds

@mkuratczyk mkuratczyk marked this pull request as draft March 20, 2023 12:14
@mkuratczyk mkuratczyk requested a review from lhoguin March 20, 2023 12:31
@mkuratczyk mkuratczyk marked this pull request as ready for review March 20, 2023 12:32
@mkuratczyk mkuratczyk marked this pull request as draft March 20, 2023 12:32
lhoguin
lhoguin previously approved these changes Mar 20, 2023
This cuts ~10s of node startup time with 100k classic queues v2.
ensure_dir only if the write failed
use raw files
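
The "ensure_dir only if the write failed" idea can be sketched roughly as below. This is only an illustration in Python, not the Erlang code from this PR: `write_stub` and the `queues/q1/stub` layout are hypothetical names, and the unbuffered open is just a loose stand-in for Erlang's `raw` file mode (which avoids going through the file server process). The point is that, when the directory almost always exists, trying the write first saves one directory check per queue.

```python
import os
import tempfile


def write_stub(path: str, data: bytes) -> None:
    """Write a small file, creating the parent directory only if needed."""
    try:
        # Optimistic path: assume the directory is already there.
        # buffering=0 opens an unbuffered (raw) binary file object.
        with open(path, "wb", buffering=0) as f:
            f.write(data)
    except FileNotFoundError:
        # Slow path, taken only when the parent directory is missing.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb", buffering=0) as f:
            f.write(data)


# Hypothetical layout: one directory per queue under a temporary root.
base = tempfile.mkdtemp()
stub = os.path.join(base, "queues", "q1", "stub")
write_stub(stub, b"v2")  # first call falls back and creates the directory
write_stub(stub, b"v2")  # later calls succeed on the optimistic path
```

With 100k queues, the saving comes from the common case doing one file operation instead of two.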
@mkuratczyk mkuratczyk marked this pull request as ready for review March 21, 2023 12:09
@mkuratczyk mkuratczyk merged commit 4da170f into main Mar 21, 2023
@mkuratczyk mkuratczyk deleted the faster-queue-index-start branch March 21, 2023 12:12
mergify bot pushed a commit that referenced this pull request Mar 21, 2023
* Faster all_queue_directory_names/1
* Optimise writing stub files

Combined, this reduces node startup time by half with many empty classic queues v2

(cherry picked from commit 4da170f)
michaelklishin added a commit that referenced this pull request Mar 21, 2023
Faster node start with many classic queues v2 (backport #7676)
@michaelklishin michaelklishin added this to the 3.12.0 milestone Mar 21, 2023
mkuratczyk added a commit that referenced this pull request Mar 23, 2023
Per-vhost DETS file with recovery terms for all queues is a bottleneck
when stopping RabbitMQ - all queues try to save their state, leading
to a very long file server mailbox and a very unpredictable time
to stop RabbitMQ (on my machine it can vary from 20 seconds to 5 minutes
with 100k classic queues).

In this PR we can still read the recovery terms from DETS but we only
save them in per-queue files. This way each queue can quickly store its
state. Under the same conditions, my machine can consistently stop
RabbitMQ in 15 seconds or so.
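
A rough sketch of that read/write split, for illustration only. It is written in Python rather than Erlang, and everything in it is an assumption: the `RecoveryTerms` class, the `recovery.terms` file name, the JSON encoding, and the in-memory dict that stands in for the per-vhost DETS file. It only shows the shape of the idea: writes go to an independent per-queue file, reads prefer that file and fall back to the old shared store.

```python
import json
import os
import tempfile


class RecoveryTerms:
    """Hypothetical store: per-queue files for writes, shared table for legacy reads."""

    def __init__(self, vhost_dir, legacy=None):
        self.vhost_dir = vhost_dir
        # Stand-in for the per-vhost DETS file; it is only ever read here.
        self.legacy = dict(legacy or {})

    def _path(self, queue_dir):
        # Hypothetical file name, one file per queue directory.
        return os.path.join(self.vhost_dir, queue_dir, "recovery.terms")

    def store(self, queue_dir, terms):
        # Each queue writes its own file, so queues do not contend
        # on a single shared file while the node is stopping.
        path = self._path(queue_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            json.dump(terms, f)

    def read(self, queue_dir):
        # Prefer the per-queue file; fall back to the legacy shared store.
        try:
            with open(self._path(queue_dir)) as f:
                return json.load(f)
        except FileNotFoundError:
            return self.legacy.get(queue_dir)


store = RecoveryTerms(tempfile.mkdtemp(), legacy={"q_old": {"persistent_count": 3}})
store.store("q_new", {"persistent_count": 7})
```

Here `store.read("q_new")` comes from the per-queue file, while `store.read("q_old")` is served from the legacy table until that queue saves its state again.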

The tradeoff is a slower startup time: on my machine, it goes up from
29 seconds to 38 seconds, but that's still better than what we had until
#7676 was merged a few days ago. More importantly, the total of
stop+start is lower and more predictable.

This PR also improves shutdown with many classic queues v1.
Startup time with 100k CQv1s is so long and unpredictable that it's hard
to even tell if this PR affects it (it varies from 4 to 8 minutes for me).

Unfortunately this PR makes startup on macOS slower (~55s instead of 30s
for me), but we don't have to optimise for that. In most cases (with
far fewer queues), it won't be noticeable anyway.