Async pools #3433
Commits on Dec 17, 2021
Create batch_worker abstraction
This is a gen_server whose purpose is to be part of a worker pool. The server accumulates tasks to act on until the number of tasks reaches a configured threshold, or until a timeout fires, counted from the first task submitted in the current batch (this ensures that no queue waits too long before being acted on). Note that timeouts are only scheduled when the queue is non-empty, to avoid waking up an idle worker. The worker also takes a callback, and a map with any metadata the callback may need. When it is time to flush a queue, this callback is run, taking the queue in the order in which it was submitted, and the optionally provided metadata. Most of the code for this worker is taken from MAM's async workers, with some changes and lessons learned:
* This worker emits info logs on start and terminate.
* There is also a debug log on queue flushes.
* The terminate callback flushes the task queue if any tasks are pending, to avoid losing tasks inadvertently.
* The format_status callback prevents printing the actual tasks on crashes or system state lookups, as the tasks might contain data private to clients under GDPR-like legal constraints.
* Garbage collection is triggered after every flush: first to save memory, and second to ensure the release of any binaries this worker might be holding.
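The batching behaviour described above can be sketched as a small gen_server. This is a minimal illustration, not MongooseIM's actual `mongoose_batch_worker` API: the module name, state fields, and the shape of the flush callback are all assumptions.

```erlang
%% Sketch of a batch worker: accumulate tasks, flush when the batch-size
%% threshold is reached or when a timeout fires after the first task.
%% All names here are illustrative, not the real mongoose_batch_worker API.
-module(batch_worker_sketch).
-behaviour(gen_server).

-export([start_link/1, submit/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

-record(state, {queue = [] :: [term()],
                batch_size :: pos_integer(),
                flush_interval :: pos_integer(),
                timer = undefined :: undefined | reference(),
                flush_cb :: fun(([term()], map()) -> ok),
                extra :: map()}).

start_link(Opts) ->
    gen_server:start_link(?MODULE, Opts, []).

submit(Worker, Task) ->
    gen_server:cast(Worker, {task, Task}).

init(#{batch_size := Size, flush_interval := Interval,
       flush_cb := Cb, extra := Extra}) ->
    {ok, #state{batch_size = Size, flush_interval = Interval,
                flush_cb = Cb, extra = Extra}}.

handle_call(_Req, _From, State) ->
    {reply, ok, State}.

%% Only arm the timer when the first task arrives, so an idle worker
%% is never woken up by a timeout on an empty queue.
handle_cast({task, Task}, #state{queue = [], flush_interval = I} = S) ->
    Timer = erlang:send_after(I, self(), flush_timeout),
    maybe_flush(S#state{queue = [Task], timer = Timer});
handle_cast({task, Task}, #state{queue = Q} = S) ->
    maybe_flush(S#state{queue = [Task | Q]}).

handle_info(flush_timeout, State) ->
    {noreply, flush(State)}.

%% Flush any pending tasks on shutdown, to avoid losing them.
terminate(_Reason, #state{queue = []}) -> ok;
terminate(_Reason, State) -> _ = flush(State), ok.

maybe_flush(#state{queue = Q, batch_size = Size} = S) when length(Q) >= Size ->
    {noreply, flush(S)};
maybe_flush(S) ->
    {noreply, S}.

flush(#state{queue = []} = S) -> S;
flush(#state{queue = Q, flush_cb = Cb, extra = Extra, timer = T} = S) ->
    catch erlang:cancel_timer(T),
    ok = Cb(lists:reverse(Q), Extra),  %% tasks in submission order
    S#state{queue = [], timer = undefined}.
```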
Commit: 35536b9
Create async pools abstraction on top of batch_workers
This module implements a worker-pool abstraction to handle batch workers. It starts a transient supervisor under ejabberd_sup, which spawns a pool using the `worker_pool` library. This in turn supervises the three supervisors that `worker_pool` provides (time-manager, queue-manager, and worker-sup). One of these, worker-sup, then spawns all the mongoose_batch_worker workers with the correct configuration.
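Starting such a pool might look roughly like the following, assuming the inaka `worker_pool` API (`wpool:start_sup_pool/2` with a `{worker, {Module, InitArg}}` option). The pool name, worker count, and option map are made up for the example.

```erlang
%% Illustrative sketch: start a supervised pool of batch workers with
%% worker_pool. Pool name and options here are assumptions, not the
%% configuration MongooseIM actually uses.
start_pool() ->
    WorkerOpts = #{batch_size => 30,
                   flush_interval => 2000,
                   flush_cb => fun(_Tasks, _Extra) -> ok end,
                   extra => #{}},
    wpool:start_sup_pool(mam_async_pool,
                         [{workers, 8},
                          {worker, {mongoose_batch_worker, WorkerOpts}}]).
```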
Commit: 94b39e9
Rework mam async workers using the new abstraction
First of all, remove all the old modules implementing the MAM RDBMS async pools. Then, create a new module called mod_mam_rdbms_arch_async that abstracts what the previous two were doing, using the modules created in the two previous commits. This module starts the pools with all the correct parameters, and registers the hooks that will then call `worker_pool` using the `hash_worker` strategy, which ensures that all MAM entries for a single archive are handled by the same worker, while different archives are processed in parallel. Note a few lessons here:
* Selecting the worker with the `rem` operator isn't sound, because the distribution of archives over workers is not guaranteed to be uniform. `worker_pool` uses `erlang:phash2/2`, which does ensure a uniform distribution.
* The callbacks for the batch workers abstract away all the metrics.
* mod_mam_meta parsing has breaking changes: this is what was reported as broken in the previous implementation. There is now an entire TOML section called `async_writer` that can be enabled, disabled and configured either globally for MAM, or in a fine-grained manner within the `pm` and `muc` sections.
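The routing point described above can be sketched as a single call, assuming `worker_pool`'s documented `{hash_worker, Key}` strategy; the pool and function names are illustrative.

```erlang
%% Sketch: submit a task through worker_pool with the hash_worker strategy.
%% The key (the archive owner) is hashed with erlang:phash2/2 internally,
%% so every task for the same archive lands on the same worker, while
%% different archives spread uniformly over the pool. Names are illustrative.
archive_message(PoolName, ArchiveID, Task) ->
    wpool:cast(PoolName, {task, Task}, {hash_worker, ArchiveID}).
```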
Commit: c1c21b6
Add an init_callback parameter to the async pool
The purpose of this is for the supervisor to be able to dynamically generate data that will be passed to the workers; for example, the supervisor might create ETS tables. We keep this in the context of the supervisor and its workers, to keep the abstraction isolated to this supervision tree: an ETS table may not make any sense outside of this supervision tree, and we might want the table not to even be named, so that nobody outside the tree has access to it and only the workers get an opaque reference. We also want the ETS tables to be cleaned up when the supervision tree dies, as the data might no longer make sense on a newly restarted tree, or we might want the tree to die fast if the ETS table is lost. This init callback can also set global configuration flags, send messages, log something, define new metrics, or do many other things that, for example, the current MongooseIM init callbacks for `mongoose_wpool` already do.
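The ETS example above can be sketched as follows. The callback name and returned map shape are assumptions for illustration; only the ETS semantics (unnamed tables are reachable only via their tid, and die with their owner) come from standard Erlang/OTP behaviour.

```erlang
%% Sketch of an init_callback run by the pool supervisor. It creates an
%% unnamed ETS table and returns its opaque tid, which is then handed to
%% each worker as metadata. The function name and return shape are
%% illustrative, not the real API.
init_pool_extra() ->
    %% Unnamed table: only processes that receive the Tid can reach it.
    %% Public, so the workers (not just the owner) can read and write.
    %% The table is destroyed automatically when its owner dies, so a
    %% restarted tree starts from a clean slate.
    Tid = ets:new(async_pool_buffer, [public, ordered_set]),
    #{table => Tid}.
```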
Commit: 674af75
Use stored instead of generated pool names
The whole operation of building lists, appending them and generating an atom at the end is quite expensive: not only does it generate a lot of lists and copies of lists that later need to be garbage-collected, but the `list_to_atom` operation is also a concurrency bottleneck: the atom table is not optimised for inserts, only for reads, so an insert needs to grab locks, check whether the atom already exists, and so on. Instead, store the name in a persistent_term record and simply fetch it when needed, making the hottest part of the code, finding the pool to submit a task to, faster.
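A minimal sketch of this pattern, using the standard `persistent_term` API; the key shape and function names are assumptions.

```erlang
%% Sketch of the stored-name approach: register the pool name once at pool
%% start, then fetch it on the hot path instead of rebuilding the atom with
%% list_to_atom/1 on every call. Key shape is illustrative.
store_pool_name(HostType, PoolName) ->
    persistent_term:put({mam_async_pool, HostType}, PoolName).

%% Hot path: persistent_term:get/1 is a cheap, copy-free read that never
%% touches the atom table's write path.
pool_name(HostType) ->
    persistent_term:get({mam_async_pool, HostType}).
```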
Commit: 68a3345
Commit: b535e8f
Commit: 385221f
Note that running garbage collection within a function will not collect the references that function still holds, so the queue would actually not be cleared after being flushed, as was expected. Instead, we use the `async` option for the GC. This lets the process finish its current reductions; once it has been preempted, it is scheduled for garbage collection, and afterwards a message is delivered notifying it that collection has run.
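This mechanism uses the documented `erlang:garbage_collect/2` async option: the request returns `async` immediately, and the VM later sends a `{garbage_collect, RequestId, GCResult}` message. The surrounding gen_server clause below is an illustrative sketch.

```erlang
%% Sketch of asynchronous GC after a flush. erlang:garbage_collect/2 with
%% {async, RequestId} returns immediately; the VM sends a notification
%% message once collection has actually run.
request_gc() ->
    Ref = make_ref(),
    async = erlang:garbage_collect(self(), [{async, Ref}]),
    Ref.

%% In the worker's gen_server, the completion notification arrives as an
%% info message (clause shown in isolation for illustration):
handle_info({garbage_collect, Ref, GCResult}, State) ->
    logger:debug(#{what => async_gc_done, ref => Ref, result => GCResult}),
    {noreply, State}.
```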
Commit: ca68102
Commit: 6479445