Async pools #3433

Merged
merged 9 commits into from
Dec 17, 2021

Commits on Dec 17, 2021

  1. Create batch_worker abstraction

    This is a gen_server whose purpose is to be part of a worker pool. The
    server accumulates tasks to act on until either the number of tasks
    reaches a configured threshold, or a timeout fires, counted from the
    first task submitted in the current batch (this ensures that no queue
    waits too long to be flushed).
    
    Note that the timeout is only armed while the queue is non-empty, to
    avoid waking up an unused worker.
    
    The worker also takes a callback and a map with any metadata the
    callback may need. When it is time to flush a queue, this callback is
    run, receiving the tasks in the order in which they were submitted,
    together with the optionally provided metadata.
    
    Most of the code for this worker is borrowed from MAM's async workers,
    with some changes and lessons learnt:
    * This worker emits info logs on start and on termination.
    * There is also a debug log on queue flushes.
    * The terminate callback flushes the task queue if any task is
      pending, to try to avoid losing tasks inadvertently.
    * The format_status callback prevents printing the actual tasks on
      crashes or system state lookups, as the tasks might contain data
      private to clients under GDPR-like legal constraints.
    * Garbage collection is triggered after every flush, first to save
      memory, and second to ensure the release of any binaries this
      worker might be holding.
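
    The batching behaviour described above can be sketched roughly as
    follows. This is a minimal illustration, not the code of the actual
    worker: the module name, the record fields and the option keys
    (`batch_size`, `flush_interval`, `flush_callback`, `flush_extra`) are
    assumptions made for the example, and logging, format_status and
    garbage collection are left out.

    ```erlang
    -module(batch_worker_sketch).
    -behaviour(gen_server).

    -export([start_link/1, task/2]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

    %% Hypothetical state; all field names are assumptions of this sketch.
    -record(state, {batch_size, flush_interval, flush_callback, flush_extra,
                    queue = [], queue_length = 0, timer = undefined}).

    start_link(Opts) ->
        gen_server:start_link(?MODULE, Opts, []).

    %% Submit a task asynchronously to a worker.
    task(Worker, Task) ->
        gen_server:cast(Worker, {task, Task}).

    init(#{batch_size := Size, flush_interval := Interval,
           flush_callback := Callback, flush_extra := Extra}) ->
        process_flag(trap_exit, true), % so terminate/2 runs on shutdown
        {ok, #state{batch_size = Size, flush_interval = Interval,
                    flush_callback = Callback, flush_extra = Extra}}.

    handle_call(_Request, _From, State) ->
        {reply, {error, unknown_call}, State}.

    handle_cast({task, Task}, State0) ->
        %% The timer is armed by the first task of a batch only.
        State1 = maybe_start_timer(State0),
        #state{queue = Queue, queue_length = Len} = State1,
        State2 = State1#state{queue = [Task | Queue], queue_length = Len + 1},
        {noreply, maybe_flush(State2)};
    handle_cast(_Other, State) ->
        {noreply, State}.

    %% Only the currently armed timer triggers a flush; stale ones are ignored.
    handle_info({timeout, Ref, flush}, State = #state{timer = Ref}) ->
        {noreply, flush(State)};
    handle_info(_Info, State) ->
        {noreply, State}.

    %% Flush whatever is pending, to avoid losing tasks on shutdown.
    terminate(_Reason, #state{queue_length = 0}) -> ok;
    terminate(_Reason, State) -> _ = flush(State), ok.

    maybe_start_timer(State = #state{timer = undefined, flush_interval = Ms}) ->
        State#state{timer = erlang:start_timer(Ms, self(), flush)};
    maybe_start_timer(State) ->
        State.

    maybe_flush(State = #state{queue_length = Len, batch_size = Size})
      when Len >= Size ->
        flush(State);
    maybe_flush(State) ->
        State.

    %% Run the callback on the tasks in submission order, then reset.
    flush(State = #state{queue = Queue, timer = Timer,
                         flush_callback = Callback, flush_extra = Extra}) ->
        cancel_timer(Timer),
        Callback(lists:reverse(Queue), Extra),
        State#state{queue = [], queue_length = 0, timer = undefined}.

    cancel_timer(undefined) -> ok;
    cancel_timer(Timer) -> _ = erlang:cancel_timer(Timer), ok.
    ```

    With a `batch_size` of 3, three consecutive tasks would be handed to
    the callback as one list; a single task on its own would be handed
    over once `flush_interval` milliseconds have elapsed.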
    NelsonVides committed Dec 17, 2021
    35536b9
  2. Create async pools abstraction on top of batch_workers

    This module implements a worker pool abstraction to handle batch
    workers. It will start a transient supervisor under ejabberd_sup, that
    will spawn a pool using the `worker_pool` library. This will then
    supervise the three supervisors that `worker_pool` supervises
    (time-manager, queue-manager, and worker-sup). One of these,
    'worker-sup', will in turn spawn all the mongoose_batch_worker workers
    with the correct configuration.
    NelsonVides committed Dec 17, 2021
    94b39e9
  3. Rework mam async workers using the new abstraction

    First of all, remove all the old modules implementing the mam rdbms
    async pools.
    
    Then, create a new module called mod_mam_rdbms_arch_async, that
    abstracts what the previous two were doing, by using the modules
    created in the two previous commits. This module starts the pools with
    all the correct parameters, and registers the hooks that will then
    call `worker_pool` using the `hash_worker` strategy, to ensure that MAM
    entries for a single archive are all handled by the same worker.
    
    Note a few lessons here:
    * Selecting the worker with the `rem` operator isn't sound, because
      the distribution of archives over workers is not guaranteed to be
      uniform. `worker_pool` uses `erlang:phash2/2`, which ensures such
      uniform distribution.
    * The callbacks for the batch workers abstract all the metrics.
    * mod_mam_meta parsing has breaking changes: this is what was reported
      as broken in the previous implementation. Now there's an entire toml
      section called `async_writer`, that can be enabled, disabled and
      configured either globally for mam, or in a fine-grained manner
      within the `pm` and `muc` sections.
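
    The first lesson can be illustrated with a small sketch. It only
    compares the two selection rules on a hypothetical set of integer
    archive IDs; the module and function names are invented for the
    example, and it does not reproduce how `worker_pool` is wired
    internally.

    ```erlang
    -module(worker_selection_sketch).
    -export([by_rem/2, by_hash/2, used_workers/3]).

    %% Naive selection: the worker index is the key modulo the pool size.
    by_rem(ArchiveId, PoolSize) ->
        ArchiveId rem PoolSize.

    %% Hash-based selection: erlang:phash2/2 returns a value in
    %% 0..PoolSize-1, and the same key always maps to the same index.
    by_hash(ArchiveId, PoolSize) ->
        erlang:phash2(ArchiveId, PoolSize).

    %% Number of distinct workers that a list of keys ends up using.
    used_workers(Select, Keys, PoolSize) ->
        length(lists:usort([Select(Key, PoolSize) || Key <- Keys])).
    ```

    If, say, the IDs happened to be `lists:seq(0, 8000, 8)` and the pool
    had 8 workers, `by_rem/2` would send every archive to worker 0, while
    `by_hash/2` would be expected to spread them over the pool. In both
    cases a given archive always lands on the same worker.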
    NelsonVides committed Dec 17, 2021
    c1c21b6
  4. Add an init_callback parameter to the async pool

    The purpose of this is to let the supervisor dynamically generate data
    that will be passed to the workers; for example, the supervisor might
    create ets tables. We keep this within the context of the supervisor
    and its workers, so that the abstraction stays isolated to this
    supervision tree. An ets table may not make any sense outside of the
    tree, and we might not even want it to be named, so that nobody outside
    of the tree has access to it and only the workers get an opaque
    reference. We will also want the ets tables to be cleaned up when the
    supervision tree dies, as the data might not make sense on a newly
    restarted tree, or we will want the tree to die fast if the ets table
    is lost. This init callback can also set global configuration flags,
    send messages, log something, define new metrics, or do many of the
    other things that the current MongooseIM init callbacks for
    `mongoose_wpool` are already doing.
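
    As a rough sketch of the idea, assuming a callback that takes no
    arguments and returns a map of metadata (the real signature may
    differ), a plain process can stand in for the pool supervisor. All
    names below are hypothetical.

    ```erlang
    -module(init_callback_sketch).
    -export([make_table/0, start_owner/1, stop_owner/1]).

    %% Hypothetical init callback: creates an unnamed, public ets table and
    %% returns its reference as metadata that would be handed to workers.
    make_table() ->
        #{table => ets:new(pool_data, [set, public])}.

    %% Stand-in for the pool supervisor: the callback runs inside the new
    %% process, so whatever it creates is owned by that process.
    start_owner(InitCallback) ->
        Parent = self(),
        Owner = spawn(fun() ->
                          Data = InitCallback(),
                          Parent ! {self(), Data},
                          receive stop -> ok end
                      end),
        receive {Owner, Extra} -> {Owner, Extra} end.

    %% Stop the owner and wait until it is gone.
    stop_owner(Owner) ->
        Ref = erlang:monitor(process, Owner),
        Owner ! stop,
        receive {'DOWN', Ref, process, Owner, _Reason} -> ok end.
    ```

    Because the table is not named, it can only be reached through the
    reference returned by the callback, and because ets tables are
    destroyed when their owner terminates, the table goes away together
    with the process that stands in for the supervisor.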
    NelsonVides committed Dec 17, 2021
    674af75
  5. Use stored instead of generated pool names

    The whole operation of building lists, appending them, and generating
    an atom at the end is quite expensive. Not only does it generate a lot
    of lists and copies of lists that later need to be garbage collected,
    but the `list_to_atom` operation is also a concurrency bottleneck: the
    atom table is optimised for reads, not for inserts, so the call needs
    to grab locks, check whether the atom already exists, and so on.
    Instead, store the name in a persistent_term entry and simply fetch it
    when needed, making the hot part of the code, finding the pool to
    submit a task to, faster.
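
    The difference between the two approaches could look roughly like
    this. The name format, the key shape and the function names are
    assumptions made for the sketch, not the ones used in the commit.

    ```erlang
    -module(pool_name_sketch).
    -export([generate_name/2, store_name/2, fetch_name/2]).

    %% Old approach: build the atom from scratch on every call, which
    %% allocates intermediate lists and goes through list_to_atom/1.
    generate_name(HostType, PoolId) ->
        list_to_atom("async_pool_" ++ binary_to_list(HostType)
                     ++ "_" ++ atom_to_list(PoolId)).

    %% New approach, step 1: generate the name once, when the pool starts,
    %% and keep it in persistent_term.
    store_name(HostType, PoolId) ->
        Name = generate_name(HostType, PoolId),
        ok = persistent_term:put({?MODULE, HostType, PoolId}, Name),
        Name.

    %% New approach, step 2: on the hot path, only a lookup is needed.
    fetch_name(HostType, PoolId) ->
        persistent_term:get({?MODULE, HostType, PoolId}).
    ```

    Here `store_name/2` would be called once per pool at start-up, and
    every task submission would then only call `fetch_name/2`.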
    NelsonVides committed Dec 17, 2021
    68a3345
  6. Fix tests

    NelsonVides committed Dec 17, 2021
    b535e8f
  7. 385221f
  8. Optimise garbage collection

    Note that running garbage collection within a function will not collect
    the terms this function still references, so the queue was not actually
    being cleared after a flush, as had been expected. Instead, we use the
    `async` option for the GC. This makes the process finish its
    reductions, and once it has been preempted, it is scheduled for garbage
    collection; thereafter, a message is delivered to notify it that the
    collection has completed.
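
    A minimal sketch of the asynchronous request, outside of any
    gen_server, is shown below. The module and function names are
    hypothetical; in the worker itself the notification would presumably
    arrive in `handle_info/2` instead of a bare `receive`.

    ```erlang
    -module(async_gc_sketch).
    -export([request_gc/0, await_gc/2]).

    %% Ask for the calling process to be garbage collected asynchronously.
    %% The call returns the atom `async` right away; the collection itself
    %% happens later, once the process has been scheduled out.
    request_gc() ->
        Ref = make_ref(),
        async = erlang:garbage_collect(self(), [{async, Ref}]),
        Ref.

    %% Wait for the notification, which has the documented shape
    %% {garbage_collect, RequestId, Result}.
    await_gc(Ref, Timeout) ->
        receive
            {garbage_collect, Ref, Result} -> Result
        after Timeout ->
            timeout
        end.
    ```

    Since the function that requested the collection has already
    returned by the time the collection runs, it no longer keeps the
    flushed queue alive.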
    NelsonVides committed Dec 17, 2021
    ca68102
  9. 6479445