Docs: add section on basic performance benchmark (aiidateam#5635)
The section provides a basic script to benchmark the performance of an
AiiDA installation by launching a number of `ArithmeticAddCalculation`
jobs. The script by default automatically sets up the required `Code`
and localhost `Computer`. All created nodes are automatically deleted
from the database at the end.

The documentation gives instructions on how to run the script and
provides example output of a run on a typical workstation, including
completion times for runs with a variable number of daemon workers.
This should give users an idea of the performance of their installation.

Cherry-pick: 07c1ba9
sphuber committed Oct 27, 2022
1 parent a1b9f79 commit 6f8cba7
Showing 4 changed files with 244 additions and 6 deletions.
127 changes: 127 additions & 0 deletions docs/source/howto/include/scripts/performance_benchmark_base.py
@@ -0,0 +1,127 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Script to benchmark the performance of the AiiDA workflow engine on a given installation."""
import click

from aiida.cmdline.params import options
from aiida.cmdline.utils import decorators, echo


@click.command()
@options.CODE(required=False, help='A code that can run the ``ArithmeticAddCalculation``, for example bash.')
@click.option('-n', 'number', type=int, default=100, show_default=True, help='The number of processes to submit.')
@decorators.with_dbenv()
def main(code, number):
    """Submit a number of ``ArithmeticAddCalculation`` to the daemon and record the time to completion.

    This command requires the daemon to be running.

    The script will submit a configurable number of ``ArithmeticAddCalculation`` jobs. By default, the jobs are
    executed using the ``bash`` executable of the system. If this executable cannot be found, the script will exit.
    The jobs will be run on the localhost, which is automatically created and configured. At the end of the script,
    the created nodes will be deleted, as well as the code and computer, if they were automatically set up.
    """
    import shutil
    import tempfile
    import time
    import uuid

    from aiida import orm
    from aiida.engine import submit
    from aiida.engine.daemon.client import get_daemon_client
    from aiida.plugins import CalculationFactory
    from aiida.tools.graph.deletions import delete_nodes

    client = get_daemon_client()

    if not client.is_daemon_running:
        echo.echo_critical('The daemon is not running.')

    computer_created = False
    code_created = False

    # If no code was specified, set up a temporary `Computer` and `Code` for the localhost.
    if not code:
        label = f'benchmark-{uuid.uuid4().hex[:8]}'
        computer = orm.Computer(
            label=label,
            hostname='localhost',
            transport_type='core.local',
            scheduler_type='core.direct',
            workdir=tempfile.gettempdir(),
        ).store()
        computer.configure(safe_interval=0.0)
        echo.echo_success(f'Created and configured temporary `Computer` {label} for localhost.')
        computer_created = True

        executable = shutil.which('bash')

        if executable is None:
            echo.echo_critical('Could not determine the absolute path for the `bash` executable.')

        code = orm.InstalledCode(label='bash', computer=computer, filepath_executable=executable).store()
        echo.echo_success(f'Created temporary `Code` {code.label} for localhost.')
        code_created = True

    cls = CalculationFactory('core.arithmetic.add')
    builder = cls.get_builder()
    builder.code = code
    builder.x = orm.Int(1)
    builder.y = orm.Int(1)

    time_start = time.time()
    nodes = []

    with click.progressbar(range(number), label=f'Submitting {number} calculations.') as bar:
        for _ in bar:
            node = submit(builder)
            nodes.append(node)

    time_end = time.time()
    echo.echo(f'Submission completed in {(time_end - time_start):.2f} seconds.')

    completed = 0

    # Poll the submitted nodes until all processes have reached a terminal state.
    with click.progressbar(length=number, label='Waiting for calculations to complete') as bar:
        while True:
            time.sleep(0.2)

            terminated = [node.is_terminated for node in nodes]
            newly_completed = terminated.count(True) - completed
            completed = terminated.count(True)

            bar.update(newly_completed)

            if all(terminated):
                break

    if any(node.is_excepted or node.is_killed for node in nodes):
        echo.echo_warning('At least one submitted calculation excepted or was killed.')
    else:
        echo.echo_success('All calculations finished successfully.')

    time_end = time.time()
    echo.echo(f'Elapsed time: {(time_end - time_start):.2f} seconds.')

    echo.echo('Cleaning up...')
    delete_nodes([node.pk for node in nodes], dry_run=False)
    echo.echo_success('Deleted all calculations.')

    if code_created:
        code_label = code.full_label
        orm.Node.objects.delete(code.pk)
        echo.echo_success(f'Deleted the created code {code_label}.')

    if computer_created:
        computer_label = computer.label
        user = orm.User.objects.get_default()
        auth_info = computer.get_authinfo(user)
        orm.AuthInfo.objects.delete(auth_info.pk)
        orm.Computer.objects.delete(computer.pk)
        echo.echo_success(f'Deleted the created computer {computer_label}.')

    echo.echo(f'Performance: {(time_end - time_start) / number:.2f} s / process')


if __name__ == '__main__':
    main()
34 changes: 28 additions & 6 deletions docs/source/howto/installation.rst
@@ -340,10 +340,35 @@ Tuning performance
==================

AiiDA supports running hundreds of thousands of calculations and graphs with millions of nodes.
However, optimal performance at that scale can require tweaking the AiiDA configuration to balance the CPU and disk load.

Below, we share a few practical tips for assessing and tuning AiiDA performance.
Further in-depth information is available in the dedicated :ref:`topic on performance<topics:performance>`.

.. dropdown:: Benchmark workflow engine performance

    Start the AiiDA daemon with a single worker, download the :download:`benchmark script <include/scripts/performance_benchmark_base.py>` :fa:`download`, and run it in your AiiDA environment.

    .. code:: console

        sph@citadel:~/$ python performance_benchmark_base.py -n 100
        Success: Created and configured temporary `Computer` benchmark-5fa8c67f for localhost.
        Success: Created temporary `Code` bash for localhost.
        Submitting 100 calculations. [####################################] 100%
        Submission completed in 9.36 seconds.
        Waiting for calculations to complete [####################################] 100%
        Success: All calculations finished successfully.
        Elapsed time: 46.55 seconds.
        Cleaning up...
        Success: Deleted all calculations.
        Success: Deleted the created code bash@benchmark-5fa8c67f.
        Success: Deleted the created computer benchmark-5fa8c67f.
        Performance: 0.47 s / process

    The output above was generated with a *single* daemon worker on one core of an AMD Ryzen 5 3600 6-Core processor (3.6 GHz, 4.2 GHz turbo boost) using AiiDA v1.6.9, with RabbitMQ and PostgreSQL running on the same machine.
    Here, 100 ``ArithmeticAddCalculation`` processes completed in roughly 47 seconds (including the time needed to submit them), corresponding to an average of about half a second per process.

    If you observe a significantly higher runtime, check whether any relevant component (CPU, disk, PostgreSQL, RabbitMQ) is congested.

.. dropdown:: Increase the number of daemon workers

@@ -355,11 +380,10 @@ Here are a few tips for tuning AiiDA performance:

    To make the change permanent, set
    ::

        verdi config set daemon.default_workers 4

.. dropdown:: Increase the number of daemon worker slots

    Each daemon worker accepts only a limited number of tasks at a time.
    If ``verdi daemon status`` constantly warns about a high percentage of the available daemon worker slots being used, you can increase the number of tasks handled by each daemon worker (thus increasing the workload per worker).
    Increasing it to 1000 should typically work.

@@ -369,8 +393,6 @@ Here are a few tips for tuning AiiDA performance:

        verdi config set daemon.worker_process_slots 1000

.. dropdown:: Prevent your operating system from indexing the file repository.

    Many Linux distributions include the ``locate`` command to quickly find files and folders, and run a daily cron job ``updatedb.mlocate`` to create the corresponding index.
1 change: 1 addition & 0 deletions docs/source/topics/index.rst
@@ -16,6 +16,7 @@ Topics
   plugins
   schedulers
   transport
   performance

.. todo::

88 changes: 88 additions & 0 deletions docs/source/topics/performance.rst
@@ -0,0 +1,88 @@
.. _topics:performance:

***********
Performance
***********

The performance of AiiDA depends on many factors:

* the hardware that AiiDA is running on
* how the services for AiiDA are configured (the database, message broker, filesystem, etc.)
* the codes and their plugins that are being run.

This section gives an overview of how each of these factors influences the overall performance of AiiDA and how it can be optimized.


.. _topics:performance:hardware:

Hardware
========

The bulk of AiiDA's workload is typically carried by the daemon and its workers.
Performance is then limited by the computing power of the machine on which AiiDA is running.

Each worker is a separate Python process that takes care of executing the AiiDA processes that are submitted.
AiiDA is designed so that throughput can be increased by adding daemon workers, which operate independently in parallel.
A rule of thumb is to use no more workers than the number of cores of the CPU of the machine on which AiiDA is running.
With more workers than cores, the workers start competing for resources and the performance scaling degrades.
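
A minimal sketch for choosing a starting value (``os.cpu_count()`` is used here as a stand-in for the core count; it reports logical cores, so treat the result as an upper bound):

.. code:: python

    import os

    # Rule of thumb: use at most one daemon worker per CPU core.
    # ``os.cpu_count()`` counts logical cores, so this is an upper bound.
    max_workers = os.cpu_count() or 1
    print(f'Suggested upper bound for the number of daemon workers: {max_workers}')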


.. _topics:performance:services:

Services
========

For the default setup, AiiDA essentially has three services that influence its performance:

* PostgreSQL (the database in which the provenance graph is stored)
* RabbitMQ (the message broker that the daemon workers use to communicate)
* Filesystem (files are stored by AiiDA in the file repository on a filesystem)

For the simplest installations, the PostgreSQL and RabbitMQ services are typically running on the same machine as AiiDA itself.
Although this means that a part of the machine's resources is not available for AiiDA itself and its daemon, the latency for AiiDA to communicate with the services is minimal.

It is possible to configure an AiiDA profile to use services that run on a different machine and are reached over a network.
However, this typically degrades performance, since network latency is incurred each time a connection is made to a service.
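
As a rough sanity check, one can time a trivial database round trip from the Python API (a sketch, assuming a default profile is configured; absolute numbers depend on hardware and network):

.. code:: python

    import time

    from aiida import load_profile, orm

    load_profile()

    # Time a minimal query that requires a round trip to the database.
    # A consistently high value may point at network latency between AiiDA
    # and a remotely hosted PostgreSQL service.
    start = time.time()
    orm.QueryBuilder().append(orm.Node).count()
    print(f'Database round trip took {time.time() - start:.3f} s')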


.. _topics:performance:benchmarks:

Benchmarks
==========

The :download:`benchmark script <../howto/include/scripts/performance_benchmark_base.py>` :fa:`download` provides a basic way of assessing the performance of the workflow engine that involves all components (CPU, file system, PostgreSQL, RabbitMQ).

It launches 100 ``ArithmeticAddCalculation`` jobs on the localhost and measures the time until completion.
Since the workload of the ``ArithmeticAddCalculation`` (summing two numbers) completes practically instantly, the time per process is a reasonable measure of the overhead incurred by the workflow engine.

The completion times reported in the :ref:`howto section<how-to:installation:performance>` were obtained using a single daemon worker; they can be reduced by increasing the number of daemon workers:

.. table::
    :widths: auto

    ==========  =======================  ========================
    # Workers   Total elapsed time (s)   Performance (s/process)
    ==========  =======================  ========================
    1           46.55                    0.47
    2           27.83                    0.28
    4           16.43                    0.16
    ==========  =======================  ========================

.. note::

    While the process rate increases with the number of daemon workers, the scaling is not quite linear.
    This is because, for simplicity, the benchmark script measures both the time required to submit the processes to the daemon (which is not parallelized) and the time needed to run the processes (which is parallelized over the daemon workers).
    For long-running processes, the time required to submit them (roughly 0.1 seconds per process) becomes negligible and linear scaling is approached.
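
The back-of-envelope model below (a sketch based on the example run above: 9.36 s of serial submission out of 46.55 s total with a single worker) reproduces the trend in the table:

.. code:: python

    # Split the single-worker total into a serial submission part and a
    # parallelizable execution part, then scale the latter by the workers.
    submit_total = 9.36                # seconds, not parallelized
    run_total = 46.55 - submit_total   # seconds, parallelized over workers

    for workers in (1, 2, 4):
        estimate = submit_total + run_total / workers
        print(f'{workers} worker(s): ~{estimate:.1f} s in total')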


.. _topics:performance:plugins:

Plugins
=======

One of AiiDA's strengths is its plugin system, which allows its capabilities to be customized in a variety of ways.
However, this flexibility also means that the performance of AiiDA can be affected significantly by the implementation of the plugins.
For example, a ``CalcJob`` plugin determines which files are transferred to and from the computing resources.
If the plugin needs to transfer and store large amounts of data, this will affect the process throughput of the daemon workers.
Likewise, if a ``Parser`` plugin performs heavy numerical computations to parse the retrieved data, this will slow down the workers' throughput.
To optimize process throughput, plugins should minimize heavy computations and avoid transferring unnecessary data.
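
As an illustration of the ``CalcJob`` point, the files retrieved for a calculation are controlled by the plugin. The sketch below is hypothetical (the surrounding ``CalcJob`` class is omitted and ``aiida.out`` is a placeholder file name); the idea is to keep ``retrieve_list`` limited to the files the parser actually needs:

.. code:: python

    from aiida.common.datastructures import CalcInfo, CodeInfo

    def prepare_for_submission(self, folder):
        """Sketch: retrieve only the files that the parser actually needs."""
        codeinfo = CodeInfo()
        codeinfo.code_uuid = self.inputs.code.uuid

        calcinfo = CalcInfo()
        calcinfo.codes_info = [codeinfo]
        # Every entry here is transferred from the computing resource and
        # stored in the file repository, so keep the list minimal.
        calcinfo.retrieve_list = ['aiida.out']
        return calcinfo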
