Red Hat/PSAP's Test Orchestrator for Performance and Scalability of AI pLatforms
This repository provides an extensive toolbox for performance and scale testing of the Red Hat OpenShift AI (RHOAI) platform.
The automation relies on:

- Python scripts for the orchestration (the testing directories)
- Ansible roles for the cluster control (the toolbox and roles directories)
- MatrixBenchmarking for the post-processing (the visualization directories)
The recommended way to run TOPSAIL is either via a CI environment, or within the TOPSAIL container via its Toolbx launcher.
Requirements:

- All the software requirements should be provided by the container image, built by the topsail_build command.
- A reachable OpenShift cluster:

  oc version # fails if the cluster is not reachable
Note that TOPSAIL assumes that it has cluster-admin privileges on the cluster, as illustrated by the checks below.
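For instance, the following commands can be used to verify the cluster access before launching a test (a minimal sketch; the cluster-admin check is an assumption about how the privileges can be verified, not a TOPSAIL command):

```
oc version                               # fails if the cluster is not reachable
oc whoami                                # shows the identity that TOPSAIL will use
oc auth can-i '*' '*' --all-namespaces   # should answer "yes" for a cluster-admin account
```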
TOPSAIL provides multiple levels of functionality:
- the test orchestrations are the top level. Most of the time, they are triggered via a CI engine, for end-to-end testing of a given RHOAI component. The test orchestration Python code and configuration are stored in the projects/*/testing directories.
- the toolbox commands operate between the orchestration code and the cluster. They are Ansible roles (projects/*/toolbox), each in charge of a specific task: preparing the cluster, running a given test, capturing the state of the cluster, etc. The Ansible roles have a thin Python layer on top of them (based on the Google Fire package) which provides a well-defined command-line interface (CLI). This CLI documents the parameters of each command, allows their discovery via the ./run_toolbox.py entrypoint (see the example after this list), and generates artifacts for post-mortem troubleshooting.
- the post-processing visualization is provided via MatrixBenchmarking workload modules (projects/*/visualization). The modules are in charge of parsing the test artifacts, generating visualization reports, uploading KPIs to OpenSearch, and performing regression analyses.
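As an example, the toolbox CLI can be explored from a TOPSAIL checkout. The command group and role names below are hypothetical illustrations; the actual commands depend on the projects available in the repository:

```
./run_toolbox.py                               # lists the available command groups
./run_toolbox.py cluster                       # lists the commands of the (hypothetical) "cluster" group
./run_toolbox.py cluster capture_environment   # runs the corresponding Ansible role and saves its artifacts
```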
TOPSAIL project directories are organized following the different levels described above (see the layout sketch after this list).

- the testing directory provides the Python scripts with CI entrypoints (test.py prepare_ci and test.py run_ci) and possibly extra entrypoints for local interactions. It also contains the project configuration file (config.yaml).
- the toolbox directory contains the Ansible roles that control and mutate the cluster during the cluster preparation and the test execution.
- the toolbox directory also contains the Python wrapper which provides a well-defined CLI over the Ansible roles.
- the visualization directory contains the MatrixBenchmarking workload modules, which perform the post-processing step of the test (parsing, visualization, regression analysis).
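Putting the levels together, a project directory follows roughly this layout (a simplified sketch, where <project> stands for an actual project name):

```
projects/<project>/
├── testing/          # CI entrypoints (test.py prepare_ci, test.py run_ci) and config.yaml
├── toolbox/          # Ansible roles and their Python CLI wrapper
└── visualization/    # MatrixBenchmarking workload modules
```

In a CI environment, the test is then typically driven by calling test.py prepare_ci to set up the cluster, followed by test.py run_ci to execute the test and generate the artifacts for the visualization step.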