TOPSAIL

Red Hat/PSAP's Test Orchestrator for Performance and Scalability of AI pLatforms

This repository provides an extensive toolbox for performance and scale testing of the Red Hat OpenShift AI (RHOAI) platform.

The automation relies on:

Dependencies

The recommended way to run TOPSAIL is either via a CI environment or within the TOPSAIL container via its Toolbx launcher.

Requirements:

All the software requirements should be provided by the container image, built by the topsail_build command.
A reachable OpenShift cluster

oc version # fails if the cluster is not reachable

Note that TOPSAIL assumes that it has cluster-admin privileges on the cluster.
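
The reachability requirement above can also be checked from a script before launching anything. The following is a minimal sketch, not part of TOPSAIL: the function names cluster_reachable and require_cluster are hypothetical, and it only assumes that the oc version command shown above exits with a non-zero status when the cluster cannot be reached.

```python
import subprocess
import sys


def cluster_reachable(cmd=("oc", "version"), timeout=30):
    """Return True if the probe command exits with status 0.

    `cmd` defaults to the `oc version` check from this README; the
    parameter exists so that another probe command can be substituted.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary is missing or not executable, or the probe timed out.
        return False
    return result.returncode == 0


def require_cluster():
    """Abort the calling script if the cluster cannot be reached."""
    if not cluster_reachable():
        sys.exit("OpenShift cluster not reachable (is KUBECONFIG set?)")
```

Calling require_cluster() at the top of a local script stops it early with a readable message instead of failing halfway through a test.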

TOPSAIL provides multiple levels of functionality:

the test orchestrations are the top level. Most of the time, they are triggered via a CI engine for end-to-end testing of a given RHOAI component. The test orchestration Python code and configuration are stored in the projects/*/testing directory.
the toolbox commands operate between the orchestration code and the cluster. They are Ansible roles (projects/*/toolbox), each in charge of a specific task: preparing the cluster, running a given test, capturing the state of the cluster, and so on. A thin Python layer on top of the roles (based on the Google Fire package) provides a well-defined command-line interface (CLI). This CLI documents the parameters of each command, makes the commands discoverable via the ./run_toolbox.py entrypoint, and generates artifacts for post-mortem troubleshooting.
the post-processing and visualization level is provided via MatrixBenchmarking workload modules (projects/*/visualization). The modules are in charge of parsing the test artifacts, generating visualization reports, uploading KPIs to OpenSearch, and performing regression analyses.
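
To illustrate the "thin Python layer" idea in the toolbox level, here is a minimal sketch of how the Google Fire package can expose a Python class as a CLI. It is not TOPSAIL's actual code: the class Cluster, the method capture_state, the role name cluster_capture_state, and the namespace parameter are all hypothetical, and the method only returns a description of the role invocation instead of running Ansible.

```python
class Cluster:
    """Hypothetical command group; TOPSAIL's real classes and roles differ."""

    def capture_state(self, namespace="default"):
        """Describe the Ansible role invocation that would capture cluster state.

        Args:
            namespace: Namespace to capture (hypothetical parameter).
        """
        # A real wrapper would hand these values to an Ansible role;
        # this sketch only returns them so the mapping stays visible.
        return {
            "role": "cluster_capture_state",  # hypothetical role name
            "extra_vars": {"namespace": namespace},
        }


def main():
    # Imported lazily so the class stays usable without the third-party
    # package (install it with: pip install fire).
    import fire

    # Fire turns the class into a CLI: methods become sub-commands,
    # keyword arguments become --flags, and docstrings feed the help text.
    fire.Fire(Cluster)
```

If this were saved in a script that calls main(), an invocation such as "capture_state --namespace=rhoai" would print the returned mapping. The point of the design is that the method signature and docstring are the single source of truth for both the CLI and its documentation.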

TOPSAIL project directories are organized following the different levels described above.

the testing directory provides the Python scripts with CI entrypoints (test.py prepare_ci and test.py run_ci) and possibly extra entrypoints for local interactions. It also contains the project configuration file (config.yaml).
the toolbox directory contains the Ansible roles that control and mutate the cluster during cluster preparation and testing
the toolbox directory also contains the Python wrapper which provides a well-defined CLI over the Ansible roles
the visualization directory contains the MatrixBenchmarking workload modules, which perform the post-processing step of the test (parsing, visualization, regression analysis)
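
The layout described above can be summarized as a small helper that maps a project name to the paths this README mentions. This is only a sketch under stated assumptions: project_layout and missing_paths are hypothetical helper names, and the exact file set of a real project may be larger.

```python
from pathlib import Path


def project_layout(root, project):
    """Return the paths this README describes for one project."""
    base = Path(root) / "projects" / project
    return {
        # CI entrypoints: test.py prepare_ci and test.py run_ci
        "ci_entrypoint": base / "testing" / "test.py",
        # Project configuration file
        "config": base / "testing" / "config.yaml",
        # Ansible roles and their Python CLI wrapper
        "toolbox": base / "toolbox",
        # MatrixBenchmarking workload modules
        "visualization": base / "visualization",
    }


def missing_paths(root, project):
    """List the layout entries that do not exist on disk, sorted by name."""
    layout = project_layout(root, project)
    return sorted(name for name, path in layout.items() if not path.exists())
```

Running missing_paths on a checkout with a project name gives a quick check that a new project follows the three-level structure.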