Precise and reproducible benchmarking. Inspired by running-ng.
- Install the harness CLI: cargo install harness-cli
- Get the example crate: git clone https://github.com/wenyuzhao/harness.git && cd harness/examples/sort
- Start an evaluation: cargo harness run
- View results: cargo harness report
Please see more examples on how to configure and use harness. The evaluation configs can be found in the Cargo.toml of each example crate.
harness avoids running the same benchmark multiple times in a loop, unlike what most existing Rust benchmarking tools would do.
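For an evaluation over several benchmark programs and builds, harness interleaves the runs across benchmarks and builds, row by row, instead of finishing all runs of one benchmark or build before starting the next.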
Any machine can have performance fluctuations, e.g. the CPU frequency suddenly scaling down, or a background process waking up to do some task. Interleaved runs ensure that such fluctuations do not affect only one build or one benchmark, but are spread across all the benchmarks and builds in a relatively fair way.
When running in a complex environment, you are very likely to see a difference in the results between the two run orders.
Note: For the same reason, it's recommended to always have at least two different builds in each evaluation. Otherwise, there is no difference from running a single build in a loop.
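To make the difference between the two run orders concrete, here is a minimal sketch that generates both. It is not taken from the harness source: the `Run` struct, both function names, and the loop nesting (invocations outermost, then benchmarks, then builds) are assumptions for illustration only.

```rust
/// One scheduled run (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Run {
    pub invocation: usize,
    pub bench: String,
    pub build: String,
}

/// Interleaved order: every (benchmark, build) pair runs once before any
/// pair runs a second time. The exact nesting is an assumption.
pub fn interleaved_order(benches: &[&str], builds: &[&str], invocations: usize) -> Vec<Run> {
    let mut order = Vec::new();
    for invocation in 0..invocations {
        for bench in benches {
            for build in builds {
                order.push(Run {
                    invocation,
                    bench: bench.to_string(),
                    build: build.to_string(),
                });
            }
        }
    }
    order
}

/// Looped order, for contrast: all invocations of one pair run back to back.
pub fn looped_order(benches: &[&str], builds: &[&str], invocations: usize) -> Vec<Run> {
    let mut order = Vec::new();
    for bench in benches {
        for build in builds {
            for invocation in 0..invocations {
                order.push(Run {
                    invocation,
                    bench: bench.to_string(),
                    build: build.to_string(),
                });
            }
        }
    }
    order
}
```

With one benchmark, two builds `a` and `b`, and two invocations, the interleaved order visits the builds as a, b, a, b, while the looped order visits them as a, a, b, b. A slowdown that lasts for a few consecutive runs therefore hits both builds in the first case, but possibly only one build in the second.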
harness has a clear notion of warmup and timing iterations, instead of blindly iterating a single benchmark multiple times and reporting the per-iteration time distribution. By default, each invocation of a benchmark consists of warmup iterations followed by a timing iteration.
Similar to other bench tools, harness runs each benchmark on each build for multiple invocations.
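The separation between warmup and timing can be sketched as follows. This is only an illustration, not the harness implementation: `run_invocation` and its `warmup_iterations` parameter are hypothetical, and the sketch assumes a single timing iteration at the end of each invocation.

```rust
use std::time::{Duration, Instant};

/// Runs `workload` for `warmup_iterations` untimed iterations, then runs it
/// once more and returns the wall-clock time of that final iteration only.
/// (Hypothetical helper; the iteration counts harness uses are not shown here.)
pub fn run_invocation<F: FnMut()>(mut workload: F, warmup_iterations: usize) -> Duration {
    // Warmup: results are discarded, so caches, allocators, and branch
    // predictors can settle before anything is measured.
    for _ in 0..warmup_iterations {
        workload();
    }
    // Timing iteration: the only iteration that is measured.
    let start = Instant::now();
    workload();
    start.elapsed()
}
```

Calling `run_invocation(|| my_sort(), 4)` would execute the workload five times in total and report only the last one, where `my_sort` stands for whatever workload is being benchmarked.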
After all the invocations have finished, cargo harness report will parse the results and report the min/max/mean/geomean for each performance value, as well as the 95% confidence interval per benchmark. You can also use your own script to load the results and analyze them differently. The performance values are stored in target/harness/logs/<RUNID>/results.csv.
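If you analyze the results yourself, the reported statistics can be reproduced along these lines. This is a sketch under stated assumptions: the `Summary` type and `summarize` function are hypothetical, and the confidence interval uses a normal approximation (critical value 1.96), which may differ from how harness computes it. For a small number of invocations, a Student's t critical value would give a wider interval.

```rust
/// Summary statistics for one performance value across invocations
/// (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Summary {
    pub min: f64,
    pub max: f64,
    pub mean: f64,
    pub geomean: f64,
    /// Half-width of the 95% confidence interval around the mean.
    pub ci95: f64,
}

/// Returns `None` for an empty slice or if any sample is not a finite,
/// strictly positive number (the geometric mean needs positive values).
pub fn summarize(samples: &[f64]) -> Option<Summary> {
    if samples.is_empty() || samples.iter().any(|v| !v.is_finite() || *v <= 0.0) {
        return None;
    }
    let n = samples.len() as f64;
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let mean = samples.iter().sum::<f64>() / n;
    // Geometric mean computed in log space to avoid overflow.
    let geomean = (samples.iter().map(|v| v.ln()).sum::<f64>() / n).exp();
    let ci95 = if samples.len() > 1 {
        // Sample variance (n - 1 denominator), then standard error of the mean.
        let var = samples.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
        1.96 * (var / n).sqrt() // normal approximation: an assumption
    } else {
        0.0
    };
    Some(Summary { min, max, mean, geomean, ci95 })
}
```

The input would be the values of one column of results.csv for one benchmark and build; parsing the file is left out because its exact layout is not described here.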
harness supports collecting and reporting extra performance data other than execution time, by enabling the following probes:
- harness-probe-perf: Collect perf-event values for the timing iteration.
- harness-probe-ebpf (WIP): Extra performance data collected by eBPF programs.
harness performs a series of strict checks to minimize system noise. It refuses to start benchmarking if any of the following checks fail:
- (Linux-only) Only one user is logged in
- (Linux-only) All CPU scaling governors are set to performance
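As an illustration of the second check, the scaling governor of each CPU can be read from Linux sysfs. The sketch below is not the harness implementation; the function names are hypothetical, and treating "no governor found" as a failure is an assumption made here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when every governor string (after trimming whitespace)
/// equals "performance". An empty list counts as a failure, because
/// nothing could be verified (an assumption of this sketch).
pub fn all_performance<S: AsRef<str>>(governors: &[S]) -> bool {
    !governors.is_empty() && governors.iter().all(|g| g.as_ref().trim() == "performance")
}

/// Reads `<cpu_root>/cpuN/cpufreq/scaling_governor` for every entry that has one.
/// On Linux, `cpu_root` is normally `/sys/devices/system/cpu`.
pub fn read_governors(cpu_root: &Path) -> io::Result<Vec<String>> {
    let mut governors = Vec::new();
    for entry in fs::read_dir(cpu_root)? {
        let path = entry?.path().join("cpufreq/scaling_governor");
        // Entries without this file (e.g. `cpuidle`) are skipped.
        if path.is_file() {
            governors.push(fs::read_to_string(path)?);
        }
    }
    Ok(governors)
}
```

A caller would combine the two as `all_performance(&read_governors(Path::new("/sys/devices/system/cpu"))?)` and refuse to continue when the result is false.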
harness refuses to support casual benchmarking. Each evaluation must be properly tracked by Git, including all the benchmark configurations and the revisions of all the benchmarks and benchmarked programs. Verifying the correctness of any evaluation, or re-running an evaluation from years ago, can be done simply by tracing back through the Git history.
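One plausible way to enforce this kind of tracking is to record the current commit only when the working tree has no uncommitted changes. The sketch below shells out to the standard git commands `git status --porcelain` and `git rev-parse HEAD`; the helper names are hypothetical, and whether harness performs the check this way is an assumption.

```rust
use std::io;
use std::process::Command;

/// Interprets the output of `git status --porcelain`:
/// no non-empty lines means there are no uncommitted or untracked changes.
pub fn is_clean(porcelain: &str) -> bool {
    porcelain.lines().all(|line| line.trim().is_empty())
}

/// Runs `git -C <repo> <args...>` and returns its trimmed stdout.
fn git(repo: &str, args: &[&str]) -> io::Result<String> {
    let out = Command::new("git").arg("-C").arg(repo).args(args).output()?;
    if !out.status.success() {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    Ok(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

/// Returns the HEAD commit hash if the working tree is clean,
/// or `None` if there are changes that Git is not tracking yet.
pub fn tracked_commit(repo: &str) -> io::Result<Option<String>> {
    if !is_clean(&git(repo, &["status", "--porcelain"])?) {
        return Ok(None);
    }
    git(repo, &["rev-parse", "HEAD"]).map(Some)
}
```

Refusing to run when `tracked_commit` returns `None` guarantees that every recorded commit hash fully describes the code that was benchmarked.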
harness assigns each individual evaluation a unique RUNID and generates an evaluation summary at target/harness/logs/<RUNID>/config.toml. harness uses this file to record the evaluation info for the current benchmark run, including:
- Git commit of the evaluation config
- Git commit, cargo features, and environment variables used for producing each evaluated build
- The Cargo.lock file used for producing each evaluated build
Reproducing a previous evaluation is as simple as running cargo harness run --config <RUNID>. harness automatically checks out the corresponding commits, sets up the recorded cargo features and environment variables, and replays the pre-recorded Cargo.lock file, to ensure the codebase and builds are in exactly the same state as when RUNID was generated.
Note: harness cannot check local dependencies right now. For completely deterministic builds, don't use local dependencies.
In the same <RUNID>/config.toml file, harness also records all the environmental info for every benchmark run, including but not limited to:
- All global system environment variables at the time of the run
- OS / CPU / Memory / Swap information used for the run
Any change to the system environment can affect reproducibility, so it's recommended to keep the same environment variables and the same OS / CPU / Memory / Swap config as much as possible. harness automatically verifies the current system info against the recorded info and warns about any differences.
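The verification step amounts to comparing a recorded key/value snapshot against the current one. Below is a minimal sketch for environment variables; the `Diff` enum and `diff_env` function are hypothetical and do not reflect how harness stores or compares this data.

```rust
use std::collections::BTreeMap;

/// One difference between the recorded and the current environment
/// (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Diff {
    Changed { key: String, recorded: String, current: String },
    Missing { key: String, recorded: String },
    Added { key: String, current: String },
}

/// Lists every variable whose value changed, disappeared, or newly appeared.
/// Changed and missing keys come first (in key order), then added keys.
pub fn diff_env(
    recorded: &BTreeMap<String, String>,
    current: &BTreeMap<String, String>,
) -> Vec<Diff> {
    let mut diffs = Vec::new();
    for (key, old) in recorded {
        match current.get(key) {
            Some(new) if new != old => diffs.push(Diff::Changed {
                key: key.clone(),
                recorded: old.clone(),
                current: new.clone(),
            }),
            Some(_) => {} // unchanged
            None => diffs.push(Diff::Missing { key: key.clone(), recorded: old.clone() }),
        }
    }
    for (key, new) in current {
        if !recorded.contains_key(key) {
            diffs.push(Diff::Added { key: key.clone(), current: new.clone() });
        }
    }
    diffs
}
```

The current snapshot can be obtained with `std::env::vars().collect::<BTreeMap<_, _>>()`; each returned `Diff` would then be printed as a warning rather than treated as a hard error, matching the "warns" behavior described above.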
- Runner
- Binary runner
- Result reporting
- Test runner
- Scratch folder
- Default to compare HEAD vs HEAD~1
- Restore git states after benchmarking
- Comments for public API
- Documentation
- Benchmark subsetting
- Handle no result cases
- More examples
- Add tests
- Plugin system
- Plugin: HTML or Markdown report with graphs
- Plugin: Copy files
- Plugin: Rsync results
- Performance evaluation guide / tutorial