Look into adding warmup runs in benchmark script to avoid measuring JIT time #30

illuhad · 2024-08-02T12:36:34Z

As far as I can see, the benchmark script https://github.com/UoB-HPC/Barnes-Hut/blob/main/ci/benchmark does not do warmup runs. With AdaptiveCpp, this can cause us to measure the overhead of LLVM JIT, which can substantially alter results in a misleading way.

So we should look into adding some warmup runs.
For AdaptiveCpp,

- if ACPP_ADAPTIVITY_LEVEL=1 (default), every kernel is compiled separately the first time it is executed. The generated PTX/SPIR-V etc. is put into a persistent, on-disk cache. This means that we would need to ensure that every kernel has been invoked at least once before we do the measurements, either earlier in the same application run or in a previous application run. If we do it in the same application run, we would also benefit from not measuring PTX compilation by the CUDA driver, which might also affect nvc++.
- if ACPP_ADAPTIVITY_LEVEL=2, it will attempt to detect and hardwire invariant kernel arguments. In this case, the application should be run 2-3 times before doing the measurement run.
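
To make the suggestion concrete, here is a rough sketch of what untimed warmup runs could look like in a driver script. It is not taken from the existing `ci/benchmark` script: the function names `warmup_count` and `benchmark`, the choice of 3 warmup runs for level 2 (the upper end of the "2-3 times" above), and the number of measured runs are all assumptions for illustration.

```python
import os
import subprocess
import time


def warmup_count(adaptivity_level: int) -> int:
    """Untimed runs to perform before measuring (hypothetical helper).

    Level 1: one run, so every kernel is JIT-compiled and lands in the
    persistent on-disk cache.
    Level 2: three runs, the upper end of the "2-3 times" guidance.
    """
    return 3 if adaptivity_level >= 2 else 1


def benchmark(cmd, adaptivity_level=1, measured_runs=5, run=subprocess.run):
    """Run `cmd` untimed for warmup, then return wall-clock timings in seconds.

    `run` is injectable so the logic can be exercised without a real binary.
    """
    # Pass the adaptivity level to the application through its environment.
    env = dict(os.environ, ACPP_ADAPTIVITY_LEVEL=str(adaptivity_level))

    # Warmup: results are discarded, only the JIT cache is populated.
    for _ in range(warmup_count(adaptivity_level)):
        run(cmd, env=env, check=True)

    # Measurement: each run is a fresh process that reuses the warm cache.
    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run(cmd, env=env, check=True)
        timings.append(time.perf_counter() - start)
    return timings
```

This only covers the "previous application run" variant: the warmup happens in separate processes and relies on the on-disk cache. Warming up inside the same application run, which would also keep the CUDA driver's PTX compilation out of the measurement, would need changes in the application itself rather than in the driver script.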
