Look into adding warmup runs in benchmark script to avoid measuring JIT time #30

Open
illuhad opened this issue Aug 2, 2024 · 0 comments

Collaborator

illuhad commented Aug 2, 2024

As far as I can see, the benchmark script https://github.com/UoB-HPC/Barnes-Hut/blob/main/ci/benchmark does not do warmup runs. With AdaptiveCpp, this can cause us to measure the overhead of LLVM JIT, which can substantially alter results in a misleading way.

So we should look into adding some warmup runs.
For AdaptiveCpp,

  • if ACPP_ADAPTIVITY_LEVEL=1 (default), every kernel is compiled separately the first time it is executed. The generated PTX/SPIR-V etc. is put into a persistent, on-disk cache. This means we need to ensure that every kernel has been invoked at least once before we do the measurements, either earlier in the same application run or in a previous application run. If we do it in the same application run, we would also avoid measuring PTX compilation by the CUDA driver, which might also affect nvc++.
  • if ACPP_ADAPTIVITY_LEVEL=2, it will attempt to detect and hardwire invariant kernel arguments. In this case, the application should be run 2-3 times before doing the measurement run.
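One possible shape for this, as a rough sketch only: run the application a few times untimed so the on-disk JIT cache gets populated, then time the remaining runs. I have not checked how ci/benchmark is structured, so the helper name `run_with_warmup`, its parameters, and the placeholder command below are all assumptions for illustration, not the existing script.

```python
import os
import subprocess
import sys
import time


def run_with_warmup(cmd, warmup_runs=3, timed_runs=5, extra_env=None):
    """Run `cmd` untimed `warmup_runs` times, then time `timed_runs` runs.

    Hypothetical helper: returns the wall-clock time in seconds of each
    timed run. Warmup results are discarded.
    """
    env = dict(os.environ)
    env.update(extra_env or {})

    # Warmup: these runs exist only to populate persistent JIT caches.
    for _ in range(warmup_runs):
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)

    timings = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command; substitute the real benchmark binary and arguments.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    timings = run_with_warmup(
        placeholder_cmd,
        warmup_runs=3,  # 2-3 runs, as suggested above for ACPP_ADAPTIVITY_LEVEL=2
        timed_runs=5,
        extra_env={"ACPP_ADAPTIVITY_LEVEL": "2"},
    )
    print("fastest timed run (s):", min(timings))
```

Note that warming up in separate processes like this only relies on the persistent cache. Avoiding the CUDA driver's PTX compilation would need the warmup to happen inside the same application run, which means changing the benchmark application itself rather than the script.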