Add Benchmarking CI #420

christiangnrd · 2024-09-18T16:03:21Z

Using a branch on JuliaGPU instead of one from my fork. See #419 for the start of this branch.

Original text:
Copying over the benchmarks from CUDA.jl.

I'm not sure if I converted them properly. Some functions seem to be missing Metal implementations (like reverse).

The final (and biggest) problem is how inconsistent these results have been. Simply rerunning the benchmarks on the same code gives some huge performance differences: there is always at least one benchmark that is >20% slower or faster. The benchmarks seem to be far more consistent on the runners, where only a few show large variance.
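To make the >20% threshold above concrete, here is a minimal sketch of how two runs could be compared. The helper name `classify_changes`, the benchmark names, and the timings are all hypothetical and are not taken from this PR or from the CUDA.jl benchmark scripts; it only illustrates the idea of labelling each benchmark by the ratio of its new time to its baseline time.

```julia
# Hypothetical helper: classify each benchmark by the ratio of its current
# time to its baseline time, using a symmetric relative tolerance.
function classify_changes(baseline::Dict{String,Float64},
                          current::Dict{String,Float64};
                          tolerance::Float64 = 0.20)
    verdicts = Dict{String,Symbol}()
    for (name, old_time) in baseline
        haskey(current, name) || continue   # skip benchmarks missing from the new run
        ratio = current[name] / old_time
        verdicts[name] = if ratio > 1 + tolerance
            :regression
        elseif ratio < 1 - tolerance
            :improvement
        else
            :invariant
        end
    end
    return verdicts
end

# Made-up timings in nanoseconds, for illustration only.
baseline = Dict("array/copy" => 1000.0, "array/sum" => 2000.0, "kernel/launch" => 500.0)
current  = Dict("array/copy" => 1300.0, "array/sum" => 2100.0, "kernel/launch" => 350.0)

verdicts = classify_changes(baseline, current)
for name in sort(collect(keys(verdicts)))
    println(rpad(name, 16), verdicts[name])
end
```

With run-to-run noise as large as described here, a fixed 20% tolerance would likely flag at least one benchmark on every rerun, which is why the choice of threshold and summary statistic matters.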

See #418 (comment)

Todo:

Implement actual CI benchmark solution from GemmKernels.jl
Ensure Benchmarks are correct

christiangnrd · 2024-09-18T16:09:29Z

The culprit was whitespace... This is (one of the many reasons) why Julia > Python.

christiangnrd · 2024-09-18T17:24:54Z

@maleadt Do you know how I can set up the Buildkite token on the GitHub Actions side?

maleadt · 2024-09-18T20:25:56Z

I can add a token. Which permissions do you need?

maleadt · 2024-09-18T20:30:43Z

Found https://github.com/actions-marketplace-validations/EnricoMi_download-buildkite-artifact-action?tab=readme-ov-file#configuration; I've added a token

christiangnrd · 2024-09-19T13:05:41Z

Do we want to save the minimum, the median, or the mean?

For testing now I'll do median since that's what https://github.com/LuxDL/LuxLib.jl/pull/128/files is doing.
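As a rough illustration of how the three candidate statistics differ, the sketch below computes them for a made-up sample of timings containing one slow outlier. The numbers are invented for this example and do not come from the Metal.jl benchmarks; only Julia's `Statistics` standard library is used.

```julia
using Statistics

# Made-up timings in microseconds: mostly around 100 µs, with one slow outlier
# of the kind a noisy machine might produce.
times = [101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 400.0]

stats = (minimum = minimum(times), median = median(times), mean = mean(times))

println("minimum: ", stats.minimum)
println("median:  ", stats.median)
println("mean:    ", round(stats.mean; digits = 1))
```

In this sample the median stays at 100 µs while the mean is pulled up to roughly 143 µs by the single outlier. The minimum is the least sensitive to slow outliers but only reflects the best-case run.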

christiangnrd · 2024-09-19T13:36:30Z

This is mostly ready. I assume benchmarks will be posted to new PRs after this is merged to master?

A few uncertainties:

Which summary statistic do we want to report?
Most benchmarks are working, but I'd like someone with a better understanding of lower-level GPU programming to make sure that I converted them from CUDA to Metal properly.
volumerhs isn't running. Can/should that be fixed, or is it not relevant for Metal?
Do we want to run the benchmarks even if CI fails like in CUDA.jl or should I get rid of those lines?

maleadt

I guess we'll only see this in action on subsequent PRs?

maleadt · 2024-09-26T11:45:29Z

.buildkite/pipeline.yml

+          queue: "juliaecosystem"
+          os: "macos"
+          arch: "aarch64"
+          macos_version: "15.0"


Why do you need macOS 15 for the benchmarks?

I initially wanted to see the performance impact of logging. It seems logging is now only enabled when actually used, so it might not be worth it.

christiangnrd · 2024-09-26T11:50:36Z

> I guess we'll only see this in action on subsequent PRs?

Looking at the Lux PR I based this on, it seems so.

christiangnrd mentioned this pull request Sep 18, 2024

Add benchmarking CI #419

Closed

2 tasks

christiangnrd force-pushed the cg/benchmark branch from 3debcc9 to 6997e40 Compare September 18, 2024 16:08

christiangnrd force-pushed the cg/benchmark branch 2 times, most recently from f8c7b21 to 8b19a91 Compare September 18, 2024 16:38

christiangnrd force-pushed the cg/benchmark branch 3 times, most recently from 721ef62 to b40d7f1 Compare September 19, 2024 12:41

christiangnrd force-pushed the cg/benchmark branch from dbc3eb2 to f367f16 Compare September 19, 2024 13:08

christiangnrd marked this pull request as ready for review September 19, 2024 13:36

christiangnrd marked this pull request as draft September 19, 2024 13:40

christiangnrd marked this pull request as ready for review September 19, 2024 14:59

christiangnrd force-pushed the cg/benchmark branch from 51fcec4 to 680a134 Compare September 19, 2024 15:00

christiangnrd mentioned this pull request Sep 21, 2024

@mtlprintf #418

Open

4 tasks

christiangnrd force-pushed the cg/benchmark branch from b8e2424 to e97d4d1 Compare September 24, 2024 14:18

christiangnrd requested review from maleadt and tgymnich September 24, 2024 16:00

tgymnich approved these changes Sep 24, 2024

christiangnrd added 7 commits September 24, 2024 13:46

Copy-paste CUDA benchmarks

87f3862

Add CI

d2f63c8

Adapt for Metal

9175ab8

[only benchmarks]

Save median of results

cc8de56

[only benchmarks]

Cleanup

a448039

Don't disable main tests when a PR is a draft

57f004f

Benchmark on macOS 15 to catch potential performance impacts of new features.

c3a6fb3

Use juliaecosystem runner for macOS 15

cfd9499

christiangnrd force-pushed the cg/benchmark branch from e97d4d1 to cfd9499 Compare September 24, 2024 16:46

maleadt reviewed Sep 26, 2024

maleadt merged commit 8652754 into main Sep 26, 2024
2 checks passed

maleadt deleted the cg/benchmark branch September 26, 2024 12:07
