Cannot run the benchmarks: ERROR: Case arguments are arrays of different sizes. #99

Closed
giordano opened this issue Jan 11, 2024 · 5 comments

Comments

@giordano
Contributor

$ WATERLILY_ROOT=$(julia --project=. --startup-file=no -e 'using WaterLily; print(pkgdir(WaterLily))')
$ bash "${WATERLILY_ROOT}/benchmark/benchmark.sh"  -v "1.10.0" -t "12 24 36 48 60 72" -c "tgv jelly" -p "5,6,7,8"
ERROR: Case arguments are arrays of different sizes.

CC @b-fg

@b-fg
Member

b-fg commented Jan 11, 2024

Hey! I updated the benchmark suite, and now you need to explicitly pass "case arguments" for all requested cases. For example:

bash "${WATERLILY_ROOT}/benchmark/benchmark.sh"  -v "1.10.0" -t "12 24 36 48 60 72" -c "tgv jelly" -p "5,6,7,8 5,6,7,8" -s "100 100" -ft "Float32 Float32"
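
The rule described above (one entry in each of `-p`, `-s` and `-ft` per case given to `-c`) can be checked before launching a long run. The following is only a sketch of such a check, not the code used in `benchmark.sh`; the helper names `count_entries` and `check_case_args` are hypothetical.

```shell
# Hypothetical helpers, not taken from benchmark.sh.

# Print the number of whitespace-separated entries in a string.
count_entries() {
    set -f        # disable globbing while the string is split
    set -- $1     # unquoted on purpose: split on whitespace
    set +f
    echo "$#"
}

# Succeed only if every per-case list has as many entries as the case list.
# Usage: check_case_args "<cases>" "<list>" ["<list>" ...]
check_case_args() {
    cca_expected=$(count_entries "$1")
    shift
    for cca_list in "$@"; do
        [ "$(count_entries "$cca_list")" -eq "$cca_expected" ] || return 1
    done
    return 0
}

# Original invocation: two cases, but a single log2p list.
check_case_args "tgv jelly" "5,6,7,8" || echo "mismatch"
# Corrected invocation: one entry per case in every list.
check_case_args "tgv jelly" "5,6,7,8 5,6,7,8" "100 100" "Float32 Float32" && echo "ok"
```

Each comma-separated group such as `5,6,7,8` counts as one entry, which is why the first call reports a mismatch for two cases.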

@giordano
Contributor Author

Thanks!

For the record, with #101 and running

# Get Waterlily root directory
WATERLILY_ROOT=$(julia --project=. --startup-file=no -e 'using WaterLily; print(pkgdir(WaterLily))')

# Run the benchmarks.  jelly only up to log2p=7 because the case log2p=8 is superslow on CPU
"${WATERLILY_ROOT}/benchmark/benchmark.sh"  -v "1.10" -t "12 24 36 48 60 72" -b "Array CuArray" -c "tgv jelly" -p "5,6,7,8 5,6,7" -s "100 100" -ft "Float32 Float32"

on Nvidia GH200 I get

Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    10050707 │   2.03 │     1.42 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    16226723 │   2.24 │     1.89 │     0.75 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    21577295 │   2.48 │     2.51 │     0.56 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    25294522 │   2.53 │     3.07 │     0.46 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    28977646 │   2.69 │     3.34 │     0.42 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    32296306 │   2.80 │     4.07 │     0.35 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     5291894 │   1.70 │     1.18 │     1.20 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    11067020 │   1.48 │     3.65 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    17929605 │   2.01 │     4.03 │     0.91 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    24063681 │   2.34 │     4.71 │     0.78 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    29679847 │   2.64 │     5.48 │     0.67 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    35276190 │   2.77 │     6.12 │     0.60 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    40603811 │   2.82 │     7.30 │     0.50 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     4838973 │   1.60 │     1.07 │     3.40 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │     7600713 │   1.71 │     9.53 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    13186184 │   2.94 │     8.10 │     1.18 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    18532520 │   3.52 │     8.14 │     1.17 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    23642594 │   4.22 │     8.50 │     1.12 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    28747149 │   4.64 │     8.85 │     1.08 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    33606789 │   5.02 │     9.84 │     0.97 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     3752082 │   1.38 │     1.05 │     9.04 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 8
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │     7325711 │   2.33 │    69.45 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    13016663 │   4.09 │    55.60 │     1.25 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    18530143 │   5.54 │    52.38 │     1.33 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    23822965 │   7.33 │    52.14 │     1.33 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    29067229 │   8.88 │    52.25 │     1.33 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    34308861 │  10.52 │    54.07 │     1.28 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     3649091 │   0.46 │     3.24 │    21.40 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    15469230 │   1.37 │     3.29 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    25696182 │   2.37 │     3.95 │     0.83 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    34962198 │   2.46 │     5.10 │     0.64 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    42857578 │   2.53 │     6.32 │     0.52 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    50564026 │   2.70 │     7.14 │     0.46 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    57486610 │   2.57 │     8.73 │     0.38 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     8018123 │   2.17 │     1.57 │     2.10 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    20598523 │   0.78 │    12.15 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    35587951 │   1.58 │    12.01 │     1.01 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    49779678 │   2.31 │    13.38 │     0.91 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    63047703 │   2.80 │    14.95 │     0.81 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    76194003 │   3.16 │    16.38 │     0.74 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    88376271 │   3.24 │    19.03 │     0.64 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │    10491694 │   1.87 │     2.22 │     5.47 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    31778861 │   1.15 │   111.33 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    55873553 │   1.99 │   102.58 │     1.09 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    79130929 │   1.97 │   100.16 │     1.11 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │   101121788 │   2.19 │   102.71 │     1.08 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │   122910896 │   2.31 │   103.50 │     1.08 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │   144187652 │   2.31 │   107.25 │     1.04 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │    16077560 │   1.02 │     6.44 │    17.30 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
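
Judging from the numbers alone, the Speed-up column looks like the time of the first row (CPUx12) divided by the time of each row. That reading is an assumption inferred from the tables, not taken from the benchmark code, and the `speedup` helper below is hypothetical. Because the printed times are already rounded, recomputing from them can differ from the table in the last digits.

```shell
# Hypothetical helper: ratio of a baseline time to another time,
# rounded to two decimals.
# Usage: speedup <baseline_seconds> <seconds>
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

# tgv, log2p = 5: CPUx12 (1.42 s) against CPUx24 (1.89 s)
speedup 1.42 1.89     # prints 0.75
# tgv, log2p = 8: CPUx12 (69.45 s) against CPUx36 (52.38 s)
speedup 69.45 52.38   # prints 1.33
```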

@giordano closed this as not planned Jan 11, 2024
@b-fg
Member

b-fg commented Jan 11, 2024

Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards). Also I understand that the p=8 jelly case does not fit in the GPU? The size of the jelly simulation is N=(2^p)*(2^p)*(4*2^p) -- for p=8, N≈67e6. The TGV is just N=(2^p)^3, which for p=8 results in N≈17e6. And thank you very much for the benchmarks, @giordano!
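
The two size formulas quoted above can be evaluated for every `log2p` value used in the benchmarks. This is a small sketch with hypothetical function names (`tgv_cells`, `jelly_cells`); it only reproduces the arithmetic and says nothing about memory use.

```shell
# Cell counts from the formulas in the comment; the argument is p (log2p).
# tgv:   N = (2^p)^3
# jelly: N = (2^p) * (2^p) * (4 * 2^p)
tgv_cells()   { echo $(( (1 << $1) * (1 << $1) * (1 << $1) )); }
jelly_cells() { echo $(( (1 << $1) * (1 << $1) * 4 * (1 << $1) )); }

for p in 5 6 7 8; do
    echo "log2p=$p tgv=$(tgv_cells "$p") jelly=$(jelly_cells "$p")"
done
```

For p=8 this gives 16777216 cells for tgv and 67108864 for jelly, matching the rounded figures of 17e6 and 67e6, so the jelly case always has four times as many cells as tgv at the same p.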

@giordano
Contributor Author

> Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards).

I think there's some problem with the CPU threading, because in htop I see very few cores busy. I don't know whether it's bad load balancing, too many memory allocations causing lots of GC pauses, or something else; I haven't looked at the code.

> Also I understand that the p=8 jelly case does not fit in the GPU?

No, on the GPU it works fine; if I remember correctly, it takes about 50 seconds. It's the CPU version that is unbearably slow: I think it was going to take over 20 minutes with 12 threads, and probably not much less with more threads given the generally bad scaling on CPU, so I didn't want to spend more than 2 hours to get the results.

@weymouth
Collaborator

weymouth commented Jan 12, 2024 via email
