Cannot run the benchmarks: ERROR: Case arguments are arrays of different sizes.
#99
Hey! I updated the benchmark suite, and now you need to explicitly pass "case arguments" for all requested cases. For example:

    bash "${WATERLILY_ROOT}/benchmark/benchmark.sh" -v "1.10.0" -t "12 24 36 48 60 72" -c "tgv jelly" -p "5,6,7,8 5,6,7,8" -s "100 100" -ft "Float32 Float32"
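To make the "arrays of different sizes" requirement concrete, here is a minimal sketch of the kind of consistency check that could produce this error. It is not taken from `benchmark.sh`: the helper names `count_entries` and `check_case_args` are hypothetical, and the only assumption is that each case option (`-c`, `-p`, `-s`, `-ft`) carries one whitespace-separated entry per case.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not the actual benchmark.sh code.

# Count whitespace-separated entries in an option value such as "tgv jelly".
# Assumes the value contains no glob characters (*, ?, [).
count_entries() {
    set -- $1          # unquoted on purpose: split the value on whitespace
    echo "$#"
}

# Return 0 when all option values hold the same number of entries, 1 otherwise.
check_case_args() {
    local expected n arg
    expected=$(count_entries "$1")
    for arg in "$@"; do
        n=$(count_entries "$arg")
        if [ "$n" -ne "$expected" ]; then
            echo "ERROR: Case arguments are arrays of different sizes." >&2
            return 1
        fi
    done
    return 0
}

# Two cases, two log2p lists, two step counts, two float types: consistent.
check_case_args "tgv jelly" "5,6,7,8 5,6,7" "100 100" "Float32 Float32" && echo "ok"
```

Under this sketch, passing two cases with only one `-p` list (for example `"tgv jelly"` with `"5,6,7,8"`) would fail the check, which matches the advice above to give every requested case its own entry in each option.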
Thanks! For the record, with #101 and running

    # Get WaterLily root directory
    WATERLILY_ROOT=$(julia --project=. --startup-file=no -e 'using WaterLily; print(pkgdir(WaterLily))')
    # Run the benchmarks. jelly only up to log2p=7 because the case log2p=8 is superslow on CPU
    "${WATERLILY_ROOT}/benchmark/benchmark.sh" -v "1.10" -t "12 24 36 48 60 72" -b "Array CuArray" -c "tgv jelly" -p "5,6,7,8 5,6,7" -s "100 100" -ft "Float32 Float32"

on Nvidia GH200 I get
Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards). Also I understand that the p=8 jelly case does not fit in the GPU?
I think there's some problem with the CPU threading, because in htop I see very few cores busy. I don't know if it's bad load balancing, too many memory allocations causing lots of GC pauses, or something else; I haven't looked at the code.
No, on the GPU it works fine; if I remember correctly it takes about 50 seconds. It's the CPU version which is unbearably slow: I think it was going to take over 20 minutes with 12 threads, and probably not much less with more threads given the generally bad scaling on CPU, so I didn't want to spend more than 2 hours to get the results.
We see reasonable CPU multi-thread scaling for other hardware. And the
core-code is non-allocating. So could this be an issue with
KernelAbstractions on this particular hardware?
On Thu, Jan 11, 2024, 23:07 Mosè Giordano ***@***.***> wrote:
Interesting how CPU multi-threading seems to not scale much with current
case sizes (at least from 12 threads upwards).
I think there's some problem with the CPU threading, because in htop I see
very few cores busy, I don't know if it's bad load balancing, too many
memory allocations causing lots of GC pauses, or what else, I haven't
looked at the code.
Also I understand that the p=8 jelly case does not fit in the GPU?
No, on the GPU it works fine, if I remember correctly it takes about 50
seconds, it's the CPU version which is unbearably slow, I think it was
going to take over 20 minutes with 12 threads, and probably not much less
with more threads, given the generally bad scaling on CPU, so I didn't want
to spend more than 2 hours to get the results.
CC @b-fg