Idea: OpenCL via CLBlast? #173
Interesting. It was recently mentioned here by @StuartIanNaylor. What hardware are you running this on, if I may ask? The reason I ask is that I wonder if you have proper Vulkan GPU support that could make use of https://github.com/kpet/clvk, as CLBlast appears to be supported by it: https://github.com/kpet/clvk/blob/main/docs/supported-applications.md (I'm also wondering what the status of the Mesa3D Vulkan Broadcom driver for the RPi4 is, and if so, whether the above might help for those systems as well. Most likely more a proof of concept again, but it would be cool to have some sort of GPU-enabled deep learning on the RPi4 to play with.)
I have a Rock5b with a Mali G610, which as an SBC & SoC is a bit bleeding edge, since Mesa/Panfrost stops at the Mali G57 (Valhall v9, OpenGL ES 3.1). I have been doing some testing with the G610; currently it's just a Rockchip blob, but using the OpenCL drivers with https://github.com/StuartIanNaylor/rock5b-wav2letter-bench it works, and for ML it is about equivalent to the CPU. If it isn't working, you can probably run the tests in https://github.com/KhronosGroup/OpenCL-CTS if ArmNN is a fail. But yeah, it might be really interesting.
If I install with -DNETLIB=ON and then add your patch, performance is terrible.
7% GPU load is all I am getting (15% at max), so I guess it probably needs someone with much better knowledge than me. If I scaled my GPU from 7% load to 100%, that would be about ×14; divide the times by 14 and yeah, it's roughly CPU-equivalent. That is far beyond my ability, but if clear parallelism exists then ×2 might be possible with the G610 — well, a bit less due to inefficiencies, and the code may behave like ArmNN, which put about 7% load on the CPU, so stealing that.
Whilst with the normal optimised CPU build:
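The load-scaling estimate above can be sketched as a quick back-of-the-envelope calculation (numbers taken from the comment, purely illustrative):

```shell
# If the GPU only reaches 7% utilisation, then at full (100%) utilisation
# the same work would finish roughly 100/7 ≈ 14x faster.
gpu_load_pct=7
speedup=$((100 / gpu_load_pct))
echo "projected speedup at full GPU load: ~${speedup}x"
```

So dividing the measured GPU times by that factor gives the rough "best case" used to compare against the CPU.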
Thanks! Yes, you're right, it needs some serious work. As I mentioned, I'm a bit hw-challenged myself, but sure, Vulkan would also be good if the hardware supports it. EDIT: It seems this GPU only has 1.5 GB of VRAM, which causes even the base.en model to crash sometimes (often, actually). So for older GPUs to be feasible, VRAM usage would need to be lowered quite a lot.
PS: the headroom is there. I looked at my power meter after the test in confusion, as it was reading near 10 watts, and then remembered that on my other (power-saved) screen I was still running the streaming version of whisper :) I am more interested in embedded, and my results are not bad: running whisper takes about 5 watts on the CPU, whilst the GPU could be as low as 1.5 watts, since it seems to draw about 1/3 as much when running similar tasks. I have forgotten what my RTX 3050 got with the full version of whisper; I bought it because it's only 140 watts, in Nvidia's crazy-wattage world.
For me, CLBlast provides a ~12.5% speedup compared to vanilla whisper.cpp
Test configuration:
That's odd. Mine is the opposite: it's two times slower than vanilla... I am on an Intel UHD 630, Windows 11.
Out of interest, what command are people using to get the above stats? I'm looking at various options via CLBlast and would be interested in being able to provide comparable perf feedback :)
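For what it's worth (not confirmed in this thread, just the standard whisper.cpp CLI — model and sample paths are assumptions for your setup), the timings people quote usually come from the summary the `main` binary prints at the end of a transcription run:

```shell
# Transcribe a sample file; whisper.cpp prints per-stage timings
# (model load, encode, decode, total) after the transcript.
./main -m models/ggml-base.en.bin -f samples/jfk.wav -t 4
```

The repo also ships a dedicated `bench` tool that times just the encoder, which makes for a cleaner apples-to-apples comparison between BLAS backends.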
Hi,
Nice project! Thanks for your work, I wish I had better hw to make use of it :D
I haven't seen anyone mention CLBlast here yet.
They actually provide a wrapper so that CLBlast can be used as a drop-in replacement for OpenBLAS.
I've now tried this; it works, and it was easy enough even for me :D
But to get the best performance you'd need to adjust it a lot more, I guess.
Here's my naïve patch in case anyone wants to play with it:
whisper.cpp_CLBlast.patch.gz
Note that the patch simply replaces the existing OpenBLAS implementation.
Also, CLBlast needs to be compiled with -DNETLIB=ON to enable the wrapper.
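For anyone who wants to reproduce this, a rough sketch of the CLBlast side of the build (repository URL is upstream CLBlast; install prefix and generator are assumptions, and the whisper.cpp side comes from the patch above, not from upstream):

```shell
# Build CLBlast with the Netlib BLAS wrapper enabled, so it can stand in
# for OpenBLAS. -DNETLIB=ON is the CMake option that builds the wrapper.
git clone https://github.com/CNugteren/CLBlast
cmake -S CLBlast -B CLBlast/build -DNETLIB=ON
cmake --build CLBlast/build
sudo cmake --install CLBlast/build

# Then apply the patch to whisper.cpp and build it as you would with
# OpenBLAS, linking against the CLBlast wrapper library instead.
```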