I'm experimenting with running whisper at scale on a VPS cluster, but I'm not getting good performance; it's quite slow even on dedicated CPU hardware. Here is the CPU info that ./main prints: system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Could the lack of BLAS be one reason it's slow? I have also built it specifically with OpenBLAS, but for some reason it still reports BLAS = 0.
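For reference, this is roughly the usual way to get a BLAS-enabled build; the WHISPER_OPENBLAS Makefile flag and the Debian package name are assumptions about a stock whisper.cpp checkout, so adjust for your setup:

```shell
# Sketch: rebuild whisper.cpp against OpenBLAS. If the link succeeds,
# the system_info banner should then show "BLAS = 1".
sudo apt-get install -y libopenblas-dev   # OpenBLAS headers + library (Debian/Ubuntu)
make clean
WHISPER_OPENBLAS=1 make -j
./main -m models/ggml-base.en.bin -f samples/jfk.wav   # check the banner for "BLAS = 1"
```

If the banner still shows BLAS = 0, the build most likely fell back to the non-BLAS path, e.g. because the OpenBLAS headers weren't found at compile time.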
I don't think there is anything you can do at the moment to improve the performance.
In the future, quantised models might be useful for such use-cases, so keep track of progress in #540
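If/when quantised models become usable here, the workflow will presumably mirror the ggml quantize tooling; this is only a guess at its shape (the tool name, file names, and the q5_0 type are all assumptions, not something confirmed in this thread):

```shell
# Hypothetical sketch: produce and run a quantised model once support lands.
./quantize models/ggml-large.bin models/ggml-large-q5_0.bin q5_0
./main -m models/ggml-large-q5_0.bin -t 6 -f samples/audio.wav
```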
In my experience it's very hard to improve performance without offloading to a GPU. Even throwing dramatically more cores at it does not work: an AMD EPYC 7532 at 128/128 threads runs no faster than at 12/128.
The sweet spot is probably 6-8 cores, quantized if accuracy allows, and scaling out the workload across a cluster.
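The scale-out approach above can be sketched as a simple per-node scheduler: cap each whisper.cpp job at a small thread count and run as many jobs side by side as the core budget allows. The binary path, model path, and the 6-thread choice below are assumptions for illustration:

```python
# Sketch: fan a batch of audio files out across a pool of whisper.cpp
# worker processes, each pinned to a small thread count.
import subprocess
from concurrent.futures import ProcessPoolExecutor

THREADS_PER_JOB = 6  # within the "sweet spot" range suggested above

def workers_for(total_cores: int, threads_per_job: int = THREADS_PER_JOB) -> int:
    """How many whisper.cpp jobs to run concurrently on one node."""
    return max(1, total_cores // threads_per_job)

def transcribe(path: str) -> int:
    """Run one whisper.cpp job with a capped thread count; return its exit code."""
    cmd = ["./main", "-m", "models/ggml-base.en.bin",
           "-t", str(THREADS_PER_JOB), "-f", path]
    return subprocess.run(cmd).returncode

def run_batch(paths, total_cores=24):
    # Saturate the node with several modest jobs instead of one huge one.
    with ProcessPoolExecutor(max_workers=workers_for(total_cores)) as pool:
        return list(pool.map(transcribe, paths))
```

The same scheduler then scales out trivially: give each VPS in the cluster its own slice of the file list.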
I wish whisper.cpp scaled better, though there's some performance discussion in #200 and hopefully this can be improved over time.
Even when running with triple Titan RTXs, a 24-core Xeon E5-2643 v4 setup, and 512 GB of RAM, I only get 1.66x realtime for the large model (~7 min for ~13 min of audio). If nothing else, this shows you cannot just throw more resources at it to speed it up.
openai/whisper proper handles offloading to CUDA devices much more efficiently: the same machine runs at about 7x realtime with it.
On CPU, however, whisper.cpp pulls far ahead of openai/whisper. I don't have exact numbers, but openai/whisper on CPU is roughly 0.33x realtime on the large model.
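For anyone comparing these numbers, "Nx realtime" here is just audio duration divided by processing time. A quick helper (note the rounded ~7/~13 min figures give about 1.86x rather than the quoted 1.66x, so that number presumably comes from exact timings):

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Speed relative to playback: values above 1 are faster than realtime."""
    return audio_seconds / wall_seconds

# ~13 min of audio processed in ~7 min of wall time:
print(round(realtime_factor(13 * 60, 7 * 60), 2))  # ≈ 1.86 with these rounded figures
```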