
Recommendations for performance when running whisper.cpp on VPS? #524

Open · jalustig opened this issue Feb 22, 2023 · 3 comments
Labels: performance (CPU and memory usage - results and comparisons)

@jalustig commented Feb 22, 2023

I'm experimenting with running whisper at scale on a VPS cluster, but I'm not getting good performance; it is quite slow even on dedicated CPU hardware. Here are the CPU stats that ./main outputs:

system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

Is the lack of BLAS one potential reason why it's slow? I have also specifically built it with OpenBLAS, but for some reason it isn't actually running with BLAS (the output above still shows BLAS = 0).
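For reference, a typical OpenBLAS rebuild looks something like the sketch below. The WHISPER_OPENBLAS Makefile flag and the bundled samples/jfk.wav test file are assumptions based on whisper.cpp checkouts from around this period, so check the README of your version.

```bash
# Install the OpenBLAS development package first (Debian/Ubuntu name).
sudo apt-get install libopenblas-dev

# Clean rebuild with the OpenBLAS flag assumed here (WHISPER_OPENBLAS).
make clean
WHISPER_OPENBLAS=1 make -j

# Verify: the system_info line should now report BLAS = 1.
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```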

@nxtreaming

I use the following VPS configuration; it reaches about 90% of realtime, i.e. 36 minutes of audio takes about 40 minutes to process:

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

@ggerganov ggerganov added the performance CPU and memory usage - results and comparisons label Feb 27, 2023
@ggerganov (Owner)

I don't think you can do anything at the moment to improve the performance.
In the future, quantised models might be useful for such use cases, so keep track of progress in #540

@sigaloid commented Jul 4, 2023

In my experience it's very hard to improve performance without offloading to a GPU. Even throwing dramatically more cores at it does not help: an AMD EPYC 7532 at 128/128 threads runs no faster than at 12/128.
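One way to see this plateau for yourself is a quick thread sweep. This is a sketch assuming the -t/--threads flag of ./main and the bundled samples/jfk.wav, and it relies on the timing summary whisper.cpp prints (to stderr) at the end of each run.

```bash
# Sweep thread counts and compare the total time whisper.cpp reports.
# Assumes ./main, a downloaded ggml model, and the bundled JFK sample.
for t in 2 4 8 16 32 64; do
    echo "=== threads: $t ==="
    ./main -m models/ggml-base.en.bin -f samples/jfk.wav -t "$t" 2>&1 \
        | grep "total time"
done
```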

The sweet spot is probably 6-8 cores per instance, quantized if accuracy allows, and scaling the workload out across a cluster.
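As a sketch of that setup, hedged on two assumptions: the quantize tool that later whisper.cpp versions ship for producing q5_0 models, and GNU parallel for fanning input files out across a fixed number of worker slots.

```bash
# Quantize the model once (the quantize tool and the q5_0 type are
# assumptions based on later whisper.cpp releases; adjust to your checkout).
./quantize models/ggml-large.bin models/ggml-large-q5_0.bin q5_0

# Run four 8-thread instances side by side with GNU parallel,
# writing one .txt transcript per input file.
parallel -j 4 ./main -m models/ggml-large-q5_0.bin -t 8 -otxt -f {} ::: audio/*.wav
```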

I wish whisper.cpp scaled better; there is some performance discussion in #200, and hopefully this can be improved over time.

Even when running with three Titan RTXs, a 24-core Xeon E5-2643 v4 setup, and 512 GB of RAM, I only get 1.66x realtime for the large model (~7 min for ~13 min of audio). If nothing else, this shows you cannot simply throw more resources at it to speed it up.

Compared to openai/whisper proper, the Python implementation handles offloading to CUDA devices much more efficiently: the same machine running openai/whisper reaches about 7x realtime.

Compared to openai/whisper on CPU, however, whisper.cpp pulls ahead by a long shot. I don't have exact numbers, but openai/whisper manages roughly 0.33x realtime, if I recall, on the large model.
