'command' not using CUDA? #2275

lcbpublic · 2024-07-01T22:08:06Z

lcbpublic
Jul 1, 2024

I cloned the 'whisper.cpp' repository a few days ago. Initially, I did a plain 'make clean; make -j' which generated a few warnings but otherwise seem to finish fine. I also downloaded all the models listed on the project home page. I then ran './command -m ./models/ggml-large-v3.bin'. Recognizing the initial prompt, and each subsequent command, takes somewhere between 75sec and 80sec (see below for my system description). I then did 'make clean; GGML_CUDA=1 make -j'. This also completed successfully (with only a few warnings). Finally, I ran './command -m ./models/ggml-large-v3.bin' again. Unfortunately, the time required for command recognition was unchanged as far as I could tell. I would have thought it would be greatly reduced if it was using my system's GPU. Am I missing something?

Out of curiosity, I also tried doing 'make clean; GGML_OPENBLAS=1 make -j'. After this, doing './command -m ./models/ggml-large-v3.bin' reduced the recognition time to about 30sec. Doing './command -m ./models/ggml-large-v3.bin -t 32' brought it down to about 13sec.

My system has an AMD 'Threadripper' CPU with 32 cores (64 "threads"), an NVIDIA GeForce RTX 2080 Ti GPU on the PCIE bus, and 64GBytes of memory. I'm running Pop!_OS 22.04 with CUDA and SDL2 installed from the OS repositories.

Incidentally, I tried doing 'make clean; cmake .; make -j' but that wouldn't build some of the stuff in 'examples' (including 'command'). Anyway, any help with the CUDA issue would be greatly appreciated.

ggerganov · 2024-07-03T17:41:34Z

ggerganov
Jul 3, 2024
Maintainer

Try make clean && GGML_CUDA=1 make -j command. Make sure to pull the latest whisper.cpp

1 reply

lcbpublic Jul 3, 2024
Author

Heh. Well, your answer put me on the right track. However, it turns out I misspelled 'GGML_CUDA'. Once I had fixed that, I got a make error:

Makefile:862: *** I ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via environment variable CUDA_DOCKER_ARCH, e.g. by running "export CUDA_DOCKER_ARCH=compute_XX" on Unix-like systems, where XX is the minimum compute capability that the code needs to run on. A list with compute capabilities can be found here: https://developer.nvidia.com/cuda-gpus . Stop.

Once I fixed that (by following the instructions in the error message) the build succeeded. Now './command -m models/ggml-large-v3.bin' recognizes commands in about 1sec (according to timings printed by the program).

Apologies for wasting everybody's time for (basically) a typo.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

'command' not using CUDA? #2275

{{title}}

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

'command' not using CUDA? #2275

lcbpublic Jul 1, 2024

Replies: 1 comment · 1 reply

ggerganov Jul 3, 2024 Maintainer

lcbpublic Jul 3, 2024 Author

lcbpublic
Jul 1, 2024

Replies: 1 comment 1 reply

ggerganov
Jul 3, 2024
Maintainer

lcbpublic Jul 3, 2024
Author