
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device #794

Open
JusticeGH opened this issue May 1, 2024 · 8 comments

@JusticeGH

I have been trying to get whisperX to work on my GTX 970, but have been running into a myriad of problems. Please bear with me as I’m a beginner in all things programming.

I followed all the installation instructions to the letter and then ran the following command:
```
whisperx.exe "C:\Users\Justin\Music\Kraft Punk Soundbites\WAV\Hey what's up_ I'm Kraft Punk.wav" --model large-v2 --device cuda --batch_size 1 --compute_type float32 --output_dir "C:\Users\Justin\Desktop\" --language en --diarize --min_speakers 1 --max_speakers 1 --hf_token XXXXXXXXXXXXXXXXX
```

I then ran into the following error:

```
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.3. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Justin\.cache\torch\whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0. Bad things might happen unless you revert torch to 1.x.
>>Performing transcription...
Traceback (most recent call last):
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Justin\miniconda3\envs\whisperx\Scripts\whisperx.exe\__main__.py", line 7, in <module>
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\whisperx\transcribe.py", line 176, in cli
    result = model.transcribe(audio, batch_size=batch_size, chunk_size=chunk_size, print_progress=print_progress)
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 218, in transcribe
    for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=batch_size, num_workers=num_workers)):
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\transformers\pipelines\pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\transformers\pipelines\base.py", line 1112, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 152, in _forward
    outputs = self.model.generate_segment_batched(model_inputs['inputs'], self.tokenizer, self.options)
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 47, in generate_segment_batched
    encoder_output = self.encode(features)
  File "C:\Users\Justin\miniconda3\envs\whisperx\lib\site-packages\whisperx\asr.py", line 86, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
```

I did run some basic diagnostics to check if CUDA is available:

```
>>> import torch
>>> import sys
>>> print('A', sys.version)
A 3.10.14 | packaged by Anaconda, Inc. | (main, Mar 21 2024, 16:20:14) [MSC v.1916 64 bit (AMD64)]
>>> print('B', torch.__version__)
B 2.3.0
>>> print('C', torch.cuda.is_available())
C True
>>> print('D', torch.backends.cudnn.enabled)
D True
>>> device = torch.device('cuda')
>>> print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='NVIDIA GeForce GTX 970', major=5, minor=2, total_memory=4095MB, multi_processor_count=13)
>>> print('F', torch.tensor([1.0, 2.0]).cuda())
F tensor([1., 2.], device='cuda:0')
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.2926, 0.4866, 0.1281],
        [0.6154, 0.8456, 0.5436],
        [0.4880, 0.7883, 0.2404],
        [0.6841, 0.2353, 0.2622],
        [0.9875, 0.0566, 0.4680]])
```

Also ran nvidia-smi:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23                 Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 970       WDDM  |   00000000:01:00.0  On |                  N/A |
| 47%   30C    P2             50W /  250W |     980MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

I asked on the NVIDIA Developer Forum, but they say it's not a CUDA setup issue:

> One or more of the software stacks (perhaps the whisperx.exe executable) you are using have not been compiled to support a GTX 970. This isn't a CUDA setup issue (which is what this forum is about) but rather a problem with the software stack.

I'm at a loss and would really appreciate any help :)
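The forum's diagnosis lines up with the device properties printed in the diagnostics above: the GTX 970 reports compute capability `major=5, minor=2`, i.e. a Maxwell-era `sm_52` architecture that the prebuilt CUDA binaries evidently ship no kernels for. A minimal sketch of that mapping, using the value PyTorch printed for the GTX 970 and NVIDIA's published figure for the GTX 950M mentioned later in this thread:

```python
# Sketch: compute capabilities for the Maxwell GPUs in this thread.
# The GTX 970 value matches the major=5, minor=2 reported by
# torch.cuda.get_device_properties above; the 950M value is NVIDIA's
# published figure. cudaErrorNoKernelImageForDevice means the installed
# binaries contain no kernels compiled for the device's sm_XY target.
COMPUTE_CAPABILITY = {
    "GeForce GTX 970": (5, 2),   # Maxwell (GM204)
    "GeForce GTX 950M": (5, 0),  # Maxwell (GM107)
}

def sm_tag(name: str) -> str:
    """Return the sm_XY architecture tag a CUDA build would need."""
    major, minor = COMPUTE_CAPABILITY[name]
    return f"sm_{major}{minor}"

print(sm_tag("GeForce GTX 970"))  # sm_52
```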

@laraws commented May 10, 2024

I'm hitting the same issue; my GPU is a GTX 950M.

My CUDA version:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```

Also ran nvidia-smi:

```
NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2
```

@YJCX330 commented May 10, 2024

I think you can refer to #720. I encountered the same error message and solved it with the fix suggested there (`pip install ctranslate2==3.24.0`).

@laraws commented May 10, 2024

> I think you can refer to #720. I encountered the same error message and solved it with the fix suggested there (`pip install ctranslate2==3.24.0`).

Thanks. I tried it, but it didn't work; it produced another error: "Segmentation fault".

@JusticeGH (Author)

> I think you can refer to #720. I encountered the same error message and solved it with the fix suggested there (`pip install ctranslate2==3.24.0`).

This worked for me. Hopefully this is resolved at some point to take advantage of newer versions.

@laraws commented May 17, 2024

> > I think you can refer to #720. I encountered the same error message and solved it with the fix suggested there (`pip install ctranslate2==3.24.0`).
>
> This worked for me. Hopefully this is resolved at some point to take advantage of newer versions.

Glad to hear that. My system is Ubuntu, so the solution there may be different.

@cbsfletch
> > I think you can refer to #720. I encountered the same error message and solved it with the fix suggested there (`pip install ctranslate2==3.24.0`).
>
> This worked for me. Hopefully this is resolved at some point to take advantage of newer versions.

@JusticeGH

Not sure how this can work, at least for this repo (whisperx), when faster-whisper requires ctranslate2 >=4.0. Can you explain how you got past that requirement?

@GuyPaddock

> I think you can refer to #720. I encountered the same error message and solved it with the fix suggested there (`pip install ctranslate2==3.24.0`).

I'm astounded... This worked for me as well.

@GuyPaddock

> @JusticeGH
>
> Not sure how this can work, at least for this repo (whisperx), when faster-whisper requires ctranslate2 >=4.0. Can you explain how you got past that requirement?

You have to install it after everything else. pip will install the conflicting version anyway, but it will complain (note the ERROR followed by the SUCCESS):

```
Using cached ctranslate2-3.24.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.8 MB)
Installing collected packages: ctranslate2
  Attempting uninstall: ctranslate2
    Found existing installation: ctranslate2 4.3.1
    Uninstalling ctranslate2-4.3.1:
      Successfully uninstalled ctranslate2-4.3.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
faster-whisper 1.0.0 requires ctranslate2<5,>=4.0, but you have ctranslate2 3.24.0 which is incompatible.
Successfully installed ctranslate2-3.24.0
```

Basically, you're forcing pip to install a version that is declared incompatible, as a workaround until this issue is resolved some other way (likely in the CTranslate2 package).
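The install order described above can be sketched as a shell session (the whisperx install step here stands in for whatever the project's README prescribes):

```shell
# Workaround sketch from this thread: install whisperx first, which pulls
# in faster-whisper and ctranslate2 4.x, then force the older ctranslate2
# that still works on Maxwell-era GPUs. pip prints a dependency-conflict
# ERROR for faster-whisper but completes the downgrade anyway.
pip install whisperx
pip install ctranslate2==3.24.0
```

The order matters: installing whisperx afterwards would let pip's resolver upgrade ctranslate2 back to 4.x and reintroduce the error.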
