cuda does not install #71

dorooddorood606 · 2021-04-12T17:47:43Z

python : 3.7
cuda : 11.1
pytorch : 1.8

I am trying to compile the cuda code which does not work, could you have a look please? thanks

Traceback (most recent call last):
  File "jit.py", line 3, in <module>
    'lltm_cuda', ['lltm_cuda.cpp', 'lltm_cuda_kernel.cu'], verbose=True)
  File "/user/x/anaconda3/envs/test1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1091, in load
    keep_intermediates=keep_intermediates)
  File "/user/x/libs/anaconda3/envs/test1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1302, in _jit_compile
    is_standalone=is_standalone)
  File "/user/x/libs/anaconda3/envs/test1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1400, in _write_ninja_file_and_build_library
    is_standalone=is_standalone)
  File "/user/x/libs/anaconda3/envs/test1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1782, in _write_ninja_file_to_build_library
    cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
  File "/user/x/libs/anaconda3/envs/test1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'

IndexError: list index out of range

The text was updated successfully, but these errors were encountered:

gaetan-landreau · 2021-04-22T15:01:22Z

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.

If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:

python TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" setup.py install

(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )

Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

kuzand · 2021-07-29T15:32:24Z

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.

If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:

python TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" setup.py install

(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )

Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

gaetan-landreau · 2021-07-29T15:40:11Z

You should find everything you need on this link (go to section CUDA-Enabled NVIDIA Quadro and NVIDIA RTX)

darkdevahm · 2021-10-17T21:06:03Z

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.
If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:
python TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" setup.py install
(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )
Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

Have you solved this issue?

oliver-batchelor · 2021-10-17T21:08:02Z

Is torch.cuda.is_available() False? I have had this only when I try to compile with a broken install of pytorch or cuda.

darkdevahm · 2021-10-17T21:12:21Z

Is torch.cuda.is_available() False? I have had this only when I try to compile with a broken install of pytorch or cuda.

Which cuda and pytorch version did you use?

oliver-batchelor · 2021-10-17T22:17:24Z

It came to my attention last night when I was trying to compile for 1.8.2 - and I realized this was because torch.cuda.is_available() was False. Once I fixed my cuda this compile error was also gone.

…

On Mon, Oct 18, 2021 at 10:12 AM Ahmed Ahmed ***@***.***> wrote: Is torch.cuda.is_available() False? I have had this only when I try to compile with a broken install of pytorch or cuda. Which cuda and pytorch version did you use? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#71 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAITRZJIU77MP5ADH4HZEYLUHM337ANCNFSM42ZYAHGA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

MalteEbner · 2022-03-08T15:10:30Z

The solution that worked for me on Linux:
The docker requires access to the cuda library during build time. To ensure this, make sure that
your /etc/docker/daemon.json file looks as follows:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

If not, you need to change it and then restart docker with

sudo systemctl restart docker

ClementPinard · 2022-05-25T14:16:27Z

Hello, for anyone visiting this issue, the problem is caused here : https://github.com/pytorch/pytorch/blob/master/torch/utils/cpp_extension.py#L1694

basically, the arch_list is supposed to be constructed with discovered architectures with torch.cuda.get_device_capability(i)

The thing is, when no CUDA card is detected, the function torch.cuda.device_count() returns 0 and thus no architecture is added to that list.

The leads to the last line, which essentially says "add '+PTX' to the name of last architecture, whicvh obviously fails when the arch_list is empty

As such, this problem is essentially because no cuda hardware was found by torch. Possible reasons and solutions:

driver / cuda mismatch. Probably due to updating of cuda, reboot and driver will be updated
docker context. See comments above ( cuda does not install #71 (comment) )

If there is no way to detect gpu at build time, but you know what architecture it should run on, you can explicitly set it with environment variable, like said in this comment ( #71 (comment) )

apatsekin · 2022-07-13T21:01:52Z

if you are building in Nvidia docker container without actual GPU, you can use something like this:

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi

andriworld · 2022-08-20T14:30:17Z

I had the same error running in WSL on Windows. The above solutions of setting the TORCH_CUDA_ARCH_LIST environment variable fixed the issue.

XiangFeng66 · 2023-02-28T07:13:47Z

how to solve this problem on windows platform @gaetan-landreau @ClementPinard

earor-R · 2023-03-07T12:22:57Z

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.
If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:
python TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" setup.py install
(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )
Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

If the gpu driver is loaded correctly, execute the following statement in the python console

>>> torch.cuda.get_device_capability(0)
(6, 1)

that means TORCH_CUDA_ARCH_LIST="6.1". However, in most cases, cuda is unavailable because you have specified gpu incorrectly, such as whether you have set CUDA_ VISIBLE_ DEVICES and the specified gpu is not available?

mnpenner · 2023-04-15T07:17:58Z

I got cuda working inside of docker on Windows 10 thanks to the instructions here and a little help from ChatGPT.

The issue is as @earor-R said, you can figure out the TORCH_CUDA_ARCH_LIST but the GPU still isn't available during docker build. You can, however, make it available during docker run by adding --gpus=all.

So you can set up half the Dockerfile automated like

FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

WORKDIR /srv

RUN apt update && apt install -y curl build-essential git

RUN curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > /tmp/miniconda.sh

RUN bash /tmp/miniconda.sh -b -p /opt/miniconda

ENV PATH="/opt/miniconda/bin:$PATH"

RUN pip install torch torchvision torchaudio

RUN git clone https://github.com/oobabooga/text-generation-webui .

RUN mkdir /srv/repositories
RUN cd /srv/repositories && git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda

Then build it:

docker build . -t oobabooga --progress=plain

Then run it, give the container a name, add --gpus all, and don't add --rm:

docker run --gpus all -it --name temp-container oobabooga /bin/bash

Then once inside you can get the cuda version like @earor-R said and finish the install:

python -c 'import torch; print(".".join(map(str, torch.cuda.get_device_capability(0))))'
export TORCH_CUDA_ARCH_LIST=="8.6+PTX"
cd /srv/repositories/GPTQ-for-LLaMa && python setup_cuda.py install

Then exit the container and commit it back into an image:

 docker commit temp-container oobabooga-run

And then finally you can run it:

docker run -it --gpus=all --rm -p 7860:7860 --mount "type=bind,src=$(wslpath -w text-generation-webui/models),dst=/srv/models,readonly" oobabooga-run python server.py --auto-devices --chat --model=gpt4-x-alpaca-13b-native-4bit-128g --wbits=4 --groupsize=128 --gpu-memory=18 --listen

I wish I could automate the build easier so this is maintainable but that's the best I've got right now.

alexmeri98 · 2023-05-15T14:00:27Z

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.
If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:
python TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" setup.py install
(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )
Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

You can use the next scrip to obtain your GPUs arch:
import torch torch.cuda.get_arch_list()

You will get ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86'] and you will have to parse this into "3.7 5.0 6.0 7.0 7.5 8.0 8.6+PTX"

imzeroan · 2023-07-21T12:15:27Z

I solve this by running: # TORCH_CUDA_ARCH_LIST="6.1+PTX" python setup.py install for my GTX1080ti. The GPU_CC number6.1 is according to 1080ti refer to https://developer.nvidia.com/cuda-gpus

Leask · 2023-10-25T01:22:52Z

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.release ([0-9]+.[0-9]+).$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
echo "unsupported cuda version."
exit 1
fi

updated this workaround to support cuda v12:

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
elif [[ ${CUDA_VERSION} == 12.* ]]; then
    export TORCH_CUDA_ARCH_LIST="5.0;5.2;5.3;6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6;8.7;8.9;9.0+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi

VAllens · 2024-10-06T15:48:52Z

if you are building in Nvidia docker container without actual GPU, you can use something like this:

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi

This works for me, thanks.
I develop and compile projects on computers without NVIDIA graphics installed,
and run test programs on computers with NVIDIA graphics installed.

ashawkey mentioned this issue Mar 18, 2022

CUDA Issues running NeRF example ashawkey/torch-ngp#31

Closed

zzj403 mentioned this issue May 26, 2022

IndexError: list index out of range MhLiao/DB#270

Open

Maurdekye mentioned this issue Oct 27, 2022

Steps I had to take in order to run this project on my machine locally Newbeeer/Poisson_flow#4

Closed

notmahi mentioned this issue Nov 14, 2022

Last step of installation throws errors notmahi/clip-fields#3

Closed

adizhol mentioned this issue Nov 21, 2022

_ext undefined symbol when using source-compiled pytorch open-mmlab/mmcv#2439

Closed

mmbannert mentioned this issue Feb 4, 2023

pip install fails with "No CUDA runtime is found" despite existing CUDA toolkit installation ClementPinard/Pytorch-Correlation-extension#90

Closed

JamesLong199 mentioned this issue Feb 8, 2023

Error installing tcnn when building Docker image (Dockerfile:54) openxrlab/xrnerf#30

Closed

Zijie-Tian mentioned this issue Feb 9, 2023

Docker step fails in object_detection benchmark mlcommons/training#619

Closed

ClementPinard mentioned this issue Mar 8, 2023

some errors with python setup.py install ClementPinard/Pytorch-Correlation-extension#91

Closed

gsgoldma mentioned this issue Mar 22, 2023

WSL problems oobabooga/text-generation-webui#467

Closed

1 task

oobabooga mentioned this issue Mar 22, 2023

Dockerfile oobabooga/text-generation-webui#174

Closed

li126com mentioned this issue Aug 22, 2023

Dockerfile build ERROR: Could not build wheels for fused-dense-lib, which is required to install pyproject.toml-based projects Dao-AILab/flash-attention#471

Open

eolivi-fy mentioned this issue Sep 6, 2023

Added a Dockerfile and instructions to build it graphdeco-inria/gaussian-splatting#163

Open

Silverster98 mentioned this issue Sep 14, 2023

_get_cuda_arch_flags arch_list[-1] += '+PTX' Silverster98/pointops#3

Open

chongkuiqi mentioned this issue Sep 30, 2023

同样是运行2（3）时出错 chongkuiqi/S2ANet#11

Open

Zanue mentioned this issue Oct 12, 2023

No module named 'addict' Jumpat/SegmentAnythingin3D#30

Closed

ClementPinard mentioned this issue Dec 28, 2023

IndexError: list index out of range ClementPinard/Pytorch-Correlation-extension#104

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cuda does not install #71

cuda does not install #71

dorooddorood606 commented Apr 12, 2021 •

edited

Loading

gaetan-landreau commented Apr 22, 2021 •

edited

Loading

kuzand commented Jul 29, 2021

gaetan-landreau commented Jul 29, 2021

darkdevahm commented Oct 17, 2021

oliver-batchelor commented Oct 17, 2021

darkdevahm commented Oct 17, 2021

oliver-batchelor commented Oct 17, 2021 via email

MalteEbner commented Mar 8, 2022 •

edited

Loading

ClementPinard commented May 25, 2022

apatsekin commented Jul 13, 2022 •

edited

Loading

andriworld commented Aug 20, 2022 •

edited

Loading

XiangFeng66 commented Feb 28, 2023

earor-R commented Mar 7, 2023 •

edited

Loading

mnpenner commented Apr 15, 2023

alexmeri98 commented May 15, 2023

imzeroan commented Jul 21, 2023

Leask commented Oct 25, 2023

VAllens commented Oct 6, 2024

cuda does not install #71

cuda does not install #71

Comments

dorooddorood606 commented Apr 12, 2021 • edited Loading

gaetan-landreau commented Apr 22, 2021 • edited Loading

kuzand commented Jul 29, 2021

gaetan-landreau commented Jul 29, 2021

darkdevahm commented Oct 17, 2021

oliver-batchelor commented Oct 17, 2021

darkdevahm commented Oct 17, 2021

oliver-batchelor commented Oct 17, 2021 via email

MalteEbner commented Mar 8, 2022 • edited Loading

ClementPinard commented May 25, 2022

apatsekin commented Jul 13, 2022 • edited Loading

andriworld commented Aug 20, 2022 • edited Loading

XiangFeng66 commented Feb 28, 2023

earor-R commented Mar 7, 2023 • edited Loading

mnpenner commented Apr 15, 2023

alexmeri98 commented May 15, 2023

imzeroan commented Jul 21, 2023

Leask commented Oct 25, 2023

VAllens commented Oct 6, 2024

dorooddorood606 commented Apr 12, 2021 •

edited

Loading

gaetan-landreau commented Apr 22, 2021 •

edited

Loading

MalteEbner commented Mar 8, 2022 •

edited

Loading

apatsekin commented Jul 13, 2022 •

edited

Loading

andriworld commented Aug 20, 2022 •

edited

Loading

earor-R commented Mar 7, 2023 •

edited

Loading