cuda does not install #71
Tried to investigate this issue a bit since I've faced the same problem in one of my Docker containers. If you're currently building through a setup.py, you should first set TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" for the build (or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile, for instance). Additional info can be found here: https://pytorch.org/docs/stable/cpp_extension.html
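The exact command was lost in the formatting; a minimal sketch, assuming the extension is built with setup.py and the target GPU has compute capability 8.6:

```bash
# Set the target architecture explicitly so the build does not need
# to detect a physical GPU; "8.6" is an example compute capability.
TORCH_CUDA_ARCH_LIST="8.6+PTX" python setup.py install
```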
How do I find the "YOUR_GPUs_CC+PTX" value for my GPU?
You should find everything you need on NVIDIA's CUDA GPUs page, https://developer.nvidia.com/cuda-gpus (go to the section CUDA-Enabled NVIDIA Quadro and NVIDIA RTX).
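Alternatively, reasonably recent drivers can report it directly; a quick check, assuming the compute_cap query field is supported by your nvidia-smi version:

```bash
# Ask the driver for each GPU's compute capability, e.g. "8.6".
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```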
Have you solved this issue?
Is torch.cuda.is_available() False? I have only had this when I try to compile with a broken install of PyTorch or CUDA.
Which CUDA and PyTorch versions did you use?
It came to my attention last night when I was trying to compile for 1.8.2, and I realized this was because torch.cuda.is_available() was False. Once I fixed my CUDA install, this compile error was also gone.
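A quick way to run that check, printing both the availability flag and the CUDA version torch was built against:

```bash
# False here means extension builds cannot auto-detect an architecture.
python -c 'import torch; print(torch.cuda.is_available(), torch.version.cuda)'
```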
The solution that worked for me on Linux: check that Docker's default runtime is set to nvidia, so the GPU is visible during the build. If not, you need to change it and then restart Docker with sudo systemctl restart docker
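A sketch of that change, assuming the setting in question is Docker's default runtime and that nvidia-container-runtime is already installed; the daemon.json contents below are a hypothetical minimal example:

```bash
# See which runtime `docker build` uses by default; GPU detection at
# build time only works when this is "nvidia".
docker info | grep -i 'default runtime'

# If it is not "nvidia", set it in /etc/docker/daemon.json:
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF

sudo systemctl restart docker
```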
Hello, for anyone visiting this issue, the problem is caused here: https://github.com/pytorch/pytorch/blob/master/torch/utils/cpp_extension.py#L1694

Basically, when TORCH_CUDA_ARCH_LIST is not set, the _get_cuda_arch_flags function tries to detect the architectures of the GPUs installed on the build machine. The thing is, when no CUDA card is detected, that list of architectures stays empty. This leads to the last line, which essentially says "add '+PTX' to the name of the last architecture", and which obviously fails when the arch_list is empty.

As such, this problem essentially means that no CUDA hardware was found by torch. Possible reasons and solutions:

- If there is no way to detect a GPU at build time, but you know what architecture it should run on, you can explicitly set it with the environment variable, as said in this comment (#71 (comment))
- If you are building in an Nvidia docker container without an actual GPU, you can use something like this:
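The snippet itself was lost in the formatting; a representative example, assuming the build should cover the common pre-CUDA-12 architectures:

```bash
# Exported before the build, or set with ENV/ARG in the Dockerfile:
# list every architecture the build should target; "+PTX" adds
# forward compatibility with newer GPUs.
export TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6+PTX"
```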
I had the same error running in WSL on Windows. The above solution of setting the TORCH_CUDA_ARCH_LIST environment variable fixed the issue.
How can I solve this problem on the Windows platform? @gaetan-landreau @ClementPinard
If the GPU driver is loaded correctly, execute the following statement in the Python console; the value it prints is your GPU's compute capability, which is what TORCH_CUDA_ARCH_LIST expects. If it fails, that means torch cannot see your GPU.
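The statement was lost in the formatting; judging from the follow-up comment below that cites it, it was presumably this compute-capability query:

```bash
# Prints GPU 0's compute capability, e.g. "8.6"; this is the value
# to put in TORCH_CUDA_ARCH_LIST (optionally with "+PTX").
python -c 'import torch; print(".".join(map(str, torch.cuda.get_device_capability(0))))'
```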
I got cuda working inside of docker on Windows 10 thanks to the instructions here and a little help from ChatGPT. The issue is, as @earor-R said, you can figure out the compute capability with that Python one-liner, but only from inside a running container that can see the GPU, not during the image build. So you can set up half the Dockerfile automated like:

```dockerfile
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

WORKDIR /srv
RUN apt update && apt install -y curl build-essential git
RUN curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > /tmp/miniconda.sh
RUN bash /tmp/miniconda.sh -b -p /opt/miniconda
ENV PATH="/opt/miniconda/bin:$PATH"
RUN pip install torch torchvision torchaudio
RUN git clone https://github.com/oobabooga/text-generation-webui .
RUN mkdir /srv/repositories
RUN cd /srv/repositories && git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
```

Then build it:

```bash
docker build . -t oobabooga --progress=plain
```

Then run it, give the container a name, and add --gpus all:

```bash
docker run --gpus all -it --name temp-container oobabooga /bin/bash
```

Then once inside you can get the cuda version like @earor-R said and finish the install:

```bash
python -c 'import torch; print(".".join(map(str, torch.cuda.get_device_capability(0))))'
export TORCH_CUDA_ARCH_LIST="8.6+PTX"
cd /srv/repositories/GPTQ-for-LLaMa && python setup_cuda.py install
```

Then exit the container and commit it back into an image:

```bash
docker commit temp-container oobabooga-run
```

And then finally you can run it:

```bash
docker run -it --gpus=all --rm -p 7860:7860 --mount "type=bind,src=$(wslpath -w text-generation-webui/models),dst=/srv/models,readonly" oobabooga-run python server.py --auto-devices --chat --model=gpt4-x-alpaca-13b-native-4bit-128g --wbits=4 --groupsize=128 --gpu-memory=18 --listen
```

I wish I could automate the build easier so this is maintainable but that's the best I've got right now.
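One possible way to automate that last manual step, assuming the host Python can already see the GPU through torch, and that the Dockerfile is extended with a matching ARG TORCH_CUDA_ARCH_LIST plus a RUN line for setup_cuda.py; a hypothetical sketch:

```bash
# Read the compute capability once on the host, then pass it into the
# build so the whole image can be produced by `docker build` alone.
CC=$(python -c 'import torch; print(".".join(map(str, torch.cuda.get_device_capability(0))))')
docker build . -t oobabooga --build-arg TORCH_CUDA_ARCH_LIST="${CC}+PTX"
```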
You can use the next script to obtain your GPU's arch:
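The script and its output were lost in the formatting; a minimal equivalent, assuming PyTorch with CUDA support is installed:

```bash
# Prints something like "arch: 8.6" for the first visible GPU.
python -c 'import torch; cc = torch.cuda.get_device_capability(0); print("arch:", f"{cc[0]}.{cc[1]}")'
```

You will get the compute capability of your card, e.g. 8.6.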
I solved this by running:
Updated this workaround to support CUDA v12:
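This snippet was also lost; presumably it extends the earlier arch list with the Ada and Hopper architectures that CUDA 12 added, something like:

```bash
# Adds Ada (8.9) and Hopper (9.0) to the earlier architecture list.
export TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0+PTX"
```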
This works for me, thanks.
python: 3.7
cuda: 11.1
pytorch: 1.8
I am trying to compile the CUDA code, but it does not work. Could you have a look please? Thanks.