Docker Compose - Container Failing To Start Due to GPU Passthrough Error #4

Open
au70ma70n opened this issue Sep 23, 2024 · 1 comment

au70ma70n commented Sep 23, 2024

Forge enters a FATAL state because Torch cannot use the GPU. The error and the relevant NVIDIA driver and tool versions are below. I deployed with docker compose up, per the documentation.

Error:

supervisor-1  | Starting SD Web UI Forge...
supervisor-1  | Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
supervisor-1  | Version: f0.0.1-v1.7.0d
supervisor-1  | Commit hash: f53d0b42cc0ed5098dec2ab2315d8f907786e175
supervisor-1  | Traceback (most recent call last):
supervisor-1  |   File "/workspace/stable-diffusion-webui-forge/launch.py", line 48, in <module>
supervisor-1  |     main()
supervisor-1  |   File "/workspace/stable-diffusion-webui-forge/launch.py", line 39, in main
supervisor-1  |     prepare_environment()
supervisor-1  |   File "/workspace/stable-diffusion-webui-forge/modules/launch_utils.py", line 429, in prepare_environment
supervisor-1  |     raise RuntimeError(
supervisor-1  | RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
supervisor-1  | 
supervisor-1  | ==> /var/log/supervisor/supervisor.log <==
supervisor-1  | 2024-09-23 17:48:00,937 INFO spawned: 'forge' with pid 1859
supervisor-1  | 2024-09-23 17:48:01,745 INFO exited: forge (exit status 1; not expected)
supervisor-1  | 2024-09-23 17:48:01,933 INFO gave up: forge entered FATAL state, too many start retries too quickly

Docker version:

Docker version 27.2.1, build 9e34c9bb39

Nvidia SMI:

Mon Sep 23 13:00:52 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   56C    P0             57W /  500W |    3150MiB /  24564MiB |     14%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

NVCC:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

Nvidia Drivers:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  560.35.03  Fri Aug 16 21:39:15 UTC 2024
GCC version:  gcc version 14.2.1 20240910 (GCC) 

Nvidia Container Toolkit:

NVIDIA Container Toolkit CLI version 1.16.1

I have also verified that GPU passthrough works outside of Compose:

docker run -it --rm --gpus all ubuntu nvidia-smi 

Mon Sep 23 17:58:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   52C    P5             32W /  500W |    3008MiB /  24564MiB |     34%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

robballantyne (Member) commented Sep 25, 2024

My fault. The docker-compose.yaml is missing the following GPU device reservation:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

I'll push an update later today.
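
For anyone who wants to patch a local copy before the update lands, here is a minimal sketch of where that block would sit in a Compose file. The deploy key belongs under the service definition, at the same level as image. The service name supervisor is inferred from the supervisor-1 prefix in the log above, and the image value is a hypothetical placeholder rather than the project's real image. Keep whatever the existing docker-compose.yaml already has and add only the deploy section.

```yaml
services:
  supervisor:                            # assumed service name, inferred from the "supervisor-1" log prefix
    image: example/forge-image:latest    # hypothetical placeholder; keep the image from the existing file
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia             # hand the device request to the NVIDIA Container Toolkit
              count: all                 # expose every GPU on the host to the container
              capabilities: [gpu]        # required; Compose rejects a device reservation without it
```

After recreating the container with docker compose up, running nvidia-smi inside the service container (for example, docker compose exec supervisor nvidia-smi, again assuming that service name) should list the GPU in the same way as the docker run --gpus all test above.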
