Docker Compose - Container Failing To Start Due to GPU Passthrough Error #4

au70ma70n · 2024-09-23T18:04:39Z

Forge enters a FATAL state because Torch cannot use the GPU. The error is attached below, along with the relevant NVIDIA driver and tool versions. I deployed with docker compose up, per the documentation.

Error:

supervisor-1  | Starting SD Web UI Forge...
supervisor-1  | Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
supervisor-1  | Version: f0.0.1-v1.7.0d
supervisor-1  | Commit hash: f53d0b42cc0ed5098dec2ab2315d8f907786e175
supervisor-1  | Traceback (most recent call last):
supervisor-1  |   File "/workspace/stable-diffusion-webui-forge/launch.py", line 48, in <module>
supervisor-1  |     main()
supervisor-1  |   File "/workspace/stable-diffusion-webui-forge/launch.py", line 39, in main
supervisor-1  |     prepare_environment()
supervisor-1  |   File "/workspace/stable-diffusion-webui-forge/modules/launch_utils.py", line 429, in prepare_environment
supervisor-1  |     raise RuntimeError(
supervisor-1  | RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
supervisor-1  | 
supervisor-1  | ==> /var/log/supervisor/supervisor.log <==
supervisor-1  | 2024-09-23 17:48:00,937 INFO spawned: 'forge' with pid 1859
supervisor-1  | 2024-09-23 17:48:01,745 INFO exited: forge (exit status 1; not expected)
supervisor-1  | 2024-09-23 17:48:01,933 INFO gave up: forge entered FATAL state, too many start retries too quickly

Docker version:

Docker version 27.2.1, build 9e34c9bb39

Nvidia SMI:

Mon Sep 23 13:00:52 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   56C    P0             57W /  500W |    3150MiB /  24564MiB |     14%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

NVCC:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

Nvidia Drivers:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  560.35.03  Fri Aug 16 21:39:15 UTC 2024
GCC version:  gcc version 14.2.1 20240910 (GCC)

Nvidia Container Toolkit:

NVIDIA Container Toolkit CLI version 1.16.1

I have also verified that GPU passthrough works with a plain docker run:

docker run -it --rm --gpus all ubuntu nvidia-smi 

Mon Sep 23 17:58:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   52C    P5             32W /  500W |    3008MiB /  24564MiB |     34%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+


robballantyne · 2024-09-25T05:08:57Z

My fault. The docker-compose.yaml is missing the following:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

I'll push an update later today.
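
For anyone patching their local copy in the meantime, the deploy block goes under the service definition, at the same level as image. A rough sketch of the placement follows. The service name is only inferred from the supervisor-1 prefix in the logs above, and the image reference is a placeholder, so keep whatever your existing file already has for both:

```yaml
# Sketch only: service name and image are assumptions, not copied from the repository.
services:
  supervisor:                      # inferred from the "supervisor-1" log prefix
    image: example/forge:latest    # hypothetical placeholder; keep your existing image line
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia       # use the NVIDIA container runtime
              count: all           # expose every GPU on the host
              capabilities: [gpu]
```

After editing, docker compose up should recreate the container with the GPU reservation applied, which is what the Torch CUDA check in the traceback needs in order to pass.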
