This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Error: nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) #1388

Closed
5 of 8 tasks
JingL1014 opened this issue Sep 23, 2020 · 55 comments

Comments

@JingL1014

1. Issue or feature description

I am following the instructions on GitHub to install nvidia-docker on Ubuntu 20.04, but the installation fails with the error below. Could you help me identify the problem? Thank you!

sudo apt-get update
Hit:1 https://nvidia.github.io/libnvidia-container/stable/ubuntu20.04/amd64 InRelease
Hit:2 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu20.04/amd64 InRelease
Hit:3 https://nvidia.github.io/nvidia-docker/ubuntu20.04/amd64 InRelease
Get:4 https://download.docker.com/linux/ubuntu focal InRelease [36.2 kB]
Hit:5 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:6 http://archive.lambdalabs.com/ubuntu focal InRelease
Hit:7 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Fetched 36.2 kB in 1s (46.2 kB/s)
Reading package lists... Done

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

2. Steps to reproduce the issue

sudo apt-get install -y nvidia-docker2

3. Information to attach (optional if deemed irrelevant)

  • Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info

I0923 20:39:55.953720 464021 nvc.c:282] initializing library context (version=1.2.0, build=)
I0923 20:39:55.953761 464021 nvc.c:256] using root /
I0923 20:39:55.953766 464021 nvc.c:257] using ldcache /etc/ld.so.cache
I0923 20:39:55.953770 464021 nvc.c:258] using unprivileged user 4163:4163
I0923 20:39:55.953786 464021 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0923 20:39:55.953881 464021 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0923 20:39:55.956568 464022 nvc.c:187] failed to set inheritable capabilities
W0923 20:39:55.956616 464022 nvc.c:188] skipping kernel modules load due to failure
I0923 20:39:55.956875 464023 driver.c:101] starting driver service
I0923 20:39:55.959606 464021 nvc_info.c:679] requesting driver information with ''
I0923 20:39:55.960768 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.450.57
I0923 20:39:55.960809 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.57
I0923 20:39:55.960831 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.57
I0923 20:39:55.960854 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.57
I0923 20:39:55.960889 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.450.57
I0923 20:39:55.960923 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.450.57
I0923 20:39:55.960945 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.450.57
I0923 20:39:55.960965 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.57
I0923 20:39:55.961000 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.450.57
I0923 20:39:55.961033 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.57
I0923 20:39:55.961054 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.57
I0923 20:39:55.961074 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.57
I0923 20:39:55.961095 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.450.57
I0923 20:39:55.961128 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.450.57
I0923 20:39:55.961161 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.57
I0923 20:39:55.961182 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.450.57
I0923 20:39:55.961203 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.450.57
I0923 20:39:55.961235 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.57
I0923 20:39:55.961257 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.450.57
I0923 20:39:55.961295 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.57
I0923 20:39:55.961534 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.450.57
I0923 20:39:55.961646 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.450.57
I0923 20:39:55.961669 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.450.57
I0923 20:39:55.961692 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.450.57
I0923 20:39:55.961716 464021 nvc_info.c:168] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.450.57
I0923 20:39:55.961757 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.450.57
I0923 20:39:55.961790 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.450.57
I0923 20:39:55.961827 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.450.57
I0923 20:39:55.961864 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.450.57
I0923 20:39:55.961887 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.450.57
I0923 20:39:55.961923 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-ifr.so.450.57
I0923 20:39:55.961957 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.450.57
I0923 20:39:55.961989 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.450.57
I0923 20:39:55.962009 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.450.57
I0923 20:39:55.962032 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.450.57
I0923 20:39:55.962071 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.450.57
I0923 20:39:55.962105 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.450.57
I0923 20:39:55.962125 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.450.57
I0923 20:39:55.962146 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvidia-allocator.so.450.57
I0923 20:39:55.962182 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.450.57
I0923 20:39:55.962229 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libcuda.so.450.57
I0923 20:39:55.962272 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.450.57
I0923 20:39:55.962295 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.450.57
I0923 20:39:55.962318 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.450.57
I0923 20:39:55.962340 464021 nvc_info.c:168] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.450.57
W0923 20:39:55.962361 464021 nvc_info.c:349] missing library libnvidia-fatbinaryloader.so
W0923 20:39:55.962366 464021 nvc_info.c:349] missing library libvdpau_nvidia.so
W0923 20:39:55.962373 464021 nvc_info.c:353] missing compat32 library libnvidia-cfg.so
W0923 20:39:55.962379 464021 nvc_info.c:353] missing compat32 library libnvidia-fatbinaryloader.so
W0923 20:39:55.962384 464021 nvc_info.c:353] missing compat32 library libnvidia-ngx.so
W0923 20:39:55.962389 464021 nvc_info.c:353] missing compat32 library libvdpau_nvidia.so
W0923 20:39:55.962395 464021 nvc_info.c:353] missing compat32 library libnvidia-rtcore.so
W0923 20:39:55.962400 464021 nvc_info.c:353] missing compat32 library libnvoptix.so
W0923 20:39:55.962407 464021 nvc_info.c:353] missing compat32 library libnvidia-cbl.so
I0923 20:39:55.968551 464021 nvc_info.c:275] selecting /usr/bin/nvidia-smi
I0923 20:39:55.968574 464021 nvc_info.c:275] selecting /usr/bin/nvidia-debugdump
I0923 20:39:55.968597 464021 nvc_info.c:275] selecting /usr/bin/nvidia-persistenced
I0923 20:39:55.968612 464021 nvc_info.c:275] selecting /usr/bin/nvidia-cuda-mps-control
I0923 20:39:55.968631 464021 nvc_info.c:275] selecting /usr/bin/nvidia-cuda-mps-server
I0923 20:39:55.968652 464021 nvc_info.c:437] listing device /dev/nvidiactl
I0923 20:39:55.968657 464021 nvc_info.c:437] listing device /dev/nvidia-uvm
I0923 20:39:55.968663 464021 nvc_info.c:437] listing device /dev/nvidia-uvm-tools
I0923 20:39:55.968667 464021 nvc_info.c:437] listing device /dev/nvidia-modeset
I0923 20:39:55.968695 464021 nvc_info.c:316] listing ipc /run/nvidia-persistenced/socket
W0923 20:39:55.968712 464021 nvc_info.c:320] missing ipc /tmp/nvidia-mps
I0923 20:39:55.968717 464021 nvc_info.c:744] requesting device information with ''
I0923 20:39:55.975153 464021 nvc_info.c:627] listing device /dev/nvidia0 (GPU-b4284e5d-adf4-2a5e-69dd-f53c99fc475d at 00000000:01:00.0)
I0923 20:39:55.981478 464021 nvc_info.c:627] listing device /dev/nvidia1 (GPU-c2e07576-ea0a-33b0-1622-f8c2132c2086 at 00000000:21:00.0)
I0923 20:39:55.988026 464021 nvc_info.c:627] listing device /dev/nvidia2 (GPU-ce68be3f-afa6-1eb5-a43c-27640ca76732 at 00000000:4b:00.0)
I0923 20:39:55.994670 464021 nvc_info.c:627] listing device /dev/nvidia3 (GPU-b74b3210-8285-2858-0bd7-5fb7e2d40cba at 00000000:4c:00.0)
NVRM version: 450.57
CUDA version: 11.0

Device Index: 0
Device Minor: 0
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-b4284e5d-adf4-2a5e-69dd-f53c99fc475d
Bus Location: 00000000:01:00.0
Architecture: 7.5

Device Index: 1
Device Minor: 1
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-c2e07576-ea0a-33b0-1622-f8c2132c2086
Bus Location: 00000000:21:00.0
Architecture: 7.5

Device Index: 2
Device Minor: 2
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-ce68be3f-afa6-1eb5-a43c-27640ca76732
Bus Location: 00000000:4b:00.0
Architecture: 7.5

Device Index: 3
Device Minor: 3
Model: Quadro RTX 6000
Brand: Quadro
GPU UUID: GPU-b74b3210-8285-2858-0bd7-5fb7e2d40cba
Bus Location: 00000000:4c:00.0
Architecture: 7.5
I0923 20:39:55.994743 464021 nvc.c:337] shutting down library context
I0923 20:39:55.995575 464023 driver.c:156] terminating driver service
I0923 20:39:55.995902 464021 driver.c:196] driver service terminated successfully

  • Kernel version from uname -a
    Linux mlrgpu07 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Any relevant kernel output lines from dmesg

  • Driver information from nvidia-smi -a
    Driver Version : 450.57
    CUDA Version : 11.0

  • Docker version from docker version

Client: Docker Engine - Community
Version: 19.03.13
API version: 1.40
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:02:52 2020
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 19.03.13
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:01:20 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.7
GitCommit: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683

  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
  • NVIDIA container library version from nvidia-container-cli -V
    version: 1.2.0
    build date: 2020-07-09T02:45+00:00
    build revision:
    build compiler: gcc-5 5.4.0 20160609
    build platform: x86_64
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -Wl,-z,relro
  • NVIDIA container library logs (see troubleshooting)
  • Docker command, image and tag used
@klueska
Contributor

klueska commented Sep 24, 2020

It looks like you may have an old nvidia container stack installed (since you are able to run nvidia-container-cli successfully, but it was never installed as part of the current nvidia-docker2 installation).

Can you try uninstalling libnvidia-container1 (and, in doing so, everything that depends on it)?
Then try installing nvidia-docker2 again.

This shouldn't be necessary, but it's worth a shot.

Also, are you on a DGX machine?
If so, this may be relevant:
#1355 (comment)

@AlexMikhalev

@klueska I hit the same issue on a Juno laptop, so it's not hardware-specific. Following the instructions and fetching
https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list
results in

cat /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

which points to the 18.04 repo.

@klueska
Contributor

klueska commented Sep 24, 2020

@AlexMikhalev the fact that nvidia-docker.list contains references to ubuntu 18.04 is not an issue (in fact the 20.04 repo is just a symlink to 18.04).

Are you also seeing this error, though:

nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed

I have tested it multiple times in various environments and am not able to reproduce the issue.
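
One way to check which repositories apt is actually reading, beyond the single nvidia-docker.list file shown above, is to list every active deb entry. This is a minimal sketch, not part of the official instructions: the function name active_sources is hypothetical, and it only covers one-line-style *.list files in a directory (by default /etc/apt/sources.list.d), not /etc/apt/sources.list itself or deb822 *.sources files.

```shell
#!/bin/sh
# active_sources [DIR]
# Print every uncommented "deb" / "deb-src" line found in DIR/*.list,
# prefixed with the file it came from. DIR defaults to the usual apt
# drop-in directory. (Hypothetical helper name, for illustration only.)
active_sources() {
    dir=${1:-/etc/apt/sources.list.d}
    # -H always prints the file name; commented-out entries start with '#'
    # and therefore do not match the anchored pattern.
    grep -HE '^[[:space:]]*deb(-src)?[[:space:]]' "$dir"/*.list 2>/dev/null
}
```

Running active_sources with no argument on the affected machine would show whether any source other than nvidia.github.io could also be offering the container packages.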

@JingL1014
Author

@klueska I uninstalled libnvidia-container1, but I still get the same error. I am not on a DGX machine.

This is the list of NVIDIA packages from "dpkg -l '*nvidia*'":

||/ Name                             Version                 Architecture Description
un  libgldispatch0-nvidia                                     (no description available)
ii  libnvidia-cfg1-450:amd64         450.57-0lambda0~20.04.1 amd64        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                                        (no description available)
un  libnvidia-common                                          (no description available)
ii  libnvidia-common-450             450.57-0lambda0~20.04.1 all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-450:amd64      450.57-0lambda0~20.04.1 amd64        NVIDIA libcompute package
ii  libnvidia-compute-450:i386       450.57-0lambda0~20.04.1 i386         NVIDIA libcompute package
un  libnvidia-decode                                          (no description available)
ii  libnvidia-decode-450:amd64       450.57-0lambda0~20.04.1 amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-450:i386        450.57-0lambda0~20.04.1 i386         NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                                          (no description available)
ii  libnvidia-encode-450:amd64       450.57-0lambda0~20.04.1 amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-450:i386        450.57-0lambda0~20.04.1 i386         NVENC Video Encoding runtime library
un  libnvidia-extra                                           (no description available)
ii  libnvidia-extra-450:amd64        450.57-0lambda0~20.04.1 amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-extra-450:i386         450.57-0lambda0~20.04.1 i386         Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                                            (no description available)
ii  libnvidia-fbc1-450:amd64         450.57-0lambda0~20.04.1 amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-450:i386          450.57-0lambda0~20.04.1 i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                                              (no description available)
ii  libnvidia-gl-450:amd64           450.57-0lambda0~20.04.1 amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-450:i386            450.57-0lambda0~20.04.1 i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                                            (no description available)
ii  libnvidia-ifr1-450:amd64         450.57-0lambda0~20.04.1 amd64        NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ifr1-450:i386          450.57-0lambda0~20.04.1 i386         NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  libnvidia-ml-dev                 10.2.89-0lambda2        amd64        NVIDIA Management Library (NVML) development package
un  libnvidia-ml1                                             (no description available)
un  nvidia-304                                                (no description available)
un  nvidia-340                                                (no description available)
un  nvidia-384                                                (no description available)
un  nvidia-common                                             (no description available)
un  nvidia-compute-utils                                      (no description available)
ii  nvidia-compute-utils-450         450.57-0lambda0~20.04.1 amd64        NVIDIA compute utilities
ii  nvidia-cuda-dev:amd64            10.2.89-0lambda2        amd64        CUDA development files
ii  nvidia-cuda-doc                  10.2.89-0lambda2        all          CUDA toolkit documentation
ii  nvidia-cuda-gdb                  10.2.89-0lambda2        amd64        CUDA Debugger
ii  nvidia-cuda-toolkit              10.2.89-0lambda2        amd64        CUDA development toolkit
ii  nvidia-dkms-450                  450.57-0lambda0~20.04.1 amd64        NVIDIA DKMS package
un  nvidia-dkms-kernel                                        (no description available)
ii  nvidia-driver-440                450.57-0lambda0~20.04.1 amd64        Transitional package for nvidia-driver-450
ii  nvidia-driver-450                450.57-0lambda0~20.04.1 amd64        NVIDIA driver metapackage
un  nvidia-driver-binary                                      (no description available)
un  nvidia-driver-meta                                        (no description available)
un  nvidia-kernel-common                                      (no description available)
ii  nvidia-kernel-common-450         450.57-0lambda0~20.04.1 amd64        Shared files used with the kernel module
un  nvidia-kernel-source                                      (no description available)
ii  nvidia-kernel-source-450         450.57-0lambda0~20.04.1 amd64        NVIDIA kernel source package
un  nvidia-legacy-304xx-vdpau-driver                          (no description available)
un  nvidia-legacy-340xx-vdpau-driver                          (no description available)
un  nvidia-libopencl1                                         (no description available)
un  nvidia-libopencl1-dev                                     (no description available)
un  nvidia-opencl-icd                                         (no description available)
un  nvidia-persistenced                                       (no description available)
un  nvidia-prime                                              (no description available)
ii  nvidia-profiler                  10.2.89-0lambda2        amd64        NVIDIA CUDA profiler
ii  nvidia-settings                  450.57-0lambda1         amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                                    (no description available)
un  nvidia-smi                                                (no description available)
un  nvidia-utils                                              (no description available)
ii  nvidia-utils-450                 450.57-0lambda0~20.04.1 amd64        NVIDIA driver support binaries
un  nvidia-vdpau-driver                                       (no description available)
ii  xserver-xorg-video-nvidia-450    450.57-0lambda0~20.04.1 amd64        NVIDIA binary Xorg driver

@klueska
Contributor

klueska commented Sep 24, 2020

@JingL1014 Those are just the packages you have installed.

Can you show me the list of packages available:

sudo apt-cache madison nvidia-container-runtime

Mine shows:

nvidia-container-runtime |    3.4.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.3.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.2.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.7-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.6-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.3-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-2 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.03.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.12.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages

@JingL1014
Author

JingL1014 commented Sep 24, 2020

@klueska

Mine shows:


nvidia-container-runtime |    3.4.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.3.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.2.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime |    3.1.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.7-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.6-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.4-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.3-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.3-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-2 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.2-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.06.0-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.03.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.12.1-1 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages

@klueska
Contributor

klueska commented Sep 24, 2020

So it clearly shows version 3.4.0-1 being available.
It is strange that you would get this error then:

nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but it is not going to be installed

Can you manually install nvidia-container-runtime and see which version gets installed?


sudo apt-get install -y nvidia-container-runtime
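
A way to answer the same question without installing anything is to ask apt which version it would pick. The sketch below assumes the usual English-locale layout of `apt-cache policy` output, where one line reads "Candidate: VERSION"; the helper name candidate_version is hypothetical.

```shell
#!/bin/sh
# candidate_version
# Read `apt-cache policy PACKAGE` output on stdin and print the version
# apt has selected as the installation candidate.
# (Hypothetical helper; assumes the English "Candidate:" label.)
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Typical use on the affected machine (not executed here):
#   LC_ALL=C apt-cache policy nvidia-container-runtime | candidate_version
```

The full `apt-cache policy` output also lists each available version with its priority and the repository it comes from, which helps when the candidate is not the newest version that `apt-cache madison` reports.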

@AlexMikhalev

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but 3.1.4-0pop1~1569270714~20.04~2ea45f8 is to be installed
E: Unable to correct problems, you have held broken packages.

Trying to install nvidia-container-runtime:

sudo apt-get install -y nvidia-container-runtime
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libnvidia-cfg1-440 libnvidia-decode-440 libnvidia-decode-440:i386 libnvidia-encode-440
  libnvidia-encode-440:i386 libnvidia-extra-440 libnvidia-fbc1-440 libnvidia-fbc1-440:i386
  libnvidia-gl-440 libnvidia-ifr1-440 libnvidia-ifr1-440:i386 libxnvctrl0
  linux-headers-5.4.0-47 linux-headers-5.4.0-47-generic linux-image-5.4.0-47-generic
  linux-modules-5.4.0-47-generic linux-modules-extra-5.4.0-47-generic linux-tools-5.4.0-47
  linux-tools-5.4.0-47-generic mousetweaks nvidia-compute-utils-440 nvidia-kernel-source-440
  nvidia-settings nvidia-utils-440 screen-resolution-extra xserver-xorg-video-nvidia-440
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-container1 libtirpc-common libtirpc3
  nvidia-container-toolkit
The following NEW packages will be installed
  libnvidia-container-tools libnvidia-container1 libtirpc-common libtirpc3
  nvidia-container-runtime nvidia-container-toolkit
0 to upgrade, 6 to newly install, 0 to remove and 1 not to upgrade.
Need to get 1,807 kB of archives.
After this operation, 6,979 kB of additional disk space will be used.
Get:1 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 libnvidia-container1 amd64 1.0.6-1pop1~1571281295~20.04~862e228 [57.2 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal/main amd64 libtirpc-common all 1.2.5-1 [7,632 B]
Get:3 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 libnvidia-container-tools amd64 1.0.6-1pop1~1571281295~20.04~862e228 [14.5 kB]
Get:4 http://us.archive.ubuntu.com/ubuntu focal/main amd64 libtirpc3 amd64 1.2.5-1 [77.2 kB]
Get:5 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 nvidia-container-toolkit amd64 1.0.5-0pop1~1569270707~20.04~17cd54f [811 kB]
Get:6 http://ppa.launchpad.net/system76/pop/ubuntu focal/main amd64 nvidia-container-runtime amd64 3.1.4-0pop1~1569270714~20.04~2ea45f8 [840 kB]
Fetched 1,807 kB in 1s (3,362 kB/s)               
Selecting previously unselected package libtirpc-common.
(Reading database ... 255334 files and directories currently installed.)
Preparing to unpack .../0-libtirpc-common_1.2.5-1_all.deb ...
Unpacking libtirpc-common (1.2.5-1) ...
Selecting previously unselected package libtirpc3:amd64.
Preparing to unpack .../1-libtirpc3_1.2.5-1_amd64.deb ...
Unpacking libtirpc3:amd64 (1.2.5-1) ...
Selecting previously unselected package libnvidia-container1:amd64.
Preparing to unpack .../2-libnvidia-container1_1.0.6-1pop1~1571281295~20.04~862e228_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.0.6-1pop1~1571281295~20.04~862e228) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../3-libnvidia-container-tools_1.0.6-1pop1~1571281295~20.04~862e228_amd64.deb ...
Unpacking libnvidia-container-tools (1.0.6-1pop1~1571281295~20.04~862e228) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../4-nvidia-container-toolkit_1.0.5-0pop1~1569270707~20.04~17cd54f_amd64.deb ...
Unpacking nvidia-container-toolkit (1.0.5-0pop1~1569270707~20.04~17cd54f) ...
Selecting previously unselected package nvidia-container-runtime.
Preparing to unpack .../5-nvidia-container-runtime_3.1.4-0pop1~1569270714~20.04~2ea45f8_amd64.deb ...
Unpacking nvidia-container-runtime (3.1.4-0pop1~1569270714~20.04~2ea45f8) ...
Setting up libtirpc-common (1.2.5-1) ...
Setting up libtirpc3:amd64 (1.2.5-1) ...
Setting up libnvidia-container1:amd64 (1.0.6-1pop1~1571281295~20.04~862e228) ...
Setting up libnvidia-container-tools (1.0.6-1pop1~1571281295~20.04~862e228) ...
Setting up nvidia-container-toolkit (1.0.5-0pop1~1569270707~20.04~17cd54f) ...
Setting up nvidia-container-runtime (3.1.4-0pop1~1569270714~20.04~2ea45f8) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.1) ...
(base) ➜  ~ sudo apt-get install -y nvidia-docker2          
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but 3.1.4-0pop1~1569270714~20.04~2ea45f8 is to be installed
E: Unable to correct problems, you have held broken packages.

@klueska
Contributor

klueska commented Sep 25, 2020

@AlexMikhalev It looks like you have added a ppa repository from system76, which appears to distribute their own builds of the nvidia container stack (independent of the official repos published by NVIDIA). The components they are hosting appear to be quite a few versions behind the latest.

In addition to that, it appears that they don't actually include nvidia-docker2 as part of their distribution. So when you try and install nvidia-docker2, it pulls the latest from the NVIDIA repos, but then tries to pull the old versions of its dependent components from the system76 ppa repos.

You need to either remove the system76 repo or somehow make the NVIDIA repos higher priority so that it pulls the container stack from them instead of system76.
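
To see why apt rejects the install, it can help to compare the two version strings from the error message. The sketch below is only an approximation: it assumes GNU `sort -V` is available and orders these strings the way dpkg would (on a Debian-based system, `dpkg --compare-versions` is the authoritative check). The helper name `version_ge` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumption: GNU `sort -V` ordering approximates dpkg's ordering for these strings.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Version strings copied from the apt error message in this thread.
offered='3.1.4-0pop1~1569270714~20.04~2ea45f8'
required='3.4.0'

if version_ge "$offered" "$required"; then
    echo "dependency satisfied"
else
    echo "dependency NOT satisfied: $offered < $required"
fi
```

The system76 build is the version apt selects, and it sorts below the 3.4.0 that nvidia-docker2 requires, which matches the "unmet dependencies" message.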

@sebautistam

> @AlexMikhalev It looks like you have added a ppa repository from system76, which appears to distribute their own builds of the nvidia container stack (independent of the official repos published by NVIDIA). The components they are hosting appear to be quite a few versions behind the latest.
>
> In addition to that, it appears that they don't actually include nvidia-docker2 as part of their distribution. So when you try and install nvidia-docker2, it pulls the latest from the NVIDIA repos, but then tries to pull the old versions of its dependent components from the system76 ppa repos.
>
> You need to either remove the system76 repo or somehow make the NVIDIA repos higher priority so that it pulls the container stack from them instead of system76.

Hi! I am having the exact same issue. I am using pop-os (based on Ubuntu 20.04, distributed by system76), and I have previously installed and used nvidia-docker2 on machines with the same OS by following the normal Ubuntu guide (changing the distribution to match the Ubuntu version, which is ubuntu20.04 on the other machines as well).

I get what you are saying about the system76 repos, although it is odd, since this worked fine about 2 months ago. Anyway, I could try to make the NVIDIA repos higher priority, but I don't know how to do it. Could you please guide me on this?

Thank you.

@klueska
Contributor

klueska commented Sep 25, 2020

@sebautistam I don't know the details of what it looks like in pop-os, but I'm guessing you have some files under /etc/apt/preferences.d/ which are prioritizing the system76 repos over all others.

@sebautistam

There is a file in that directory with this inside:

Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 1001

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 1001

@klueska
Contributor

klueska commented Sep 25, 2020

You will need to adjust these (or add more rules for the nvidia repos) according to this:
http://manpages.ubuntu.com/manpages/bionic/man5/apt_preferences.5.html

@sebautistam

> You will need to adjust these (or add more rules for the nvidia repos) according to this:
> http://manpages.ubuntu.com/manpages/bionic/man5/apt_preferences.5.html

I have modified the preferences file and now it looks like this:

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1001

Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 901

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 901

And the nvidia-docker2 installation was successful.

Do you know the name of the package to include in the preferences file, just to be more specific and avoid future problems with other packages?

Thank you!
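
On the question of pinning specific packages: apt_preferences also accepts a list of package names in the `Package:` field instead of `*`. The sketch below writes such a file to a scratch location for review. Nobody in this thread tested it; the package list (names taken from this thread), the priority 1002 (chosen to outrank the system76 pins whether they sit at 1001 or 901), and the helper name `write_nvidia_pin` are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: writes a package-specific pin file to the path given in $1.
# Assumptions: the package list and the priority value are illustrative only.
write_nvidia_pin() {
    cat > "$1" <<'EOF'
Package: nvidia-docker2 nvidia-container-runtime nvidia-container-toolkit libnvidia-container1 libnvidia-container-tools
Pin: origin nvidia.github.io
Pin-Priority: 1002
EOF
}

# Write to a scratch file first so it can be reviewed before it is installed.
pin_file=$(mktemp)
write_nvidia_pin "$pin_file"
cat "$pin_file"
```

After review, the file could be copied into /etc/apt/preferences.d/ under a name of your choice, followed by `sudo apt-get update`. Every other package would then keep following the system76 pins.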

@klueska
Contributor

klueska commented Sep 25, 2020

@sebautistam Sorry, I'm not that familiar with these preference files. I just know they've caused problems for people in the past, so I figured it was related here.

@JingL1014
Author

I figured out that in my case the problem is that version 1.2.0+ds-0lambda1 of libnvidia-container1 is selected automatically.

sudo apt-cache madison libnvidia-container1
libnvidia-container1 |    1.3.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.2.0+ds-0lambda1 | http://archive.lambdalabs.com/ubuntu focal/main amd64 Packages
libnvidia-container1 |    1.2.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.1.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.1.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.7-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.5-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.4-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.3-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 |    1.0.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.0.0~rc.2-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.0.0~rc.1-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages

So I manually installed version 1.3.0-1 and then the other packages using the following commands. nvidia-docker is now successfully installed:

sudo apt-get install -y libnvidia-container1=1.3.0-1
sudo apt-get install -y libnvidia-container-tools=1.3.0-1
sudo apt-get install -y nvidia-container-runtime
sudo apt-get install -y nvidia-docker2         
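
Picking the version by eye from the madison listing can be scripted. The following is a minimal sketch, assuming GNU `sort -V` orders these version strings closely enough to dpkg; the helper name `newest_from_repo` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache madison <pkg>` output on stdin and prints
# the highest version offered by repositories whose line contains the text in $1.
# Assumption: GNU `sort -V` is close enough to dpkg ordering for these versions.
newest_from_repo() {
    grep -F "$1" | awk -F'|' '{ gsub(/ /, "", $2); print $2 }' | sort -V | tail -n 1
}

# Sample lines copied from the madison output above.
newest_from_repo 'nvidia.github.io' <<'EOF'
libnvidia-container1 |    1.3.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
libnvidia-container1 | 1.2.0+ds-0lambda1 | http://archive.lambdalabs.com/ubuntu focal/main amd64 Packages
libnvidia-container1 |    1.2.0-1 | https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  Packages
EOF
```

On a live system the input would come from `apt-cache madison libnvidia-container1`, and the printed version could be used in the `package=version` form shown above.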

@AlexMikhalev

@JingL1014 I followed your advice successfully until the last line:

sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies.
 nvidia-docker2 : Depends: nvidia-container-runtime (>= 3.4.0) but 3.1.4-0pop1~1569270714~20.04~2ea45f8 is to be installed
E: Unable to correct problems, you have held broken packages.

@JingL1014
Author

@AlexMikhalev It seems that version 3.1.4 of nvidia-container-runtime is the one selected for installation. You can probably try to manually install version 3.4.0 by running:
sudo apt-get install -y nvidia-container-runtime=3.4.0

And then run:
sudo apt-get install -y nvidia-docker2

@AlexMikhalev

Fixed. The issue was the priority of system76 apt repo over Nvidia.
@klueska there is a good reason to use System76 drivers over stock Nvidia ones - for example, stock Nvidia drivers don't support external monitor/dual monitor configuration on the laptops.
It would be good to be able to pin priority of the relevant packages rather than the whole repo.

@ffahmed

ffahmed commented Oct 3, 2020

Hi @AlexMikhalev, in which file did you change the repo priority? /etc/apt/sources.list? I could not find a system76 entry there.

@ffahmed

ffahmed commented Oct 3, 2020

I tried to install nvidia-container-runtime to fix the same issue, but apt reported an unmet dependency on nvidia-container-toolkit. When I tried to install that, apt said it was already installed, yet it still would not let me install nvidia-container-runtime.
See below. Can anyone share a solution?

(base) JJteam@lambda-quad:~$ sudo apt-get install -y nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.3.0) but 1.2.0+ds-0lambda0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
(base) JJteam@lambda-quad:~$ sudo apt install nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.2.0+ds-0lambda0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
(base) JJteam@lambda-quad:~$ sudo apt install nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.3.0) but 1.2.0+ds-0lambda0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.

@AlexMikhalev

@ffahmed
I changed priorities in two files:

cat /etc/apt/preferences.d/pop-default-settings 
Package: *
Pin: release o=LP-PPA-system76-pop
Pin-Priority: 901

Package: *
Pin: release o=LP-PPA-system76-proposed
Pin-Priority: 901

and

cat /etc/apt/preferences.d/nvidia-default 
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1001

@ffahmed

ffahmed commented Oct 4, 2020

Thanks @AlexMikhalev for prompt reply!

For me I don't have those files in /etc/apt/preferences.d/
The only file I have is cuda-repository-pin-600 and here is the content.


cat /etc/apt/preferences.d/cuda-repository-pin-600
Package: nsight-compute
Pin: origin ubuntu.com
Pin-Priority: -1

Package: nsight-systems
Pin: origin ubuntu.com
Pin-Priority: -1

Package: *
Pin: release l=NVIDIA CUDA
Pin-Priority: 600

Also, I think my issue is a little different from yours. You were able to install nvidia-container-runtime, but when I try to install it, apt reports a dependency on nvidia-container-toolkit. When I try to install nvidia-container-toolkit, apt says it's already installed. See the commands and output below.
I am kind of stuck here, and I cannot go back to the older nvidia-docker (1.0) version either. Any help is much appreciated! @AlexMikhalev @klueska.

sudo apt-get install -y nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
nvidia-container-runtime : Depends: nvidia-container-toolkit (>= 1.3.0) but 1.2.0+ds-0lambda0.18.04.1 is to be installed
E: Unable to correct problems, you have held broken packages.
(base) JJteam@lambda-quad:~$ sudo apt-get install -y nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.2.0+ds-0lambda0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.

@klueska
Copy link
Contributor

klueska commented Oct 5, 2020

@ffahmed What repo is your version 1.2.0+ds-0lambda0.18.04.1 coming from? You need to figure out where it comes from and why it has higher priority than the ones from the official NVIDIA repos.
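
One way to answer that question is to read the version table printed by `apt-cache policy`. The sketch below pulls the candidate version and the first source listed for it out of that output. The helper name `candidate_source` is hypothetical, and the parsing assumes the usual layout, where each version line in the table is followed by lines of the form "<priority> <source> ...".

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache policy <pkg>` output on stdin and prints
# the candidate version followed by the first source listed for that version.
# Assumption: each version line is followed by "<priority> <source> ..." lines.
candidate_source() {
    awk '
        $1 == "Candidate:" { cand = $2; next }
        cand != "" && ($1 == cand || ($1 == "***" && $2 == cand)) {
            # The line after the version line names where that version comes from.
            if ((getline) > 0) print cand, $2
            exit
        }
    '
}
```

A typical invocation would be `apt-cache policy nvidia-container-toolkit | candidate_source`. The number in front of each source in the full policy output is its pin priority, which shows which repository wins.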

@ffahmed

ffahmed commented Oct 5, 2020

Thanks @klueska @AlexMikhalev both of you.
I had to create a preference file for nvidia-docker:

vi /etc/apt/preferences.d/nvidia-docker-pin-1002
with this content:
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002

Then I didn't need to install nvidia-container-toolkit. I was able to directly do
sudo apt-get install -y nvidia-docker2

@abhidipbhattacharyya

> Thanks @klueska @AlexMikhalev both of you.
> I had to create preference for nvidia-docker.
>
> vi /etc/apt/preferences.d/nvidia-docker-pin-1002
> with content;
> Package: *
> Pin: origin nvidia.github.io
> Pin-Priority: 1002
>
> Then I didn't need to install nvidia-container-toolkit. I was able to directly do
> sudo apt-get install -y nvidia-docker2

@ffahmed thank you. Your solution worked for me. You saved my day.

@imSrbh

imSrbh commented Jan 8, 2022

Thanks, @klueska @AlexMikhalev @ffahmed.
I was facing the same issue and resolved it in no time.

@abhidipbhattacharyya

For me the following worked:
vi /etc/apt/preferences.d/nvidia-docker-pin-1002
I added the content:

Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002

Then I ran:
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

@sandman

sandman commented Aug 24, 2022

This does not solve my issue! I get the same exact error as before:

docker run --gpus 0 -it --shm-size=1024m -e SIZEW=1920 -e SIZEH=1080 -e PASSWD=mypasswd -e BASIC_AUTH_PASSWORD=mypasswd -e NOVNC_ENABLE=true -p 6080:8080 nvidia-egl-desktop-ros2:foxy
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

I am on PopOS 22.04 and trying to run this image: https://github.com/atinfinity/nvidia-egl-desktop-ros2/blob/main/foxy/Dockerfile

@elezar
Member

elezar commented Aug 25, 2022

@sandman can you check the version of the NVIDIA Container CLI that you are using: nvidia-container-cli --version?

It may be that you are not using a version that supports cgroupv2.

@sandman

sandman commented Aug 25, 2022

@elezar My version is:

cli-version: 1.8.0
lib-version: 1.8.0
build date: 2022-02-07T17:42+00:00
build revision: 
build compiler: x86_64-linux-gnu-gcc-11 11.2.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -fPIC -g -O2 -ffile-prefix-map=/build/libnvidia-container-QpPgXl/libnvidia-container-1.8.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -I/usr/include/tirpc -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro

Does this support cgroupv2?

@elezar
Member

elezar commented Aug 25, 2022

There was a cgroup-related fix released in v1.8.1 so updating to at least that version (or 1.10.0) is recommended.
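
A quick check against that minimum version could look like the sketch below. It assumes GNU `sort -V` is available; the helper name `check_cli_version` is made up, and the 1.8.1 threshold comes from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: reads `nvidia-container-cli --version` output on stdin and
# reports whether cli-version is at least 1.8.1 (the release with the cgroup fix
# mentioned above). Assumption: GNU `sort -V` is available.
check_cli_version() {
    minimum='1.8.1'
    found=$(awk '$1 == "cli-version:" { print $2; exit }')
    lowest=$(printf '%s\n%s\n' "$found" "$minimum" | sort -V | head -n 1)
    if [ -n "$found" ] && [ "$lowest" = "$minimum" ]; then
        echo "ok: $found >= $minimum"
    else
        echo "too old or not found: '$found' (need >= $minimum)"
        return 1
    fi
}

# Example using the version reported earlier in this thread.
printf 'cli-version: 1.8.0\nlib-version: 1.8.0\n' | check_cli_version || true
```

On a real host the input would come from `nvidia-container-cli --version`. To see whether the host itself is on cgroup v2, `stat -fc %T /sys/fs/cgroup/` prints `cgroup2fs` in that case.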

@sandman

sandman commented Aug 29, 2022

Thanks @elezar ! I got it to work by installing 1.10.0

@jbartolozzi

How did you manually install 1.10.0? @sandman

@sandman

sandman commented Aug 30, 2022

@jbartolozzi I have created a gist with the steps: https://gist.github.com/sandman/3777b07f69e117aa8bf1adede26a4e36

@adwaykanhere

@elezar I'm still having the same issue on Pop-os. I have cli-version = 1.11.0, and I have modified the default settings in /etc/apt/preferences.d/pop-default-settings as @sandman suggested.

@luuhaa

luuhaa commented Oct 21, 2022

> @elezar I'm still having the same issue on Pop-os. I have cli-version = 1.11.0, and I have modified the default settings in /etc/apt/preferences.d/pop-default-settings as @sandman suggested.

Same here, any solution please

@fgoodwin

fgoodwin commented Nov 8, 2022

> @elezar I'm still having the same issue on Pop-os. I have cli-version = 1.11.0, and I have modified the default settings in /etc/apt/preferences.d/pop-default-settings as @sandman suggested.

Ditto Pop-os 22.04:
sudo docker run --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

@sandman

sandman commented Nov 8, 2022

@adwaykanhere @intrainepha My Gist applies for 1.10.0 (for host PopOs 22.04 and Container running Ubuntu 20.04). I did not test 1.11.0.

@Winand

Winand commented Nov 19, 2022

@fgoodwin @intrainepha @sandman It works with 1.11.0. I removed nvidia-docker2 and its dependencies and then reinstalled it.

  • /etc/apt/preferences.d/nvidia-docker-pin-1002:
Package: *
Pin: origin nvidia.github.io
Pin-Priority: 1002
  • sudo apt remove nvidia-docker2
  • Remove the libnvidia-container-tools, libnvidia-container1, nvidia-container-toolkit and nvidia-container-toolkit-base packages:
    • sudo apt autoremove
    • If autoremove doesn't help, try to remove them manually, as @polydaks suggests: sudo apt remove libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base
  • sudo apt install nvidia-docker2
  • sudo systemctl restart docker

@julianschoep

julianschoep commented Nov 30, 2022

@Winand this still does not work for me..

> nvidia-container-cli --version
cli-version: 1.11.0
lib-version: 1.11.0
build date: 2022-09-18T23:16+00:00
build revision: 
build compiler: x86_64-linux-gnu-gcc-11 11.2.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -fPIC -g -O2 -ffile-prefix-map=/build/libnvidia-container-CeXONE/libnvidia-container-1.11.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -I/usr/include/tirpc -Wl,-zre

And installed:

Setting up nvidia-container-toolkit-base (1.11.0-0pop1~1663593585~22.04~5b13c4c) ...
Setting up libnvidia-container-tools (1.11.0-0pop1~1663542983~22.04~fbd1818) ...
Setting up nvidia-container-toolkit (1.11.0-0pop1~1663593585~22.04~5b13c4c) ...
Setting up nvidia-docker2 (2.11.0-1~1663542535~22.04~0f7519f) ...

Getting the error

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: 
unable to start container process: error during container init: error running hook #0: error running hook: 
exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

@mathisc

mathisc commented Nov 30, 2022

@Winand the fix you mentioned does not work for me either (cli-version==1.11.0 on PopOS 22.04).
As a "temporary" (dirty?) workaround, passing the GPU devices manually works. Set
no-cgroups = true in /etc/nvidia-container-runtime/config.toml
and then run:
docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
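
The config change can be scripted instead of edited by hand. This is a rough sketch, assuming the key sits on its own line in config.toml, possibly commented out as `#no-cgroups = false`; the helper name `enable_no_cgroups` is hypothetical. It prints the modified file rather than editing in place, so the result can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: prints the config file named in $1 with the no-cgroups
# key (commented out or not) replaced by "no-cgroups = true".
# Assumption: the key appears on its own line, e.g. "#no-cgroups = false".
enable_no_cgroups() {
    sed -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*$/no-cgroups = true/' "$1"
}

# Demonstrate on a scratch copy rather than on the live file.
scratch=$(mktemp)
printf '[nvidia-container-cli]\n#no-cgroups = false\n' > "$scratch"
enable_no_cgroups "$scratch"
```

For the real file, the output of `enable_no_cgroups /etc/nvidia-container-runtime/config.toml` could be saved to a temporary file, checked, and then copied back with sudo.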

@Winand

Winand commented Nov 30, 2022

@mathisc you have docker-ce or docker desktop?

@mathisc

mathisc commented Dec 1, 2022

@Winand I don't have Docker Desktop installed, so I would guess I am using docker-ce, installed via this procedure: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
Here is the docker version output :

(base) ➜  ~ docker version                            
Client: Docker Engine - Community
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:01:58 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 17:59:49 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.10
  GitCommit:        770bd0108c32f3fb5c73ae1264f7e503fe7b2661
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

@julianschoep

@mathisc your "dirty" fix worked for me, thanks

@dubovikmaster

> @fgoodwin @intrainepha @sandman works with 1.11.0. I've removed nvidia-docker2 and its dependencies and then reinstalled again.
>
>   • /etc/apt/preferences.d/nvidia-docker-pin-1002:
> Package: *
> Pin: origin nvidia.github.io
> Pin-Priority: 1002
>   • sudo apt remove nvidia-docker2
>   • sudo apt autoremove to remove also libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base
>   • sudo apt install nvidia-docker2
>   • sudo systemctl restart docker

Thanks! This is the only one that worked for me.

@abhinand5

abhinand5 commented Dec 13, 2022

> @Winand the fix you mentioned does not work for me either (cli-version==1.11.0 on PopOS 22.04). As a "temporary" (dirty?) workaround passing GPU devices manually works: no-cgroups = true in /etc/nvidia-container-runtime/config.toml and then running with: docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

@mathisc Your quick and dirty fix is the only one that worked for me!

@vngabriel

> @Winand the fix you mentioned does not work for me either (cli-version==1.11.0 on PopOS 22.04). As a "temporary" (dirty?) workaround passing GPU devices manually works: no-cgroups = true in /etc/nvidia-container-runtime/config.toml and then running with: docker run --rm --gpus all --privileged -v /dev:/dev nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

This workaround was the only one that worked for me. Is there an official solution to this issue?

@viajeradelaluz

viajeradelaluz commented Jan 13, 2023

I'm still stuck on this problem.

  • First, I tried the @Winand solution of reinstalling nvidia-docker2 after changing the priority with Pin: origin nvidia.github.io, but it didn't work.
  • Then, the @mathisc "dirty" solution didn't work for me either.
  • I also tried Tensorman (a utility developed by Pop OS to run TensorFlow with the Nvidia GPU) but got the same docker error:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.

PopOS: 22.04
Nvidia Driver: 525.60.111
CUDA version: 12.0
Docker version: 20.10.12

Any new idea that may help?

@GdMacmillan

> I'm still stuck on this problem.
>
> * First, I tried the @Winand solution to reinstall `nvidia-docker2` after changing the priority to `Pin: origin nvidia.github.io`, but it didn't work,
>
> * Then, the @mathisc "dirty" solution didn't work for me either.
>
> * I also tried with [Tensorman](https://support.system76.com/articles/tensorman/) (utility developed by Pop OS to run the Nvidia GPU with TensorFlow) but got the same docker error:
> docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
> nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
>
> PopOS: 22.04 Nvidia Driver: 525.60.111 CUDA version: 12.0 Docker version: 20.10.12
>
> Any new idea that may help?

I have the same exact system and am experiencing this problem as well.

@GdMacmillan

Followup: I was able to get the following output after overwriting the sources.list again from the installation guide with:

 distribution=$(. /etc/os-release;echo ubuntu22.04) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
         sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
         sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

(notice the hardcoded ubuntu22.04)
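
The hardcoding is presumably needed because the guide's usual `$ID$VERSION_ID` expression does not yield an Ubuntu name on Pop!_OS. A less manual variant is sketched below. It assumes that Ubuntu derivatives list "ubuntu" in the ID_LIKE field of their os-release file and share Ubuntu's VERSION_ID; the helper name `nvidia_distribution` and the sample file contents are illustrative, not taken from a real machine.

```shell
#!/bin/sh
# Hypothetical helper: prints the distribution string to use in the NVIDIA repo URL.
# Assumptions: derivatives such as Pop!_OS list "ubuntu" in ID_LIKE and share
# Ubuntu's VERSION_ID, so the Ubuntu repo of the same release applies.
nvidia_distribution() {
    (
        # Source the file in a subshell so its variables do not leak out.
        . "${1:-/etc/os-release}"
        case " ${ID:-} ${ID_LIKE:-} " in
            *" ubuntu "*) echo "ubuntu${VERSION_ID:-}" ;;
            *)            echo "${ID:-}${VERSION_ID:-}" ;;
        esac
    )
}

# Demonstrate with an assumed sample os-release instead of the live file.
sample=$(mktemp)
printf 'ID=pop\nID_LIKE="ubuntu debian"\nVERSION_ID="22.04"\n' > "$sample"
nvidia_distribution "$sample"
```

Called without an argument, the function reads /etc/os-release, so `distribution=$(nvidia_distribution)` could stand in for the hardcoded assignment.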

Then I installed nvidia-docker2 and restarted docker

sudo apt-get install nvidia-docker2

Successful Output:

gmacmillan@pop-os:~$ sudo docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi
Fri Jan 20 22:21:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2F:00.0  On |                  Off |
|  0%   40C    P8    13W / 450W |    509MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

@polydaks

> @fgoodwin @intrainepha @sandman works with 1.11.0. I've removed nvidia-docker2 and its dependencies and then reinstalled again.
>
>   • /etc/apt/preferences.d/nvidia-docker-pin-1002:
> Package: *
> Pin: origin nvidia.github.io
> Pin-Priority: 1002
>   • sudo apt remove nvidia-docker2
>   • sudo apt autoremove to remove also libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base
>   • sudo apt install nvidia-docker2
>   • sudo systemctl restart docker

After multiple frustrating attempts to follow this advice, I realised the issue was sudo apt autoremove did not actually remove the offending libraries. Once I manually removed them with sudo apt remove libnvidia-container-tools libnvidia-container1 nvidia-container-toolkit nvidia-container-toolkit-base, all was good in the world. :)

@criadoperez

For anybody out there using a TUXEDO with TUXEDO OS, I fixed mine by simply adding the NVIDIA libnvidia-container repo and running apt update && apt upgrade. Note that this doesn't install anything new; it just upgrades using the new repository.

So with one command:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && \
    sudo apt-get update && sudo apt upgrade -y

@elezar
Member

elezar commented Dec 6, 2023

Thanks @criadoperez. As you pointed out, please refer to the updated installation documentation, where nvidia-container-toolkit is the top-level package.

Please create an issue against https://github.com/NVIDIA/nvidia-container-toolkit if there are still problems.

@elezar closed this as completed Dec 6, 2023