This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

libnvidia-container-tools (>= 1.2.0) is required? #1355

Closed
KinWaiCheuk opened this issue Jul 24, 2020 · 3 comments

@KinWaiCheuk

1. Issue or feature description

I am trying to install nvidia-docker via sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit, but it fails with the following error:

The following packages have unmet dependencies:
 nvidia-container-toolkit : Depends: libnvidia-container-tools (>= 1.2.0) but 1.0.1-1 is to be installed
E: Unable to correct problems, you have held broken packages.

However, libnvidia-container-tools=1.2.0 seems to be unavailable at the moment.

2. Steps to reproduce the issue

Running sudo apt install libnvidia-container-tools shows:

Reading package lists... Done
Building dependency tree
Reading state information... Done
libnvidia-container-tools is already the newest version (1.0.1-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

When I then try to force version 1.2.0 via sudo apt install libnvidia-container-tools=1.2.0, it shows:

Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Version '1.2.0' for 'libnvidia-container-tools' was not found
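
One way to see which version apt would actually pick is apt-cache policy, which reports the installed version, the install candidate, and the repositories each version comes from. The sketch below is only an illustration: the function name candidate_version is made up here, and it assumes the usual "Candidate: <version>" line in that output. Note also that apt install pkg=VERSION expects the version string exactly as apt lists it, including any Debian revision suffix.

```shell
# Hypothetical helper (not part of apt): read `apt-cache policy <pkg>`
# output on stdin and print the version apt currently treats as the
# install candidate.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Usage on a real system (not executed here):
#   apt-cache policy libnvidia-container-tools | candidate_version
```

If the candidate stays at 1.0.1-1 even though a newer version exists upstream, the full apt-cache policy output should show which repository and priority is winning.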

3. Information to attach (optional if deemed irrelevant)

nvidia-smi in the base system shows

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  Off  | 00000000:07:00.0  On |                    0 |
| N/A   40C    P0    39W / 300W |    355MiB / 32505MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   39C    P0    39W / 300W |      0MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-DGXS...  Off  | 00000000:0E:00.0 Off |                    0 |
| N/A   38C    P0    40W / 300W |      0MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-DGXS...  Off  | 00000000:0F:00.0 Off |                    0 |
| N/A   39C    P0    38W / 300W |      0MiB / 32508MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1585      G   /usr/lib/xorg/Xorg                           179MiB |
|    0      2994      G   compiz                                       163MiB |
|    0     11416      G   /usr/lib/firefox/firefox                      10MiB |
+-----------------------------------------------------------------------------+

nvcc --version in the base system shows

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

dpkg -l | grep -i docker in the base system shows

ii  dgx-docker-cleanup                         1.0-1                                           amd64        DGX Docker cleanup script
rc  dgx-docker-options                         1.0-7                                           amd64        DGX docker daemon options
ii  dgx-docker-repo                            1.0-1                                           amd64        docker repository configuration file
ii  docker-ce                                  5:19.03.12~3-0~ubuntu-xenial                    amd64        Docker: the open-source application container engine
ii  docker-ce-cli                              5:19.03.12~3-0~ubuntu-xenial                    amd64        Docker CLI: the open-source application container engine
ii  nvidia-container-runtime                   2.0.0+docker18.09.2-1                           amd64        NVIDIA container runtime

docker version shows

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:49 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:20 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.2
  GitCommit:        9754871865f7fe2f4e74d43e2fc7ccd237edcbce
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        09c8266bf2fcf9519a651b04ae54c967b9ab86ec
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

lsb_release -a shows

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.6 LTS
Release:	16.04
Codename:	xenial
@klueska
Contributor

klueska commented Jul 24, 2020

Version 1.2.0 definitely exists where it is supposed to:
https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/ubuntu16.04/amd64

Can you double-check that there is nothing unusual in your setup under /etc/apt/sources.list.d/?
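
As a rough sketch of that check, the helper below prints every active (uncommented) deb or deb-src entry from the *.list files in a sources directory. The function name list_active_sources is hypothetical, and the default path is simply the directory mentioned above.

```shell
# Hypothetical helper: print each active deb/deb-src entry found in the
# *.list files of an apt sources directory, prefixed by file:line.
list_active_sources() {
    dir="${1:-/etc/apt/sources.list.d}"
    for f in "$dir"/*.list; do
        [ -f "$f" ] || continue   # skip when the glob matches nothing
        # grep exits non-zero when a file has no active entries; ignore that.
        grep -HnE '^[[:space:]]*deb(-src)?[[:space:]]' "$f" || true
    done
    return 0
}

# Usage on a real system (not executed here):
#   list_active_sources
```

Any entry that points somewhere other than the expected NVIDIA repository is a candidate for closer inspection.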

@klueska
Contributor

klueska commented Jul 24, 2020

One thing to keep in mind if you are on a DGX system (which I just noticed you appear to be): DGX systems have a dgx.list file under /etc/apt/sources.list.d/.

The DGX repos listed in this file are pinned to a higher priority than the "normal" nvidia repos so that people end up using the DGX SW stack validated by NVIDIA's internal QA.

If you want the latest and greatest nvidia-container-toolkit, then you'll need to either:

  1. Deprioritize the DGX repos by editing "/etc/apt/preferences.d/nvidia" and changing the Pin-Priority to 500; OR
  2. Disable the DGX repos by commenting out the entries in the various "/etc/apt/sources.list.d/" files that reference international.download.nvidia.com
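
The two options above can be sketched as small shell helpers. This is a minimal sketch under assumptions: the function names set_pin_priority and comment_out_host are invented for illustration, the exact contents of the DGX preferences and list files are not shown in this thread, and both helpers need root privileges when pointed at files under /etc/apt. Each keeps a .bak copy of the file it modifies.

```shell
# Hypothetical helper for option 1: set every Pin-Priority line in an apt
# preferences file to the given value (default 500), keeping a .bak copy.
set_pin_priority() {
    file="$1"
    prio="${2:-500}"
    sed -i.bak "s/^Pin-Priority:.*/Pin-Priority: ${prio}/" "$file"
}

# Hypothetical helper for option 2: comment out every active line that
# mentions the given host in the *.list files of a sources directory.
comment_out_host() {
    dir="$1"
    host="$2"
    for f in "$dir"/*.list; do
        [ -f "$f" ] || continue      # skip when the glob matches nothing
        cp "$f" "$f.bak"
        # index() is a literal substring match, so the dots in the host
        # name are not treated as regex wildcards.
        awk -v host="$host" '
            !/^[ \t]*#/ && index($0, host) { print "# " $0; next }
            { print }
        ' "$f.bak" > "$f"
    done
}

# Usage on a real system, as root (not executed here), picking ONE option:
#   set_pin_priority /etc/apt/preferences.d/nvidia 500
#   comment_out_host /etc/apt/sources.list.d international.download.nvidia.com
```

After either change, sudo apt-get update followed by the original install command should let apt consider the newer packages.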

@KinWaiCheuk
Author

> One thing to keep in mind if you are on a DGX system (which I just noticed it looks like you are). DGX systems have a dgx.list file under /etc/apt/sources.list.d/.

I see, that is indeed the problem, thanks!
