What's the difference between the latest nvidia-docker and nvidia container runtime? #1268

gkd2020 · 2020-05-06T17:13:24Z

I came across an interesting blog post:
https://collabnix.com/introducing-new-docker-cli-api-support-for-nvidia-gpus-under-docker-engine-19-03-0-beta-release/
After installing Docker 19.03, the blogger installed the NVIDIA container runtime instead of
nvidia-docker, which confused me.
Can you explain the difference and the relationship between them?

gkd2020 · 2020-05-06T17:32:58Z

I also found another project called nvidia container toolkit.
Now I'm really confused: which project should I choose when setting up a GPU environment?
@RenaudWasTaken

klueska · 2020-05-22T13:33:43Z

I originally posted a similar answer here, but hopefully this clears things up:
NVIDIA/k8s-device-plugin#168 (comment)

The set of packages collectively referred to as nvidia-docker consists of the following components (and their dependencies from top to bottom):

nvidia-docker2
nvidia-container-runtime
nvidia-container-toolkit
libnvidia-container

Unfortunately, the documentation across the repos that host code for these projects is inconsistent and misleading at times.

Some places say that nvidia-docker2 should no longer be installed for docker versions 19.03+ because it is deprecated, and you should install nvidia-container-toolkit instead. Other places say that nvidia-docker2 is required (even for docker versions 19.03+) if you plan on running Kubernetes on top of docker.

While both statements are technically true, I can see why things might be a little confusing.

Starting from the bottom, below is a description of what each of these components is responsible for:

libnvidia-container:
This package does the heavy-lifting of making sure that a container is set up to run with NVIDIA GPU support. It is designed to be container-runtime agnostic and provides a well-defined API and a wrapper CLI that different runtimes can invoke to inject NVIDIA GPU support into their containers.

nvidia-container-toolkit:
This package includes a script that implements the interface required by a runC prestart hook. This script is invoked by runC after a container has been created, but before it has been started, and is given access to the config.json associated with the container. It then takes information contained in the config.json and uses it to invoke the libnvidia-container CLI with an appropriate set of flags. One of the most important flags specifies which GPU devices should be injected into the container.
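To make the hook's job concrete, here is a minimal Python sketch of the translation step described above. It is not the actual hook, only an illustration. It assumes the OCI conventions that the container state handed to a hook is JSON with `bundle` and `pid` fields, and that config.json stores environment variables as `KEY=VALUE` strings under `process.env`. The helper names are hypothetical, the flags shown are a simplified subset, and the handling of `NVIDIA_VISIBLE_DEVICES` values is deliberately reduced to "unset, empty, or void means do nothing".

```python
import posixpath


def visible_devices(config):
    """Return the NVIDIA_VISIBLE_DEVICES value from an OCI config dict, or None."""
    for entry in config.get("process", {}).get("env", []):
        key, _, value = entry.partition("=")
        if key == "NVIDIA_VISIBLE_DEVICES":
            return value
    return None


def build_cli_args(state, config):
    """Build an illustrative nvidia-container-cli command line.

    `state` is the container state a hook receives; `config` is the parsed
    config.json. Returns None when no GPUs were requested.
    """
    devices = visible_devices(config)
    if not devices or devices == "void":
        return None  # simplification: treat these cases as "no GPU support"

    # The rootfs path in config.json may be relative to the bundle directory.
    rootfs = config.get("root", {}).get("path", "rootfs")
    if not posixpath.isabs(rootfs):
        rootfs = posixpath.join(state["bundle"], rootfs)

    # Assumed, simplified flag set; the real hook passes more options.
    return [
        "nvidia-container-cli", "configure",
        "--device=" + devices,
        "--pid=" + str(state["pid"]),
        rootfs,
    ]
```

The point of the sketch is the data flow: the GPU selection travels as an environment variable inside config.json, and the hook turns it into a device flag for the libnvidia-container CLI.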

nvidia-container-runtime:
This package used to be a complete fork of runC with NVIDIA specific code injected into it. Nowadays, it is a thin wrapper around the native runC installed on your machine. All it does is take a runC spec as input, inject the nvidia-container-toolkit script as a prestart hook into it, and then call out to the native runC, passing it the modified runC spec with that hook set. It's important to note that this package is not necessarily specific to docker (but it is specific to runC).
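The "inject a prestart hook, then call the native runC" step can be sketched in a few lines of Python. This is only an illustration of the idea, not the runtime's actual code. It assumes the OCI spec layout in which hooks live under `hooks.prestart` as objects with `path` and `args`; the hook path and the `prestart` argument below are assumptions about a typical install and may differ on your system.

```python
import copy

# Assumed install location of the hook binary; adjust for your system.
HOOK_PATH = "/usr/bin/nvidia-container-runtime-hook"


def inject_prestart_hook(spec, hook_path=HOOK_PATH):
    """Return a copy of an OCI spec dict with the NVIDIA prestart hook added.

    The input spec is left untouched, and the hook is not added twice.
    """
    patched = copy.deepcopy(spec)
    prestart = patched.setdefault("hooks", {}).setdefault("prestart", [])
    if not any(hook.get("path") == hook_path for hook in prestart):
        prestart.append({"path": hook_path, "args": [hook_path, "prestart"]})
    return patched
```

After patching the spec, a wrapper of this kind would hand the modified spec to the native runC unchanged in every other respect, which is why it stays specific to runC but not to docker.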

nvidia-docker2:
This package is the only docker-specific package of any of them. It takes the script associated with the nvidia-container-runtime and registers it as a runtime in docker's /etc/docker/daemon.json file for you. This then allows you to run (for example) docker run --runtime=nvidia ... to automatically add GPU support to your containers. It also installs a wrapper script around the native docker CLI called nvidia-docker, which lets you invoke docker without needing to specify --runtime=nvidia every single time. It also lets you set an environment variable on the host (NV_GPU) to specify which GPUs should be injected into a container.
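For reference, the daemon.json change amounts to adding one entry under `runtimes`. The Python sketch below shows the shape of that entry by merging it into an existing configuration. The function name is hypothetical, and the `path` value assumes the runtime binary is on the daemon's PATH; the package itself writes the file for you, so this is only meant to show what ends up there.

```python
import json


def register_nvidia_runtime(daemon_config, runtime_path="nvidia-container-runtime"):
    """Return a copy of a daemon.json dict with an 'nvidia' runtime registered."""
    merged = dict(daemon_config)
    runtimes = dict(merged.get("runtimes", {}))
    # This entry is what makes `docker run --runtime=nvidia` resolvable.
    runtimes["nvidia"] = {"path": runtime_path, "runtimeArgs": []}
    merged["runtimes"] = runtimes
    return merged


# Show the resulting JSON for an otherwise empty configuration.
print(json.dumps(register_nvidia_runtime({}), indent=4))
```

Existing keys in the configuration are preserved, so other daemon settings and other registered runtimes are not affected by the merge.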

Given this hierarchy of components it's easy to see that if you only install nvidia-container-toolkit (which is recommended for Docker 19.03+), then you will not get nvidia-container-runtime installed as part of it, and thus --runtime=nvidia will not be available to you. This is OK for Docker 19.03+ because it calls directly out to nvidia-container-toolkit when you pass it the --gpus option instead of relying on the nvidia-container-runtime as a proxy.

However, if you want to use Kubernetes with Docker 19.03, you actually need to continue using nvidia-docker2 because Kubernetes doesn't support passing GPU information down to docker through the --gpus flag yet. It still relies on the nvidia-container-runtime to pass GPU information down the stack via a set of environment variables.

So you are basically running on the exact same stack whether you install nvidia-docker2 or nvidia-container-toolkit, except that nvidia-docker2 also installs a thin runtime that can proxy GPU information down to nvidia-container-toolkit via environment variables instead of relying on the --gpus flag to have Docker do it directly.
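The two ways of requesting GPUs differ only in how the request reaches the stack. The Python sketch below builds both command lines for the "all GPUs" case so they can be compared side by side. The function name is hypothetical and the image name in the usage note is a placeholder; selecting specific devices uses a different value syntax for each form, so that case is intentionally left out.

```python
def docker_run_argv(image, command, via_runtime=False):
    """Build a `docker run` argv that requests all GPUs.

    via_runtime=False: Docker 19.03+ style, using the --gpus flag.
    via_runtime=True:  nvidia-docker2 style, using --runtime=nvidia plus
                       the NVIDIA_VISIBLE_DEVICES environment variable.
    """
    argv = ["docker", "run", "--rm"]
    if via_runtime:
        argv += ["--runtime=nvidia", "-e", "NVIDIA_VISIBLE_DEVICES=all"]
    else:
        argv += ["--gpus", "all"]
    return argv + [image] + list(command)
```

For example, `docker_run_argv("my-cuda-image", ["nvidia-smi"])` yields the `--gpus all` form, while passing `via_runtime=True` yields the `--runtime=nvidia` form, which only works once the nvidia runtime is registered with docker.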

loophole64 · 2020-10-07T19:58:14Z

@klueska
Thank you very much! This cleared up a lot of confusion for me about the different packages.

rtrobin · 2022-03-29T04:07:09Z

@klueska Thank you so much for the clarification. I'm glad I ran into this post. Could you please add this to the official NVIDIA docs? For example, this doc site?

elezar · 2022-03-29T05:29:23Z

@rtrobin is your suggestion to update the content here https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html#arch-overview with @klueska's descriptions above?

rtrobin · 2022-03-29T07:25:02Z

Basically, the doc (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html) as it stands now covers the same ground as @klueska's post above. But the description here is easier for me to follow, because it also explains how things differ across docker versions and docker parameters.

By the way, the driver containers overview doc (https://docs.nvidia.com/datacenter/cloud-native/driver-containers/overview.html) lives under a separate tab. Maybe it could become part of the architecture overview (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html), as another way to make use of the GPU?

pejrich · 2023-02-06T02:24:18Z

Hey Nvidia, I think you could be accused of a lot of things, but one thing I will NEVER accuse you of is making developers' lives easier.

klueska closed this as completed May 22, 2020

This was referenced May 22, 2020

Nvidia runtime fails to run any container #1252

Closed

Trying to get nvidia-container-runtime on other distribution to work NVIDIA/nvidia-container-runtime#101

Closed

This was referenced Aug 28, 2020

What is the most recent stable beta and what do your tags mean? NVIDIA/k8s-device-plugin#193

Closed

suse tumbleweed & nvidia-container-toolkit & could not select device driver "" #1377

Closed

klueska mentioned this issue Sep 23, 2020

Which one should I use, nvidia-docker, nvidia-container-toolkit, or nvidia-container-runtime? #1387

Closed

klueska mentioned this issue Oct 5, 2020

Understanding the relationship between nvidia-docker-toolkit and nvidia-docker-runtime #1035

Closed

klueska mentioned this issue Nov 8, 2020

What's the differences among nvidia-container-runtime, nvidia-docker and nvidia-container-toolkit? #1408

Closed

bwinsto2 mentioned this issue Nov 20, 2020

nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.39: no such file or directory NVIDIA/nvidia-container-toolkit#297

Open

klueska mentioned this issue Mar 8, 2021

Docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]] #1470

Closed

klueska mentioned this issue Mar 17, 2021

Official documentation missmatch with last README.md change #1474

Closed

captn3m0 mentioned this issue May 22, 2021

Support runtime parameter for docker_container kreuzwerker/terraform-provider-docker#85

Closed

marcusvinicius178 mentioned this issue Jan 24, 2022

libGL error: No matching fbConfigs or visuals found, failed to load driver: swrast AuroAi/carla_apollo_bridge#30

Closed

klueska mentioned this issue Jan 28, 2022

Ubuntu 20.04 - Issues installing nvidia-docker2 #1594

Closed

rokopi-byte mentioned this issue Nov 24, 2022

cannot get nvidia-docker2 #1698

Closed

NVIDIA locked as resolved and limited conversation to collaborators Feb 6, 2023
