Add DeviceRequests to HostConfig to support NVIDIA GPUs #38828
Conversation
Some linting failures;
Why are we doing this instead of the PR from Nvidia?
@cpuguy83 Because there's nothing specific about GPUs, it's just devices with prestart hooks. It's akin to Kubernetes device plugins. On the CLI however, it's
A single small comment, otherwise ship it!
```go
Driver       string     // Name of device driver
Count        int        // Number of devices to request (-1 = All)
DeviceIDs    []string   // List of device IDs as recognizable by the device driver
Capabilities [][]string // An OR list of AND lists of device capabilities (e.g. "gpu")
```
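For context, a sketch of how a populated request might look (illustrative values only; `container` here is moby's `api/types/container` package, as in the daemon code quoted further down):

```go
// Illustrative only: ask any registered driver for all devices that
// provide the generic "gpu" capability.
req := container.DeviceRequest{
    Driver:       "",                  // empty: match any driver by capabilities
    Count:        -1,                  // -1: request all matching devices
    DeviceIDs:    nil,                 // alternatively, pin specific device IDs
    Capabilities: [][]string{{"gpu"}}, // outer slice is OR-ed, inner slice is AND-ed
}
```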
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were discussing `Capabilities` as a name for this (as it could be confused with `Capabilities` on the container itself, i.e. Linux capabilities), but I can't come up with good alternatives; perhaps `Features`, but not sure if that's a good match.
I understand it, but on the other hand, it's literally a list of what the device is capable of doing, what capabilities it provides. In this case it provides the "gpu" capability, as well as nvidia-specific capabilities like "compute", etc.
The only other names I can think of are "requirements" or "constraints", but I'm unsure.
It only matches if all of these are matched, correct? "constraints" could work, but possibly too generic? idk. Naming is really hard on this one
design LGTM
Force-pushed from 6904937 to 0e85958.
Codecov Report
```
@@            Coverage Diff            @@
##           master   #38828   +/-   ##
=========================================
  Coverage          ?   36.41%
=========================================
  Files             ?      617
  Lines             ?    45950
  Branches          ?        0
=========================================
  Hits              ?    16732
  Misses            ?    26929
  Partials          ?     2289
```
Updated
Left some comments inline, and saw that containerd/containerd#3093 was merged (so we can use the exported list)
Also, could you add code to ignore the new field on older API versions on container-create?
moby/api/server/router/container/container_routes.go, lines 468 to 485 in ca0b64e:
```go
if hostConfig != nil && versions.LessThan(version, "1.40") {
    // Ignore BindOptions.NonRecursive because it was added in API 1.40.
    for _, m := range hostConfig.Mounts {
        if bo := m.BindOptions; bo != nil {
            bo.NonRecursive = false
        }
    }
    // Ignore KernelMemoryTCP because it was added in API 1.40.
    hostConfig.KernelMemoryTCP = 0
    // Ignore Capabilities because it was added in API 1.40.
    hostConfig.Capabilities = nil
    // Older clients (API < 1.40) expect the default to be shareable, make them happy
    if hostConfig.IpcMode.IsEmpty() {
        hostConfig.IpcMode = container.IpcMode("shareable")
    }
}
```
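A minimal sketch of the analogous gate for the new field (assuming it is added to `HostConfig` as `DeviceRequests`, per this PR):

```go
if hostConfig != nil && versions.LessThan(version, "1.40") {
    // Ignore DeviceRequests because it is being added in API 1.40.
    hostConfig.DeviceRequests = nil
}
```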
```go
func (daemon *Daemon) handleDevice(req container.DeviceRequest, spec *specs.Spec) error {
    if req.Driver == "" {
        for _, dd := range deviceDrivers {
            if selected := dd.capset.Match(req.Capabilities); selected != nil {
                // ...
```
One thing I'm wondering: here, we match capabilities against the driver. So if a machine has (e.g.) two GPUs, and one of them supports "capA" and one of them "capB", then the driver would register itself with all of those (so the driver says: "I provide capA and capB"), correct?
This could result in a situation where none of the GPUs support the requested list of capabilities, i.e.;
| Request     | GPU-A        | GPU-B        | Driver           | Driver Match | GPU Match |
|-------------|--------------|--------------|------------------|--------------|-----------|
| "capA,capB" | "capA, capC" | "capB, capC" | "capA,capB,capC" | ✅           | ❌        |
What would happen in that case (i.e., conversion to OCI succeeds, the hook is registered, but no GPU is found)? Will a proper error be produced?
I could make it an OR list of ANDs as well instead of a map.
Perhaps we should, if this is a concern; in that case the driver would report itself as:

```json
{
  "capabilities": [
    ["capA", "capB"],
    ["capB", "capC"]
  ]
}
```
We could even decide to make it return a list of capabilities for each GPU (then we can even determine the number of GPUs available):

```json
{
  "capabilities": [
    ["capA", "capB"],
    ["capA", "capB"],
    ["capA", "capB"],
    ["capA", "capB"],
    ["capB", "capC"]
  ]
}
```
But perhaps that breaks the abstraction
I suggest we punt on this: the problem is extremely unlikely to happen at this time, and since the structure belongs to the device driver it's internal, so we can change it later. It's the API that needs to be locked down.
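For readers following along, the "OR list of AND lists" semantics under discussion amounts to something like the sketch below (a hypothetical helper, not the PR's actual `capset` implementation):

```go
// matches reports whether a device providing the capabilities in "have"
// satisfies a request expressed as an OR list of AND lists.
func matches(have map[string]bool, request [][]string) bool {
    for _, andList := range request { // outer list: any one entry suffices (OR)
        ok := true
        for _, c := range andList { // inner list: every capability must be present (AND)
            if !have[c] {
                ok = false
                break
            }
        }
        if ok {
            return true
        }
    }
    return false
}
```

Against the table above, the driver's combined set {capA, capB, capC} matches the request while each individual GPU's set does not, which is exactly the mismatch being flagged.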
```yaml
items:
  type: "string"
example:
  # gpu AND nvidia AND compute
```
Do we need an example for `OR` here?
```diff
-    # gpu AND nvidia AND compute
+    # gpu AND nvidia AND compute, OR gpu AND intel
+    - ["gpu", "nvidia", "compute"]
+    - ["gpu", "intel"]
```
No, it's fine; the reason I put it there is so that we can support it in the future without breaking the API.
If we don't want to support `OR` yet, we should error out if `len(capabilities) > 1`.
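Such a guard could live in request validation; a hypothetical sketch (not code from this PR), using moby's `errdefs` package:

```go
// Hypothetical: reject OR-ed capability lists until they are supported end to end.
if len(req.Capabilities) > 1 {
    return errdefs.InvalidParameter(errors.New("OR-ed capability lists are not yet supported"))
}
```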
What I meant is that it is supported, but not from the CLI.
Force-pushed from 7100ebf to accc53a.
@thaJeztah thanks for your review, I updated again.
LGTM, thanks!
ping @cpuguy83 @kolyshkin ptal
docs/api/version-history.md (outdated)
```diff
@@ -49,6 +49,8 @@ keywords: "API, Docker, rcli, REST, documentation"
 * `GET /info` now returns information about `DataPathPort` that is currently used in swarm
 * `GET /info` now returns `PidsLimit` boolean to indicate if the host kernel has
   PID limit support enabled.
+* `GET /containers/create` now accepts `DeviceRequests` as part of `HostConfig`.
```
Oh, erm,
```diff
-* `GET /containers/create` now accepts `DeviceRequests` as part of `HostConfig`.
+* `POST /containers/create` now accepts `DeviceRequests` as part of `HostConfig`.
+* `GET /containers/{id}/json` now returns `DeviceRequests` as part of `HostConfig`.
```
facepalm
@tiborvass vendoring is failing;
This patch hard-codes support for NVIDIA GPUs. In a future patch it should move out into its own Device Plugin. Signed-off-by: Tibor Vass <[email protected]>
LGTM 🐯
But @tiborvass, we need to create issues for the TODOs in there (to track future work).
I'm merging since those Windows errors are unrelated; they also appear on other PRs.
This patch hard-codes support for NVIDIA GPUs.
In a future patch it should move out into its own Device Plugin.
Signed-off-by: Tibor Vass [email protected]
Closes #37434 #37504
The CLI part is at docker/cli#1714
Notes
I tried to keep the API generic enough for devices other than GPUs in the future.
In addition to "options", there's a generic notion of "capabilities": any device can advertise a set of them, and matching happens when devices are requested. This should help with @thaJeztah's and @tonistiigi's concerns about mixing "constraints" and "settings" (aka "options"). This will be even more important for orchestration and for secure setups like BuildKit, where there is a separation between device requesters and providers.
For instance, any GPU vendor should add the "gpu" device capability to its docker device driver. They can also advertise other caps which can be used for matching the driver (and in the future, the node in a cluster).
Currently, the nvidia driver doesn't do anything with "options", but in the future it could, for instance, decide to limit how much GPU memory may be used. Options cover all the settings that would not be used for scheduling. Device capabilities include the NVIDIA capabilities (compute, utility, etc.).
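To make the shape concrete, a hedged illustration of a request as it might appear in `HostConfig` (the driver name is an example value, and it assumes the "options" mentioned above surface as an `Options` map on the struct):

```go
// Illustrative only: all devices from the "nvidia" driver that provide
// both the generic "gpu" capability and the NVIDIA "compute" capability.
hostConfig.DeviceRequests = []container.DeviceRequest{
    {
        Driver:       "nvidia",
        Count:        -1, // all matching devices
        Capabilities: [][]string{{"gpu", "compute"}},
        Options:      map[string]string{}, // driver-specific settings, not used for scheduling
    },
}
```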
I'm happy to bikeshed on names, but would love to get review on the API itself.
cc @RenaudWasTaken @cpuguy83 @crosbymichael