This repository has been archived by the owner on Dec 30, 2020. It is now read-only.
Warning Failed 19s (x3 over 46s) kubelet, 9.114.75.219 Error: could not create container: could not spawn container: could not create oci bundle: could not generate oci spec for container: could not configure devices: could not get device: not a device node
What happens?
I have two worker nodes: one running Docker and one running Singularity. Submitting this YAML to the Docker node (adjusted to pull a Docker image) works fine, and the HCA devices appear in the container. On the Singularity node, the job is rejected with the error above.
If I comment out the rdma/hca: 1 line under limits:, the pod runs, but without /dev/infiniband mounted.
What were you expecting to happen?
The container starts with /dev/infiniband mounted inside it, allowing direct access to the InfiniBand network device(s).
Any logs, error output, comments, etc?
Looking at the systemd journal (journalctl -f) while sycri was running in debug mode, I captured this context for the error message.
My guess is that sycri is stumbling on the devices section. /dev/nvidia0 is a character special file (which it mounts fine), but /dev/infiniband is a directory of device nodes (which it stumbles on).
What are the steps to reproduce this issue?
kubectl create -f ./tiny.yaml
(see tiny-ib.yaml)
Environment?
This is an IBM Cloud Private 3.1.2 environment running Kubernetes v1.12.4.
OS distribution and version: RHEL 7.6 (ppc64le)
go version: 1.11.5
go env:
Singularity-CRI version: v1.0.0-beta.5
Singularity version: 3.2.1-1.el7
Kubernetes version: v1.12.4+icp-ee