This repository has been archived by the owner on Dec 30, 2020. It is now read-only.

Containers using the Mellanox IB device plugin are rejected #352

Closed
jjhursey opened this issue Aug 12, 2019 · 4 comments · Fixed by #353
Labels
bug Something isn't working

Comments

@jjhursey

What are the steps to reproduce this issue?

  1. On a cluster with InfiniBand (MOFED 4.5-2), install the Kubernetes HCA device plugin from Mellanox:
     kubectl create -f https://cdn.rawgit.com/Mellanox/k8s-rdma-sriov-dev-plugin/7b27f8cf/example/hca/rdma-hca-node-config.yml
     kubectl create -f https://raw.githubusercontent.com/Mellanox/k8s-rdma-sriov-dev-plugin/master/example/device-plugin.yaml
  2. Submit the job: kubectl create -f ./tiny.yaml (see tiny-ib.yaml)
  3. The job is rejected with the error:
     Warning  Failed     19s (x3 over 46s)  kubelet, 9.114.75.219  Error: could not create container: could not spawn container: could not create oci bundle: could not generate oci spec for container: could not configure devices: could not get device: not a device node

What happens?

I have two worker nodes: one running Docker and one running Singularity. Submitting this YAML to the Docker node (adjusted to pull a Docker image) works fine, with the HCA devices appearing in the container. On the Singularity node, the job is rejected.

If I comment out the rdma/hca: 1 line under limits:, then the job runs, but without /dev/infiniband mounted.

What were you expecting to happen?

The container starts with /dev/infiniband mounted inside it, allowing direct access to the InfiniBand network device(s).

Any logs, error output, comments, etc?

Looking at the systemd journal (journalctl -f) while sycri was running in debug mode, I captured this context for the error message.

If I had to guess, I would say it is the devices section that sycri is stumbling over: /dev/nvidia0 is a character special file (which it mounts fine), but /dev/infiniband is a directory of device nodes (which it trips on). A short sketch of why such a check fails is included after the log below.

Snippet of the devices section (full log below):

"devices":[
  {"container_path":"/dev/nvidia3","host_path":"/dev/nvidia3","permissions":"mrw"},
  {"container_path":"/dev/nvidia0","host_path":"/dev/nvidia0","permissions":"mrw"},
  {"container_path":"/dev/nvidiactl","host_path":"/dev/nvidiactl","permissions":"mrw"},
  {"container_path":"/dev/nvidia-uvm","host_path":"/dev/nvidia-uvm","permissions":"mrw"},
  {"container_path":"/dev/nvidia-uvm-tools","host_path":"/dev/nvidia-uvm-tools","permissions":"mrw"},
  {"container_path":"/dev/infiniband","host_path":"/dev/infiniband","permissions":"rwm"}
]

Full log:
Aug 05 16:11:04 c712f6n09 sycri[30472]: E0805 16:11:04.488113   30472 main.go:276] /runtime.v1alpha2.RuntimeService/CreateContainer
Aug 05 16:11:04 c712f6n09 sycri[30472]: Request: {"pod_sandbox_id":"d3424d8598adc3c1825d21007ef19208ade9d9555a98de8a366524f3159b43
03","config":{"metadata":{"name":"test-sycri"},"image":{"image":"ee300c6a96e763e6bcd1fe30758c901046c92b323cc60b03429c18ef85c5db
b8"},"command":["/opt/mpi/bin/pause"],"working_dir":"/nfs/testuser/","envs":[{"key":"LD_LIBRARY_PATH","value":"/usr/local/nvidia/lib
:/usr/local/nvidia/lib64"},{"key":"MY_POD_NAME","value":"test-sycri-bjmzh"},{"key":"MY_POD_NAMESPACE","value":"default"},{"key"
:"MY_POD_IP","value":"9.1.2.3"},{"key":"SHELL","value":"/bin/bash"},{"key":"MY_NODE_NAME","value":"9.1.2.3"},{"key":"KUB
ERNETES_PORT_443_TCP_PORT","value":"443"},{"key":"KUBERNETES_PORT_443_TCP_ADDR","value":"10.8.0.1"},{"key":"KUBERNETES_SERVICE_HOS
T","value":"10.8.0.1"},{"key":"KUBERNETES_SERVICE_PORT","value":"443"},{"key":"KUBERNETES_SERVICE_PORT_HTTPS","value":"443"},{"key
":"KUBERNETES_PORT","value":"tcp://10.8.0.1:443"},{"key":"KUBERNETES_PORT_443_TCP","value":"tcp://10.8.0.1:443"},{"key":"KUBERNETE
S_PORT_443_TCP_PROTO","value":"tcp"}],"mounts":[{"container_path":"/usr/local/nvidia","host_path":"/var/lib/kubelet/device-plugins
/nvidia-driver/418.67","readonly":true},{"container_path":"/nfs/testuser","host_path":"/nfs/testuser"},{"container_path":"/tmp-play","
host_path":"/tmp"},{"container_path":"/etc/passwd","host_path":"/etc/passwd","readonly":true},{"container_path":"/etc/group","host
_path":"/etc/group","readonly":true},{"container_path":"/var/run/secrets/kubernetes.io/serviceaccount","host_path":"/var/lib/kubel
et/pods/1b6c8efa-b7bd-11e9-9a23-000af7737dd0/volumes/kubernetes.io~secret/default-token-xgh8k","readonly":true},{"container_path":
"/etc/hosts","host_path":"/var/lib/kubelet/pods/1b6c8efa-b7bd-11e9-9a23-000af7737dd0/etc-hosts"},{"container_path":"/dev/terminati
on-log","host_path":"/var/lib/kubelet/pods/1b6c8efa-b7bd-11e9-9a23-000af7737dd0/containers/test-sycri/2f90f041"}],"devices":[{"
container_path":"/dev/nvidia3","host_path":"/dev/nvidia3","permissions":"mrw"},{"container_path":"/dev/nvidia0","host_path":"/dev/
nvidia0","permissions":"mrw"},{"container_path":"/dev/nvidiactl","host_path":"/dev/nvidiactl","permissions":"mrw"},{"container_pat
h":"/dev/nvidia-uvm","host_path":"/dev/nvidia-uvm","permissions":"mrw"},{"container_path":"/dev/nvidia-uvm-tools","host_path":"/de
v/nvidia-uvm-tools","permissions":"mrw"},{"container_path":"/dev/infiniband","host_path":"/dev/infiniband","permissions":"rwm"}],"
labels":{"io.kubernetes.container.name":"test-sycri","io.kubernetes.pod.name":"test-sycri-bjmzh","io.kubernetes.pod.namespac
e":"default","io.kubernetes.pod.uid":"1b6c8efa-b7bd-11e9-9a23-000af7737dd0"},"annotations":{"io.kubernetes.container.hash":"1229b6
c0","io.kubernetes.container.restartCount":"0","io.kubernetes.container.terminationMessagePath":"/dev/termination-log","io.kuberne
tes.container.terminationMessagePolicy":"File","io.kubernetes.pod.terminationGracePeriod":"30"},"log_path":"test-sycri/0.log","
linux":{"resources":{"cpu_period":100000,"cpu_shares":10240,"oom_score_adj":999},"security_context":{"capabilities":{"add_capabili
ties":["CAP_IPC_LOCK","CAP_SYS_NICE"]},"namespace_options":{"network":2,"pid":1},"run_as_user":{"value":123},"supplemental_gro
ups":[100],"seccomp_profile_path":"unconfined","masked_paths":["/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/pro
c/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"readonly_paths":["/proc/asound","/proc/bus","
/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}}},"sandbox_config":{"metadata":{"name":"test-sycri-bjmzh","uid":"1b6c
8efa-b7bd-11e9-9a23-000af7737dd0","namespace":"default"},"log_directory":"/var/log/pods/1b6c8efa-b7bd-11e9-9a23-000af7737dd0","dns
_config":{"servers":["9.3.2.1","192.1.3.2"],"searches":["my.domain.com"]},"labels":{"controller-uid":"1b68bd13-b7bd-
11e9-9a23-000af7737dd0","io.kubernetes.pod.name":"test-sycri-bjmzh","io.kubernetes.pod.namespace":"default","io.kubernetes.pod.
uid":"1b6c8efa-b7bd-11e9-9a23-000af7737dd0","job-name":"test-sycri"},"annotations":{"kubernetes.io/config.seen":"2019-08-05T16:
10:35.67972256-04:00","kubernetes.io/config.source":"api","kubernetes.io/psp":"ibm-privileged-psp"},"linux":{"cgroup_parent":"/kub
epods/burstable/pod1b6c8efa-b7bd-11e9-9a23-000af7737dd0","security_context":{"namespace_options":{"network":2,"pid":1},"run_as_use
r":{"value":123},"supplemental_groups":[100]}}}}
Aug 05 16:11:04 c712f6n09 sycri[30472]: Response: null
Aug 05 16:11:04 c712f6n09 sycri[30472]: Error: rpc error: code = Internal desc = could not create container: could not spawn container: could not create oci bundle: could not generate oci spec for container: could not configure devices: could not get device: not a device node
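
For reference, here is a minimal sketch (in Go, standard library only) of the kind of stat-based device check that would produce this error. The helper name is made up for illustration and is not Singularity-CRI's actual code; the point is that a directory such as /dev/infiniband carries no device mode bit, so a per-path check rejects it even though device nodes live underneath it.

package main

import (
	"fmt"
	"os"
)

// isDeviceNode reports whether path is a block or character device node.
// Hypothetical helper, shown only to illustrate why a plain stat check
// accepts /dev/nvidia0 but rejects the /dev/infiniband directory.
func isDeviceNode(path string) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	// os.ModeDevice is set only for device nodes; a directory fails
	// this check even though it may contain devices.
	return info.Mode()&os.ModeDevice != 0, nil
}

func main() {
	for _, p := range []string{"/dev/nvidia0", "/dev/infiniband"} {
		ok, err := isDeviceNode(p)
		fmt.Printf("%s: device node=%v err=%v\n", p, ok, err)
	}
}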

Environment?

This is an IBM Cloud Private 3.1.2 environment running Kubernetes v1.12.4.

OS distribution and version: RHEL 7.6 (ppc64le)

go version: 1.11.5

Singularity-CRI version: v1.0.0-beta.5

Singularity version: 3.2.1-1.el7

Kubernetes version: v1.12.4+icp-ee

@jjhursey jjhursey added the bug Something isn't working label Aug 12, 2019
@sashayakovtseva sashayakovtseva self-assigned this Aug 13, 2019
@sashayakovtseva
Contributor

Hello @jjhursey

Nice catch; indeed, Singularity-CRI does not treat device directories as directories (it expects every device entry to be a single device node).

However, I wonder whether it is the CRI's responsibility to handle this.
For instance, a different RDMA device plugin (https://github.com/hustcat/k8s-rdma-device-plugin) allocates individual devices rather than a full directory.
Do you mind trying out a different device plugin?

@sashayakovtseva
Contributor

However, I see that CRI-O handles this directory case... We will need to handle it as well, I think.
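
For context, a minimal sketch of that directory-expansion approach, assuming the goal is to turn a requested host path such as /dev/infiniband into the individual device nodes beneath it before generating the OCI spec. The function name is illustrative only; this is not CRI-O's or Singularity-CRI's actual implementation.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// expandDevicePath returns path itself if it is a device node, or every
// device node found under it if it is a directory.
func expandDevicePath(path string) ([]string, error) {
	var devices []string
	err := filepath.Walk(path, func(p string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.Mode()&os.ModeDevice != 0 {
			devices = append(devices, p)
		}
		return nil
	})
	return devices, err
}

func main() {
	devs, err := expandDevicePath("/dev/infiniband")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Each discovered node (e.g. /dev/infiniband/uverbs0) would then get
	// its own device entry in the generated OCI spec, just as individual
	// nodes like /dev/nvidia0 do today.
	for _, d := range devs {
		fmt.Println(d)
	}
}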

@sashayakovtseva
Copy link
Contributor

Hey @jjhursey, there is a pending PR #353 that should fix this. I would appreciate it if you could check it out before it gets merged 🙂

@sashayakovtseva
Copy link
Contributor

Hopefully opencontainers/runc#2107 will save us from copy-pasting.
