This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

missing libcuda.so #508

Closed
mpekalski opened this issue Oct 27, 2017 · 4 comments


@mpekalski

I am trying to build TensorFlow v1.4.0-rc1 on top of nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 using docker-ce 17.09 and nvidia-docker2. After the build, TensorFlow started to complain that libcuda.so is missing. I found it in /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/ and I added it to LD_LIBRARY_PATH but it was still missing. Then I started to investigate the build.

I think that /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/ should be added to /etc/ld.so.conf.d/nvidia.conf as many people (looking at previous issues) had problems with finding/linking libcuda.so. I am not sure, though, why it only found libcuda.so.1 and not libcuda.so, but I am not an expert in compiling code and linking libraries. :)

So during the build I ran a couple of commands to debug the problem:

RUN ls /etc/ld.so.conf.d/
RUN ldconfig && ldconfig -p | grep libcuda
RUN cat /etc/ld.so.conf.d/cuda-9-0.conf
RUN echo "/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/" >> /etc/ld.so.conf.d/nvidia_mp.conf
RUN updatedb && ldconfig && find / -name 'libcuda.so*' | grep -v bazel
RUN env | grep stubs
RUN ldconfig && ldconfig -p | grep libcuda

Here is the output:

Step 21/96 : RUN ls /etc/ld.so.conf.d/
 ---> Running in 12905bda0a54
cuda-9-0.conf
libc.conf
nvidia.conf
x86_64-linux-gnu.conf
x86_64-linux-gnu_EGL.conf
x86_64-linux-gnu_GL.conf
zz_i386-biarch-compat.conf
zz_x32-biarch-compat.conf
 ---> d6b4402e48be
Removing intermediate container 12905bda0a54
Step 22/96 : RUN ldconfig && ldconfig -p | grep libcuda
 ---> Running in f651ac560c54
        libcudart.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so.9.0
        libcudart.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so
 ---> 650165e79a23
Removing intermediate container f651ac560c54
Step 23/96 : RUN cat /etc/ld.so.conf.d/cuda-9-0.conf
 ---> Running in fb53b7c5c6b5
/usr/local/cuda-9.0/targets/x86_64-linux/lib
 ---> dbf0a67b732f
Removing intermediate container fb53b7c5c6b5
Step 24/96 : RUN echo "/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/" >> /etc/ld.so.conf.d/nvidia_mp.conf
 ---> Running in daf77d037aeb
 ---> d74bbc9793d7
Removing intermediate container daf77d037aeb
Step 25/96 : RUN updatedb && ldconfig && find / -name 'libcuda.so*' | grep -v bazel
 ---> Running in 291c85e7e857
/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/libcuda.so.1
 ---> 8cca6b26fa68
Removing intermediate container 291c85e7e857
Step 26/96 : RUN env | grep stubs
 ---> Running in b2cef115a154
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/:/lib/amd64/server/:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs;/opt/boost/lib:/usr/local/cuda/lib64/stubs:/opt/conda/lib/:/usr/local/cuda/lib64/:/opt/conda/lib/R/lib/:/usr/local/nvidia/lib64/:/usr/local/nvidia/lib:/lib/x86_64-linux-gnu:/usr/local/cuda/extras/CUPTI/lib64
PATH=/opt/conda/bin:/usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs:/bin:/opt/boost/bin:/usr/lib/jvm/java-8-openjdk-amd64:/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 ---> 4739bf15b98a
Removing intermediate container b2cef115a154
Step 27/96 : RUN ldconfig && ldconfig -p | grep libcuda
 ---> Running in 3e487cdd125e
        libcudart.so.9.0 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so.9.0
        libcudart.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudart.so
        libcuda.so.1 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/libcuda.so.1
 ---> 02a1a5281718
@flx42
Member

flx42 commented Oct 27, 2017

I found it in /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/ and I added it to LD_LIBRARY_PATH but it was still missing.

If the error came from a -lcuda argument at link time, that's because the stubs directory needs to be in LIBRARY_PATH, not LD_LIBRARY_PATH.
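
As a rough illustration of the distinction above: GCC consults LIBRARY_PATH when it resolves -l options at link time, while LD_LIBRARY_PATH is read by the dynamic loader when a program starts. The sketch below assumes the stubs directory reported in the env output of this issue; prepend_dir is a hypothetical helper, not part of any CUDA or TensorFlow tooling.

```shell
#!/bin/sh
# prepend_dir DIR LIST: print LIST with DIR put in front, colon-separated.
# Hypothetical helper; it avoids a trailing ':' when LIST is empty, since an
# empty entry in these variables is treated as the current directory.
prepend_dir() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Assumption: stubs location as shown in the env output of this issue.
STUBS_DIR=/usr/local/cuda/lib64/stubs

# Build time only: make the stub visible to the linker for -lcuda.
LIBRARY_PATH=$(prepend_dir "$STUBS_DIR" "${LIBRARY_PATH:-}")
export LIBRARY_PATH

# LD_LIBRARY_PATH is deliberately left untouched, so the stub is not
# picked up by the dynamic loader when the built program runs.
```

In a Dockerfile the same effect would typically come from an ENV line for LIBRARY_PATH rather than a shell export, since each RUN step starts a fresh shell.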

I think that /usr/local/cuda-9.0/targets/x86_64-linux/lib/stubs/ should be added to /etc/ld.so.conf.d/nvidia.conf

No, this library is only for build time, not for runtime. You don't want to pick up the stub at runtime.
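
One way to check for this situation is to look at which libcuda entries the dynamic linker cache currently resolves, and flag any that point into a stubs directory (as happened in Step 27 above after the stubs path was added to ld.so.conf.d). This is only a sketch; is_stub_entry is a hypothetical helper name, and the check is a plain substring match on the path.

```shell
#!/bin/sh
# is_stub_entry LINE: succeed (exit 0) when an `ldconfig -p` line resolves
# to a file inside a directory named "stubs". Hypothetical helper.
is_stub_entry() {
    case "$1" in
        */stubs/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Inspect the libcuda entries in the linker cache, if ldconfig is available.
if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p 2>/dev/null | grep 'libcuda\.so' | while IFS= read -r line; do
        if is_stub_entry "$line"; then
            echo "WARNING: runtime would load a stub: $line"
        fi
    done
fi
```

If the warning appears, removing the stubs path from /etc/ld.so.conf.d/ and re-running ldconfig should drop the entry from the cache.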

But also be aware that TensorFlow needs to do a hack with the stubs because of Bazel, see this discussion:
tensorflow/tensorflow#13399 (comment)

The current Dockerfile:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu-cuda9-cudnn7

@mpekalski
Author

Maybe that is why TF was looking for libcuda.so when I was trying to run it (I had libcuda.so in LD_LIBRARY_PATH and maybe even in PATH). I will build my Docker image overnight (UTC+1) and see whether it works for me.

@mpekalski
Author

Everything works like a charm. Thank you.

@flx42
Member

flx42 commented Oct 29, 2017

You're welcome!
