`torchvision` breaks in official `pytorch` Docker image: `RuntimeError: Couldn't load custom C++ ops.` #4222

joek13 · 2021-07-29T13:41:07Z

🐛 Bug

I'm using the pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime Docker image and trying to install torchvision on top. The installation proceeds as expected, but if I try to call a function that uses custom C++ ops (such as torchvision.ops.nms), I get the following error message:

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

I can confirm that the installed versions are compatible by bashing into the container and opening a Python prompt:

>>> import torch
>>> torch.__version__
'1.9.0'
>>> import torchvision
>>> torchvision.__version__
'0.10.0'
>>> import torchvision.ops
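
The manual check above can also be scripted. This is a minimal sketch: `KNOWN_PAIRS`, `strip_local`, and `versions_match` are hypothetical names, and the table holds only the two torch/torchvision pairs mentioned in this thread, not the full compatibility matrix.

```python
# Release pairs reported in this thread only; NOT the full compatibility matrix.
KNOWN_PAIRS = {
    "1.9.0": "0.10.0",
    "1.10.2": "0.11.3",
}


def strip_local(version: str) -> str:
    """Drop a local build suffix such as '+cu111' from a version string."""
    return version.split("+", 1)[0]


def versions_match(torch_version: str, torchvision_version: str) -> bool:
    """Return True if the pair is one of the pairs listed in KNOWN_PAIRS."""
    expected = KNOWN_PAIRS.get(strip_local(torch_version))
    return expected is not None and expected == strip_local(torchvision_version)
```

Inside the container this would be called as `versions_match(torch.__version__, torchvision.__version__)`. As this report shows, a passing version check does not guarantee that the custom ops load.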

This issue occurs regardless of how I install torchvision:

Using pip, i.e., RUN pip install torchvision
Using conda without a version pin, i.e., RUN conda install -c pytorch torchvision
Using conda with a version pin, i.e., RUN conda install -c pytorch torchvision=0.10.0

To Reproduce

Steps to reproduce the behavior:

In a new directory:

Create a minimal Dockerfile with the following content:

FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime

RUN conda install -c pytorch torchvision

COPY ./test.py ./test.py

ENTRYPOINT ["python", "test.py"]

Create a minimal test.py with the following content:

import torchvision.ops

torchvision.ops.nms(None, None, 0.0)

Build and run the container:

docker build -t torchvisiondockerbug . && docker run torchvisiondockerbug

Observe the following output:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    torchvision.ops.nms(None, None, 0.0)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 34, in nms
    _assert_has_ops()
  File "/opt/conda/lib/python3.7/site-packages/torchvision/extension.py", line 63, in _assert_has_ops
    "Couldn't load custom C++ ops. This can happen if your PyTorch and "
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

Expected behavior

I expect to be able to load custom C++ ops, because torch 1.9.0 and torchvision 0.10.0 are marked as compatible in torchvision's compatibility matrix.

In a working environment, the output of test.py looks like this:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    torchvision.ops.nms(None, None, 0.0)
  File "/home/joe/.pyenv/versions/pytorch_problem/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 35, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: torchvision::nms() Expected a value of type 'Tensor' for argument 'dets' but instead found type 'NoneType'.
Position: 0
Value: None
Declaration: torchvision::nms(Tensor dets, Tensor scores, float iou_threshold) -> (Tensor)
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)

(Yes, this is still an error, but it at least demonstrates that _assert_has_ops is successful.)
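
The two outcomes can be told apart automatically by looking at the error text. This is a rough sketch: `classify_nms_failure` and `probe` are hypothetical helpers, and the matched substrings are copied from the two tracebacks in this report, so other torchvision versions may word them differently.

```python
def classify_nms_failure(message: str) -> str:
    """Classify the RuntimeError text raised by torchvision.ops.nms(None, None, 0.0)."""
    if "Couldn't load custom C++ ops" in message:
        # _assert_has_ops() failed: the compiled extension was not loaded.
        return "ops-not-loaded"
    if "Expected a value of type 'Tensor'" in message:
        # The call reached the compiled operator and was rejected for its arguments.
        return "ops-loaded"
    return "unknown"


def probe() -> str:
    """Run the same call as test.py and classify the result (requires torchvision)."""
    import torchvision.ops

    try:
        torchvision.ops.nms(None, None, 0.0)
    except RuntimeError as exc:
        return classify_nms_failure(str(exc))
    return "unknown"
```

Calling `probe()` in the broken image should return `"ops-not-loaded"`, and in a working environment `"ops-loaded"`, assuming the messages match the ones quoted here.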

Environment

Output of running collect_env.py inside the Docker container:

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.10

Python version: 3.7.10 (default, Feb 26 2021, 18:47:35)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchelastic==0.2.0
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              h6bb024c_0    nvidia
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py37h27cfd23_1
[conda] mkl_fft                   1.3.0            py37h42c9631_2
[conda] mkl_random                1.2.1            py37ha9443f7_2
[conda] numpy                     1.20.2           py37h2d18471_0
[conda] numpy-base                1.20.2           py37hfae3a4d_0
[conda] pytorch                   1.9.0           py3.7_cuda10.2_cudnn7.6.5_0    pytorch
[conda] torchelastic              0.2.0                    pypi_0    pypi
[conda] torchtext                 0.10.0                     py37    pytorch
[conda] torchvision               0.10.0               py37_cu102    pytorch
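
One thing the conda listing can rule out is a CUDA build mismatch between the two packages. The sketch below pulls the CUDA version out of a conda build string. `cuda_tag` is a hypothetical helper, and it assumes the short form (`cu102`) encodes the minor version in its last digit, which holds for the tags seen in this thread.

```python
import re
from typing import Optional


def cuda_tag(build: str) -> Optional[str]:
    """Extract a CUDA version such as '10.2' from a conda build string."""
    # Long form, e.g. 'py3.7_cuda10.2_cudnn7.6.5_0'
    long_form = re.search(r"cuda(\d+)\.(\d+)", build)
    if long_form:
        return f"{long_form.group(1)}.{long_form.group(2)}"
    # Short form, e.g. 'py37_cu102'; ASSUMPTION: the last digit is the minor version
    short_form = re.search(r"cu(\d+)(\d)(?!\d)", build)
    if short_form:
        return f"{short_form.group(1)}.{short_form.group(2)}"
    return None
```

For the listing above, the `pytorch` build string and the `torchvision` build string both resolve to CUDA 10.2, so the two packages appear to target the same CUDA version.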

joek13 · 2021-08-02T13:48:35Z

In case anyone else struggles with this, the workaround I'm using is to start with the base nvidia/cuda image and install Python, torch, and torchvision on top.

The beginning of my Dockerfile looks like this:

FROM nvidia/cuda:11.4.0-runtime-ubuntu20.04

WORKDIR /app

# Setting DEBIAN_FRONTEND=noninteractive allows installation
# of some packages to complete without user input.
# Install Python3.8
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python3.8 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# copy Python dependencies
COPY requirements.txt .
# install them
RUN pip install -r requirements.txt

indam · 2021-08-10T01:32:00Z

Having the same issue with 1.9.0-cuda11.1-cudnn8-runtime

vfdev-5 · 2021-08-11T21:21:18Z

Yes, same for me.

Cc @seemethere

sberryman · 2021-08-14T16:44:04Z

FYI: I was able to get torchvision to work using the pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel container.

RUN pip3 install \
    torchvision==0.10.0+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
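
The pinned requirement in the command above follows a pattern: the torchvision version plus a `+cuXYZ` local tag derived from the CUDA version. A small sketch of building those arguments follows. `pip_install_args` is a hypothetical helper, the find-links URL is the one from the command above, and nothing here checks that a wheel actually exists for a given combination.

```python
# Find-links index taken from the pip command quoted above.
FIND_LINKS = "https://download.pytorch.org/whl/torch_stable.html"


def pip_install_args(torchvision_version: str, cuda_version: str) -> list:
    """Build pip arguments that pin a CUDA-specific torchvision wheel."""
    local_tag = "cu" + cuda_version.replace(".", "")  # '11.1' -> 'cu111'
    requirement = f"torchvision=={torchvision_version}+{local_tag}"
    return ["install", requirement, "-f", FIND_LINKS]
```

The CUDA version could be read from `torch.version.cuda` in the target image, so the torchvision wheel is pinned to the same CUDA build as the installed torch.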

KimSangYeon-DGU · 2022-02-15T08:41:33Z

In my case, the workaround was to uninstall torchvision and reinstall it. Afterwards, PyTorch had been upgraded from 1.9.0 to 1.10.2 (torchvision: 0.11.3).

pip uninstall torchvision
pip install torchvision

malfet · 2022-03-02T22:08:36Z

I cannot reproduce the problem using 1.10.0-cuda11.3-cudnn8-runtime:

$ docker build -t torchvisiondockerbug . && docker run torchvisiondockerbug
Sending build context to Docker daemon  3.072kB
Step 1/4 : FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
 ---> c3f17e5ac010
Step 2/4 : RUN conda install -c pytorch torchvision
 ---> Running in 0fb646354e70
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - torchvision


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2022.2.1   |       h06a4308_0         122 KB
    certifi-2021.10.8          |   py37h06a4308_2         151 KB
    conda-4.11.0               |   py37h06a4308_0        14.4 MB
    openssl-1.1.1m             |       h7f8727e_0         2.5 MB
    torchvision-0.11.1         |       py37_cu113        30.3 MB  pytorch
    ------------------------------------------------------------
                                           Total:        47.6 MB

The following packages will be UPDATED:

  ca-certificates                      2021.9.30-h06a4308_1 --> 2022.2.1-h06a4308_0
  certifi                          2021.10.8-py37h06a4308_0 --> 2021.10.8-py37h06a4308_2
  conda                               4.10.3-py37h06a4308_0 --> 4.11.0-py37h06a4308_0
  openssl                                 1.1.1l-h7f8727e_0 --> 1.1.1m-h7f8727e_0
  torchvision                             0.11.0-py37_cu113 --> 0.11.1-py37_cu113


Proceed ([y]/n)? 

Downloading and Extracting Packages
certifi-2021.10.8    | 151 KB    | ########## | 100% 
ca-certificates-2022 | 122 KB    | ########## | 100% 
torchvision-0.11.1   | 30.3 MB   | ########## | 100% 
openssl-1.1.1m       | 2.5 MB    | ########## | 100% 
conda-4.11.0         | 14.4 MB   | ########## | 100% 
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Removing intermediate container 0fb646354e70
 ---> 44b573a1432a
Step 3/4 : COPY ./test.py ./test.py
 ---> 7f91b82fa28a
Step 4/4 : ENTRYPOINT ["python", "test.py"]
 ---> Running in 917ee4855033
Removing intermediate container 917ee4855033
 ---> 14aed0ea9819
Successfully built 14aed0ea9819
Successfully tagged torchvisiondockerbug:latest
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    torchvision.ops.nms(None, None, 0.0)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 35, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: torchvision::nms() Expected a value of type 'Tensor' for argument 'dets' but instead found type 'NoneType'.
Position: 0
Value: None
Declaration: torchvision::nms(Tensor dets, Tensor scores, float iou_threshold) -> (Tensor)
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)
$ cat Dockerfile 
FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime

RUN conda install -c pytorch torchvision

COPY ./test.py ./test.py

ENTRYPOINT ["python", "test.py"]

Also, please note that torchvision is already pre-installed in the container, so running something like

$ docker run -it pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime python -c "import torchvision;torchvision.ops.nms(None, None, 0.0)"

produces the same result. Closing. Please do not hesitate to open a new issue if this is reproduced in new builds.

nepeta2o · 2022-03-04T03:47:55Z

@malfet This issue still exists in pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime.
I'm not able to use any newer image because the NVIDIA driver on my machine is compatible only up to CUDA 10.2. Could you please provide any suggestions?

To reproduce, run:

docker run -it --gpus all pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime python -c "import torchvision;torchvision.ops.nms(None, None, 0.0)"

This produces the following error message:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 34, in nms
    _assert_has_ops()
  File "/opt/conda/lib/python3.7/site-packages/torchvision/extension.py", line 63, in _assert_has_ops
    "Couldn't load custom C++ ops. This can happen if your PyTorch and "
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

vfdev-5 added module: ops topic: binaries labels Aug 11, 2021

seemethere added the high priority label Aug 11, 2021

pytorch-probot bot added the triage review label Aug 11, 2021

seemethere removed the triage review label Aug 11, 2021

seemethere assigned seemethere and malfet Feb 28, 2022

malfet closed this as completed Mar 2, 2022

sueszli mentioned this issue Sep 16, 2024

Improve Cross-Platform Compatibility and Build Process #8652

Open
