diffusion_mnist broken, throws CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3) #367

Open · mashu opened this issue Oct 5, 2022 · 5 comments

mashu commented Oct 5, 2022

mateusz@debian:~/model-zoo/vision/diffusion_mnist$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.8) pkg> activate .
  Activating project at `~/model-zoo/vision/diffusion_mnist`

julia> include("diffusion_mnist.jl")

julia> train()
[ Info: Training on GPU
[ Info: Start Training, total 50 epochs
[ Info: Epoch 1
ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/error.jl:22
  [2] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudnn/error.jl:35 [inlined]
  [3] cudnnSetConvolutionNdDescriptor(convDesc::Ptr{Nothing}, arrayLength::Int32, padA::Vector{Int32}, filterStrideA::Vector{Int32}, dilationA::Vector{Int32}, mode::CUDA.CUDNN.cudnnConvolutionMode_t, computeType::CUDA.CUDNN.cudnnDataType_t)
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/utils/call.jl:26
  [4] cudnnSetConvolutionDescriptor(ptr::Ptr{Nothing}, padding::Vector{Int32}, stride::Vector{Int32}, dilation::Vector{Int32}, mode::CUDA.CUDNN.cudnnConvolutionMode_t, dataType::CUDA.CUDNN.cudnnDataType_t, mathType::CUDA.CUDNN.cudnnMathType_t, reorderType::CUDA.CUDNN.cudnnReorderType_t, groupCount::Int32)
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/convolution.jl:135
  [5] CUDA.CUDNN.cudnnConvolutionDescriptor(::Vector{Int32}, ::Vararg{Any})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/descriptors.jl:39
  [6] CUDA.CUDNN.cudnnConvolutionDescriptor(cdims::DenseConvDims{2, 2, 2, 4, 2}, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, pad::Tuple{Int64, Int64})
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:48
  [7] cudnnConvolutionDescriptorAndPaddedInput
    @ ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:43 [inlined]
  [8] ∇conv_data!(dx::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, dy::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:98
  [9] ∇conv_data!
    @ ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:89 [inlined]
 [10] #∇conv_data#198
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:99 [inlined]
 [11] ∇conv_data
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:95 [inlined]
 [12] #rrule#318
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:326 [inlined]
 [13] rrule
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:316 [inlined]
 [14] rrule
    @ ~/.julia/packages/ChainRulesCore/C73ay/src/rules.jl:134 [inlined]
 [15] chain_rrule
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/chainrules.jl:218 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0 [inlined]
 [17] _pullback
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:9 [inlined]
 [18] _pullback
    @ ~/.julia/packages/Flux/4k0Ls/src/layers/conv.jl:333 [inlined]
 [19] _pullback(ctx::Zygote.Context{true}, f::ConvTranspose{2, 4, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Bool}, args::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [20] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:140 [inlined]
 [21] _pullback(::Zygote.Context{true}, ::UNet, ::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [22] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:185 [inlined]
 [23] _pullback(::Zygote.Context{true}, ::typeof(model_loss), ::UNet, ::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [24] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:176 [inlined]
 [25] _pullback(::Zygote.Context{true}, ::typeof(model_loss), ::UNet, ::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [26] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:265 [inlined]
 [27] _pullback(::Zygote.Context{true}, ::var"#12#14"{UNet})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [28] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:373
 [29] withgradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:123
 [30] train(; kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Main ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:264
 [31] train()
    @ Main ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:222
 [32] top-level scope
    @ REPL[3]:1
 [33] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

julia> CUDA.device()
CuDevice(0): NVIDIA GeForce RTX 3080


mashu commented Oct 5, 2022

Some additional information:

OpenGL renderer string: NVIDIA GeForce RTX 3080/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 510.85.02
OpenGL core profile shading language version string: 4.60 NVIDIA
mateusz@debian:~/model-zoo/vision/diffusion_mnist$ dpkg -l | grep nvidia
ii  glx-alternative-nvidia                     1.2.1                              amd64        allows the selection of NVIDIA as GLX provider
ii  libegl-nvidia0:amd64                       510.85.02-2                        amd64        NVIDIA binary EGL library
ii  libgl1-nvidia-glvnd-glx:amd64              510.85.02-2                        amd64        NVIDIA binary OpenGL/GLX library (GLVND variant)
ii  libgles-nvidia1:amd64                      510.85.02-2                        amd64        NVIDIA binary OpenGL|ES 1.x library
ii  libgles-nvidia2:amd64                      510.85.02-2                        amd64        NVIDIA binary OpenGL|ES 2.x library
ii  libglx-nvidia0:amd64                       510.85.02-2                        amd64        NVIDIA binary GLX library
ii  libnvidia-allocator1:amd64                 510.85.02-2                        amd64        NVIDIA allocator runtime library
ii  libnvidia-cfg1:amd64                       510.85.02-2                        amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-egl-gbm1:amd64                   1.1.0-1                            amd64        GBM EGL external platform library for NVIDIA
ii  libnvidia-egl-wayland1:amd64               1:1.1.10-1                         amd64        Wayland EGL External Platform library -- shared library
ii  libnvidia-eglcore:amd64                    510.85.02-2                        amd64        NVIDIA binary EGL core libraries
ii  libnvidia-encode1:amd64                    510.85.02-2                        amd64        NVENC Video Encoding runtime library
ii  libnvidia-glcore:amd64                     510.85.02-2                        amd64        NVIDIA binary OpenGL/GLX core libraries
ii  libnvidia-glvkspirv:amd64                  510.85.02-2                        amd64        NVIDIA binary Vulkan Spir-V compiler library
ii  libnvidia-ml1:amd64                        510.85.02-2                        amd64        NVIDIA Management Library (NVML) runtime library
ii  libnvidia-ptxjitcompiler1:amd64            510.85.02-2                        amd64        NVIDIA PTX JIT Compiler library
ii  libnvidia-rtcore:amd64                     510.85.02-2                        amd64        NVIDIA binary Vulkan ray tracing (rtcore) library
ii  nvidia-alternative                         510.85.02-2                        amd64        allows the selection of NVIDIA as GLX provider
ii  nvidia-driver                              510.85.02-2                        amd64        NVIDIA metapackage
ii  nvidia-driver-bin                          510.85.02-2                        amd64        NVIDIA driver support binaries
ii  nvidia-driver-libs:amd64                   510.85.02-2                        amd64        NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii  nvidia-egl-common                          510.85.02-2                        amd64        NVIDIA binary EGL driver - common files
ii  nvidia-egl-icd:amd64                       510.85.02-2                        amd64        NVIDIA EGL installable client driver (ICD)
ii  nvidia-installer-cleanup                   20220217+1                         amd64        cleanup after driver installation with the nvidia-installer
ii  nvidia-kernel-common                       20220217+1                         amd64        NVIDIA binary kernel module support files
ii  nvidia-kernel-dkms                         510.85.02-2                        amd64        NVIDIA binary kernel module DKMS source
ii  nvidia-kernel-support                      510.85.02-2                        amd64        NVIDIA binary kernel module support files
ii  nvidia-legacy-check                        510.85.02-2                        amd64        check for NVIDIA GPUs requiring a legacy driver
ii  nvidia-modprobe                            515.48.07-1                        amd64        utility to load NVIDIA kernel modules and create device nodes
ii  nvidia-persistenced                        470.129.06-1                       amd64        daemon to maintain persistent software state in the NVIDIA driver
ii  nvidia-powerd                              510.85.02-1                        amd64        NVIDIA Dynamic Boost (daemon)
ii  nvidia-settings                            510.85.02-1                        amd64        tool for configuring the NVIDIA graphics driver
ii  nvidia-smi                                 510.85.02-2                        amd64        NVIDIA System Management Interface
ii  nvidia-support                             20220217+1                         amd64        NVIDIA binary graphics driver support files
ii  nvidia-vaapi-driver:amd64                  0.0.5-1+b1                         amd64        VA-API implementation that uses NVDEC as a backend
ii  nvidia-vdpau-driver:amd64                  510.85.02-2                        amd64        Video Decode and Presentation API for Unix - NVIDIA driver
ii  nvidia-vulkan-common                       510.85.02-2                        amd64        NVIDIA Vulkan driver - common files
ii  nvidia-vulkan-icd:amd64                    510.85.02-2                        amd64        NVIDIA Vulkan installable client driver (ICD)
ii  xserver-xorg-video-nvidia                  510.85.02-2                        amd64        NVIDIA binary Xorg driver
(diffusion_mnist) pkg> st
Status `~/model-zoo/vision/diffusion_mnist/Project.toml`
  [fbb218c0] BSON v0.3.5
  [052768ef] CUDA v3.12.0
  [0c46a032] DifferentialEquations v7.5.0
  [634d3b9d] DrWatson v2.10.0
  [5789e2e9] FileIO v1.15.0
  [587475ba] Flux v0.13.6
  [916415d5] Images v0.25.2
⌅ [eb30cadb] MLDatasets v0.6.0
  [d96e819e] Parameters v0.12.3
  [91a5bcdd] Plots v1.35.2
  [92933f4c] ProgressMeter v1.7.2
  [899adc3e] TensorBoardLogger v0.1.19
  [56ddb016] Logging
  [9a3f8284] Random
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

(diffusion_mnist) pkg> status --outdated
Status `~/model-zoo/vision/diffusion_mnist/Project.toml`
⌅ [eb30cadb] MLDatasets v0.6.0 (<v0.7.5) [compat]

julia> using CUDA
julia> CUDA.version()
v"11.6.0"
julia> CUDA.CUDNN.version()
v"8.30.2"

julia> using Flux

julia> x = rand(1000) |> gpu
1000-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.12327004
 0.77609754
 0.97980183
 ...


mashu commented Oct 6, 2022

Follow-up on this:

  1. NVIDIA drivers
    I removed Debian's 11.6 drivers and upgraded to the 11.8 drivers with a matching CUDNN from the NVIDIA website, following their instructions for Debian 11 (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#debian-installation), which boils down to adding their repository and running `apt install cuda libcudnn8`.
julia> CUDA.versioninfo()
  Downloaded artifact: CUDA
CUDA toolkit 11.7, artifact installation
NVIDIA driver 520.61.5, for CUDA 11.8
CUDA driver 11.8

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+520.61.5
  Downloaded artifact: CUDNN
- CUDNN: 8.30.2 (for CUDA 11.5.0)
  Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 9.198 GiB / 10.000 GiB available)

I was still getting the error and suspected a mismatch between the 11.7 artifacts (the most recent provided by CUDA.jl) and the system driver (although this shouldn't cause issues).

  2. Disabled CUDA.jl's artifacts
    I disabled the BinaryBuilder artifacts by starting Julia with
JULIA_CUDA_USE_BINARYBUILDER=false julia

Now I am using the system-wide CUDA toolkit and CUDNN libraries, and the versions match, according to

julia> CUDA.versioninfo()
CUDA toolkit 11.8, local installation
NVIDIA driver 520.61.5, for CUDA 11.8
CUDA driver 11.8

Libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+520.61.5
- CUDNN: 8.60.0 (for CUDA 11.8.0)
- CUTENSOR: 1.6.1 (for CUDA 11.8.0)

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

Environment:
- JULIA_CUDA_USE_BINARYBUILDER: false

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 9.194 GiB / 10.000 GiB available)

Still the same error.

  3. CPU training works, therefore debug CUDNN
    With some help on Slack we concluded that the error is actually not from an artifacts API mismatch. Given that the model trains just fine on CPU, some wrapper in CUDA.jl is suspect.
    I tried training with
JULIA_DEBUG=CUDNN JULIA_CUDA_USE_BINARYBUILDER=false julia

This produced the following trace:

julia> train()
[ Info: Training on GPU
[ Info: Start Training, total 50 epochs
[ Info: Epoch 1
┌ Debug: CuDNN (v8600) function cudnnGetVersion() called:
│ Time: 2022-10-06T08:51:43.469044 (0d+0h+0m+57s since start)
│ Process=10202; Thread=10202; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/CUDNN.jl:136
┌ Debug: CuDNN (v8600) function cudnnCreateConvolutionDescriptor() called:
│     convDesc: location=host; addr=0x7f0fdf75d340;
│ Time: 2022-10-06T08:51:43.625100 (0d+0h+0m+57s since start)
│ Process=10202; Thread=10202; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/CUDNN.jl:136
ERROR: ┌ Debug: CuDNN (v8600) function cudnnSetConvolutionNdDescriptor() called:
│     convDesc: location=host; addr=0x4b73bbf0;
│     arrayLength: type=int; val=2;
│     padA: type=int; val=[0,0];
│     strideA: type=int; val=[1,1];
│     dilationA: type=int; val=[1,1];
│     mode: type=cudnnConvolutionMode_t; val=CUDNN_CONVOLUTION (0);
│     dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
│ Time: 2022-10-06T08:51:43.687233 (0d+0h+0m+57s since start)
│ Process=10202; Thread=10202; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/CUDNN.jl:136
CUDNNError: ┌ Debug: CuDNN (v8600) function cudnnSetConvolutionMathType() called:
│     convDesc: location=host; addr=0x4b73bbf0;
│     mathType: type=cudnnMathType_t; val=CUDNN_TENSOR_OP_MATH (1);
│ Time: 2022-10-06T08:51:43.687308 (0d+0h+0m+57s since start)
│ Process=10202; Thread=10202; GPU=NULL; Handle=NULL; StreamId=NULL.
└ @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/CUDNN.jl:136
CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/error.jl:22
  [2] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cudnn/error.jl:35 [inlined]
  [3] cudnnSetConvolutionNdDescriptor(convDesc::Ptr{Nothing}, arrayLength::Int32, padA::Vector{Int32}, filterStrideA::Vector{Int32}, dilationA::Vector{Int32}, mode::CUDA.CUDNN.cudnnConvolutionMode_t, computeType::CUDA.CUDNN.cudnnDataType_t)
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/utils/call.jl:26
  [4] cudnnSetConvolutionDescriptor(ptr::Ptr{Nothing}, padding::Vector{Int32}, stride::Vector{Int32}, dilation::Vector{Int32}, mode::CUDA.CUDNN.cudnnConvolutionMode_t, dataType::CUDA.CUDNN.cudnnDataType_t, mathType::CUDA.CUDNN.cudnnMathType_t, reorderType::CUDA.CUDNN.cudnnReorderType_t, groupCount::Int32)
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/convolution.jl:135
  [5] CUDA.CUDNN.cudnnConvolutionDescriptor(::Vector{Int32}, ::Vararg{Any})
    @ CUDA.CUDNN ~/.julia/packages/CUDA/DfvRa/lib/cudnn/descriptors.jl:39
  [6] CUDA.CUDNN.cudnnConvolutionDescriptor(cdims::DenseConvDims{2, 2, 2, 4, 2}, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, pad::Tuple{Int64, Int64})
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:48
  [7] cudnnConvolutionDescriptorAndPaddedInput
    @ ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:43 [inlined]
  [8] ∇conv_data!(dx::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, dy::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:98
  [9] ∇conv_data!
    @ ~/.julia/packages/NNlibCUDA/kCpTE/src/cudnn/conv.jl:89 [inlined]
 [10] #∇conv_data#198
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:99 [inlined]
 [11] ∇conv_data
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:95 [inlined]
 [12] #rrule#318
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:326 [inlined]
 [13] rrule
    @ ~/.julia/packages/NNlib/0QnJJ/src/conv.jl:316 [inlined]
 [14] rrule
    @ ~/.julia/packages/ChainRulesCore/C73ay/src/rules.jl:134 [inlined]
 [15] chain_rrule
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/chainrules.jl:218 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0 [inlined]
 [17] _pullback
    @ ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:9 [inlined]
 [18] _pullback
    @ ~/.julia/packages/Flux/4k0Ls/src/layers/conv.jl:333 [inlined]
 [19] _pullback(ctx::Zygote.Context{true}, f::ConvTranspose{2, 4, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Bool}, args::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [20] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:140 [inlined]
 [21] _pullback(::Zygote.Context{true}, ::UNet, ::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [22] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:185 [inlined]
 [23] _pullback(::Zygote.Context{true}, ::typeof(model_loss), ::UNet, ::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::Float32)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [24] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:176 [inlined]
 [25] _pullback(::Zygote.Context{true}, ::typeof(model_loss), ::UNet, ::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [26] _pullback
    @ ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:265 [inlined]
 [27] _pullback(::Zygote.Context{true}, ::var"#12#14"{UNet})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
 [28] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:373
 [29] withgradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:123
 [30] train(; kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Main ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:264
 [31] train()
    @ Main ~/model-zoo/vision/diffusion_mnist/diffusion_mnist.jl:222
 [32] top-level scope
    @ REPL[3]:1
 [33] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52


mashu commented Oct 6, 2022

I did a few more tests:

| model | trains | CUDA driver | CUDNN | kernel driver | binary builder |
| --- | --- | --- | --- | --- | --- |
| diffusion_mnist | no | 11.8 | 8.30.2 (for CUDA 11.5.0) | 520.61.5 (for CUDA 11.8) | yes |
| diffusion_mnist | no | 11.8 | 8.60.0 (for CUDA 11.8.0) | 520.61.5 (for CUDA 11.8) | no |
| vae_mnist | yes | 11.8 | 8.30.2 (for CUDA 11.5.0) | 520.61.5 (for CUDA 11.8) | yes |
| vae_mnist | yes | 11.8 | 8.60.0 (for CUDA 11.8.0) | 520.61.5 (for CUDA 11.8) | no |
| vgg_cifar10 | yes/no* | 11.8 | 8.30.2 (for CUDA 11.5.0) | 520.61.5 (for CUDA 11.8) | yes |
| vgg_cifar10 | yes | 11.8 | 8.60.0 (for CUDA 11.8.0) | 520.61.5 (for CUDA 11.8) | no |
| conv_mnist | yes | 11.8 | 8.30.2 (for CUDA 11.5.0) | 520.61.5 (for CUDA 11.8) | yes |
| conv_mnist | yes | 11.8 | 8.60.0 (for CUDA 11.8.0) | 520.61.5 (for CUDA 11.8) | no |

(*) Regarding vgg_cifar10: the first time, after starting a Julia session and updating packages, I got CUDNN_STATUS_INTERNAL_ERROR (code 4), but after restarting the Julia session and trying again the model trains just fine. This is not the case with diffusion_mnist, which never works and always throws CUDNN_STATUS_BAD_PARAM (code 3).

diffusion_mnist also trains fine on CPU.

ToucheSir (Member) commented:

If you're able to narrow it down to the particular conv layer which is throwing the error, we can try to replicate those conditions in a MWE.
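
For example, something along these lines could serve as a first pass (a rough sketch only: it assumes the UNet from diffusion_mnist.jl is in scope as `unet` and already on the GPU, and it probes each ConvTranspose with an arbitrary 16×16 input just to exercise the cuDNN call):

```julia
using Flux, CUDA

# Probe every ConvTranspose in the model in isolation and report which one trips cuDNN.
# `unet` is assumed to be the model from diffusion_mnist.jl, already moved to the GPU.
for (i, layer) in enumerate(Flux.modules(unet))
    layer isa ConvTranspose || continue
    cin = size(layer.weight, 4)              # ConvTranspose weights are (k, k, out, in)
    x = CUDA.randn(Float32, 16, 16, cin, 1)  # arbitrary spatial size, batch of 1
    try
        layer(x)
        @info "ConvTranspose $i OK" stride = layer.stride pad = layer.pad
    catch err
        @warn "ConvTranspose $i failed" stride = layer.stride pad = layer.pad exception = err
    end
end
```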


mashu commented Jan 8, 2023

The model is broken because it uses negative padding, which is not supported on the GPU. MWE below:

using Flux, CUDA

tconv3 = ConvTranspose((3, 3), 64 => 32, stride=2, pad=(0, -1, 0, -1), bias=false) |> gpu
X = randn(Float32, 28, 28, 64, 32) |> gpu
tconv3(X)
ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)

But the CPU version (with X as a plain CPU array) works as the author intended:

[size(ConvTranspose((3,3), 64=>32, pad=i)(X)) for i in [1,0,-1,-2]]
 (28, 28, 32, 32)
 (30, 30, 32, 32)
 (32, 32, 32, 32)
 (34, 34, 32, 32)

[size(Conv((3,3), 64=>32, pad=i)(X)) for i in [1,0,-1,-2]]
 (28, 28, 32, 32)
 (26, 26, 32, 32)
 (24, 24, 32, 32)
 (22, 22, 32, 32)

I don't know if this is the same in the original model, though; I doubt it, because PyTorch does not support negative padding (although there have been discussions about supporting it):

torch.nn.Conv2d(64,32,(3,3),stride=2,padding=(-1,-1)).to("cuda")(X)
RuntimeError: negative padding is not supported

and it is not supported by NVIDIA cuDNN either; from the cudnnSetConvolutionNdDescriptor documentation (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetConvolutionNdDescriptor):

CUDNN_STATUS_BAD_PARAM
- One of the elements of padA is strictly negative.
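
If the intent is just to enlarge the output by one row/column on the trailing edge, a possible GPU-friendly rewrite (only a sketch of mine, not verified against the original model) is to drop the negative padding and append the zeros explicitly with NNlib's pad_zeros, since a negative pad on the forward convolution amounts to cropping that edge:

```julia
using Flux, CUDA

# Hypothetical stand-in for ConvTranspose((3, 3), 64 => 32, stride=2, pad=(0, -1, 0, -1)):
# run the transposed convolution with pad=0, then append one zero row/column on the
# trailing edge of each spatial dimension (the adjoint of the crop that the negative
# padding performs in the corresponding forward convolution).
tconv3 = Chain(
    ConvTranspose((3, 3), 64 => 32, stride=2, pad=0, bias=false),
    x -> Flux.NNlib.pad_zeros(x, (0, 1, 0, 1); dims=(1, 2)),
) |> gpu

X = CUDA.randn(Float32, 28, 28, 64, 32)
size(tconv3(X))  # should be (58, 58, 32, 32), matching the negative-pad layer on CPU
```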
