Nvidia Xavier devices: exception thrown during kernel execution on device Xavier #349

Closed
IanButterworth opened this issue Aug 5, 2020 · 6 comments
Labels
bug Something isn't working

Comments

@IanButterworth
Contributor

Describe the bug

Using ObjectDetector.jl (which uses CUDA via Flux) on Julia v1.5.0, I hit this kernel exception on NVIDIA Xavier devices.

Note that this is one of the NVIDIA JetPack-based devices, which use a modified CUDA.

julia> using ObjectDetector
julia> yolomod = YOLO.v3_608_COCO(batch=1, silent=true)
julia> batch = emptybatch(yolomod)
julia> res = yolomod(batch, detectThresh=0.5, overlapThresh=0.8)

ERROR: InitError: KernelException: exception thrown during kernel execution on device Xavier
Stacktrace:
 [1] check_exceptions() at /home/ian/.julia/packages/CUDA/7vLVC/src/compiler/exceptions.jl:95
 [2] prepare_cuda_call at /home/ian/.julia/packages/CUDA/7vLVC/src/state.jl:37 [inlined]
 [3] context at /home/ian/.julia/packages/CUDA/7vLVC/src/state.jl:104 [inlined]
 [4] _cufunction(::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(Base.add_sum),Int64,CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Val{true},CUDA.CuDeviceArray{Int64,3,CUDA.AS.Global},CUDA.CuDeviceArray{Int64,2,CUDA.AS.Global}}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ian/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:304
 [5] _cufunction at /home/ian/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:304 [inlined]
 [6] check_cache(::typeof(CUDA._cufunction), ::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(Base.add_sum),Int64,CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Val{true},CUDA.CuDeviceArray{Int64,3,CUDA.AS.Global},CUDA.CuDeviceArray{Int64,2,CUDA.AS.Global}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ian/.julia/packages/GPUCompiler/pCBTA/src/cache.jl:24
 [7] partial_mapreduce_grid at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:93 [inlined]
 [8] cached_compilation(::typeof(CUDA._cufunction), ::GPUCompiler.FunctionSpec{typeof(CUDA.partial_mapreduce_grid),Tuple{typeof(identity),typeof(Base.add_sum),Int64,CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Val{true},CUDA.CuDeviceArray{Int64,3,CUDA.AS.Global},CUDA.CuDeviceArray{Int64,2,CUDA.AS.Global}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ian/.julia/packages/GPUCompiler/pCBTA/src/cache.jl:0
 [9] cached_compilation at /home/ian/.julia/packages/GPUCompiler/pCBTA/src/cache.jl:40 [inlined]
 [10] cufunction(::typeof(CUDA.partial_mapreduce_grid), ::Type{Tuple{typeof(identity),typeof(Base.add_sum),Int64,CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Val{true},CUDA.CuDeviceArray{Int64,3,CUDA.AS.Global},CUDA.CuDeviceArray{Int64,2,CUDA.AS.Global}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/ian/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:298
 [11] cufunction at /home/ian/.julia/packages/CUDA/7vLVC/src/compiler/execution.jl:293 [inlined]
 [12] mapreducedim!(::typeof(identity), ::typeof(Base.add_sum), ::CUDA.CuArray{Int64,2}, ::CUDA.CuArray{Int64,2}; init::Int64) at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:196
 [13] mapreducedim!(::typeof(identity), ::typeof(Base.add_sum), ::CUDA.CuArray{Int64,1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Tuple{Base.OneTo{Int64}},Base.var"#211#212"{typeof(identity)},Tuple{CUDA.CuArray{Int32,1}}}; init::Int64) at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:240
 [14] _mapreduce(::Base.var"#211#212"{typeof(identity)}, ::typeof(Base.add_sum), ::CUDA.CuArray{Int32,1}; dims::Colon, init::Int64) at /home/ian/.julia/packages/GPUArrays/PkHCM/src/host/mapreduce.jl:62
 [15] #mapreduce#23 at /home/ian/.julia/packages/GPUArrays/PkHCM/src/host/mapreduce.jl:28 [inlined]
 [16] #count#624 at ./reducedim.jl:390 [inlined]
 [17] #count#623 at ./reducedim.jl:389 [inlined]
 [18] count at ./reducedim.jl:389 [inlined]
 [19] keepdetections(::CUDA.CuArray{Float32,2}) at /home/ian/.julia/packages/ObjectDetector/vo1Bl/src/yolo/yolo.jl:504
 [20] (::ObjectDetector.YOLO.yolo)(::CUDA.CuArray{Float32,4}; detectThresh::Float64, overlapThresh::Float64) at /home/ian/.julia/packages/ObjectDetector/vo1Bl/src/yolo/yolo.jl:618
 
Manifest.toml

[052768ef] CUDA v1.2.1

Expected behavior

A clear and concise description of what you expected to happen.

Version info

Details on Julia:

Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Linux (aarch64-unknown-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, generic)
Environment:
  JULIA_DEBUG = CUDA
  JULIA_NUM_THREADS = 8

Details on CUDA:

julia> using CUDA

julia> CUDA.version()
┌ Debug: Initializing CUDA driver
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/src/initialization.jl:91
┌ Debug: Trying to use artifacts...
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/bindeps.jl:126
┌ Debug: Selecting artifacts based on driver version 10.0.0
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/bindeps.jl:135
┌ Debug: Could not find a compatible artifact.
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/bindeps.jl:149
┌ Debug: Trying to use local installation...
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/bindeps.jl:182
┌ Debug: Request to look for binary ptxas
│   locations = String[]
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:108
┌ Debug: Looking for binary ptxas
│   locations = String[]
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:118
┌ Debug: Found binary ptxas at /usr/local/cuda-10.0/bin/ptxas
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:124
┌ Debug: Looking for CUDA toolkit via ptxas binary
│   path = "/usr/local/cuda-10.0/bin/ptxas"
│   dir = "/usr/local/cuda-10.0"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:254
┌ Debug: Request to look for library cudart nothing
│   locations = String[]
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcudart.so
│   locations = String[]
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcudart.so at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Looking for CUDA toolkit via CUDA runtime library
│   path = "/usr/local/cuda-10.0/lib64/libcudart.so"
│   dir = "/usr/local/cuda-10.0"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:266
┌ Debug: Looking for CUDA toolkit via default installation directories
│   dirs =
│    1-element Array{String,1}:
│     "/usr/local/cuda-10.0"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:293
┌ Debug: Found CUDA toolkit at /usr/local/cuda-10.0
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:299
┌ Debug: Request to look for binary nvdisasm
│   locations =
│    1-element Array{String,1}:
│     "/usr/local/cuda-10.0"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:108
┌ Debug: Looking for binary nvdisasm
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/bin"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:118
┌ Debug: Found binary nvdisasm at /usr/local/cuda-10.0/bin/nvdisasm
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:124
┌ Debug: CUDA toolkit identified as 10.0.326
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:315
┌ Debug: Request to look for library cupti 10.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcupti.so.10.0, libcupti.so.10
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcupti.so.10.0 at /usr/local/cuda-10.0/extras/CUPTI/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Request to look for library nvToolsExt 1.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libnvToolsExt.so.1.0, libnvToolsExt.so.1
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libnvToolsExt.so.1 at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Request to look for libcudadevrt 
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:361
┌ Debug: Looking for CUDA device runtime library libcudadevrt.a
│   locations =
│    6-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:390
┌ Debug: Found CUDA device runtime library libcudadevrt.a at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:397
┌ Debug: Request to look for libdevice
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:328
┌ Debug: Look for libdevice
│   locations =
│    3-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/nvvm/libdevice"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:340
┌ Debug: Found unified device library at /usr/local/cuda-10.0/nvvm/libdevice/libdevice.10.bc
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:345
┌ Debug: Request to look for library cublas 10.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcublas.so.10.0, libcublas.so.10
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcublas.so.10.0 at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Request to look for library cusparse 10.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcusparse.so.10.0, libcusparse.so.10
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcusparse.so.10.0 at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Request to look for library cusolver 10.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcusolver.so.10.0, libcusolver.so.10
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcusolver.so.10.0 at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Request to look for library cufft 10.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcufft.so.10.0, libcufft.so.10
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcufft.so.10.0 at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Request to look for library curand 10.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcurand.so.10.0, libcurand.so.10
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcurand.so.10.0 at /usr/local/cuda-10.0/lib64
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Found local CUDA 10.0.326 at /usr/local/cuda-10.0, /usr/local/cuda-10.0/extras/CUPTI
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/bindeps.jl:226
┌ Debug: Request to look for library cudnn 8.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcudnn.so.8.0, libcudnn.so.8
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Request to look for library cudnn 7.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcudnn.so.7.0, libcudnn.so.7
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Found library libcudnn.so.7 at /usr/lib/aarch64-linux-gnu
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:97
┌ Debug: Using local CUDNN at /usr/lib/aarch64-linux-gnu/libcudnn.so.7
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/bindeps.jl:291
┌ Debug: Request to look for library cutensor 1.0.0
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcutensor.so.1.0, libcutensor.so.1
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Request to look for library cutensor nothing
│   locations =
│    2-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/extras/CUPTI"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:69
┌ Debug: Looking for library libcutensor.so
│   locations =
│    8-element Array{String,1}:
│     "/usr/local/cuda-10.0"
│     "/usr/local/cuda-10.0/lib"
│     "/usr/local/cuda-10.0/lib64"
│     "/usr/local/cuda-10.0/libx64"
│     "/usr/local/cuda-10.0/extras/CUPTI"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib"
│     "/usr/local/cuda-10.0/extras/CUPTI/lib64"
│     "/usr/local/cuda-10.0/extras/CUPTI/libx64"
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/discovery.jl:89
┌ Debug: Toolchain with LLVM 9.0.1, CUDA driver 10.0.0 and toolkit 10.0.326 supports devices 3.0, 3.2, 3.5, 3.7, 5.0, 5.2, 5.3, 6.0, 6.1, 6.2, 7.0, 7.2 and 7.5; PTX 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1 and 6.3
└ @ CUDA ~/.julia/packages/CUDA/7vLVC/deps/compatibility.jl:239
v"10.0.0"

julia> CUDA.versioninfo()
CUDA toolkit 10.0.326, local installation
CUDA driver 10.0.0

Libraries: 
- CUBLAS: 10.0.0
- CURAND: 10.0.0
- CUFFT: 10.0.0
- CUSOLVER: 10.0.0
- CUSPARSE: 10.0.0
- CUPTI: 12.0.0
- NVML: missing
- CUDNN: 7.6.3 (for CUDA 10.0.0)
- CUTENSOR: missing

Toolchain:
- Julia: 1.5.0
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3
- Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device(s):
- Xavier (sm_72, 25.845 GiB / 31.171 GiB available)


Additional context

@IanButterworth IanButterworth added the bug Something isn't working label Aug 5, 2020
@maleadt
Member

maleadt commented Aug 6, 2020

Since you have these packages devved, could you add some debug statements around the call to mapreducedim! to see exactly which invocation triggers the device-side exception?

Also, did you not see additional output from the device? It should have printed something (if anything, a recommendation to run under -g2 to get to see a stack trace).

This also probably has nothing to do with running on a Xavier.

@IanButterworth
Contributor Author

did you not see additional output from the device?

Sorry, that wasn't a verbatim output sequence. I just added the top 4 lines as an MWE, then pasted the error below. I'll re-run this on the device and report back with more.

@IanButterworth
Contributor Author

IanButterworth commented Aug 7, 2020

Yeah, this is happening on a GeForce GTX 1650 system too, with JLL-based CUDA.

I've extracted the relevant code as an MWE.

By the way, this code block has not changed recently, except for a switch from CuArrays.zeros to CUDA.zeros.

using CUDA

function keepdetections(input::CuArray) # THREADS:BLOCKS CAN BE OPTIMIZED WITH BETTER KERNEL
    rows, cols = size(input)
    bools = CUDA.zeros(Int32, cols)
    @cuda blocks=cols threads=rows kern_genbools(input, bools)
    idxs = cumsum(bools)
    n = count(bools)
    output = CuArray{Float32, 2}(undef, rows, n)
    @cuda blocks=cols threads=rows kern_keepdetections(input, output, bools, idxs)
    return output
end

function kern_genbools(input::CuDeviceArray, output::CuDeviceArray)
    col = (blockIdx().x-1) * blockDim().x + threadIdx().x
    cols = gridDim().x
    if col < cols && input[5, col] > Float32(0)
        @inbounds output[col] = Int32(1)
    end
    return
end

@inline function kern_keepdetections(input::CuDeviceArray, output::CuDeviceArray,
    bools::CuDeviceArray, idxs::CuDeviceArray)
    col = blockIdx().x
    row = threadIdx().x
    if bools[col] == Int32(1)
        idx = idxs[col]
        @inbounds output[row, idx] = input[row, col]
    end
    return
end

julia> keepdetections(cu(rand(Float32, 2, 1)))

ERROR: a type error was thrown during kernel execution.
Stacktrace:
 [1] #211 at reduce.jl:843
 [2] _broadcast_getindex_evalf at broadcast.jl:648
 [3] _broadcast_getindex at broadcast.jl:621
 [4] getindex at broadcast.jl:575
 [5] _map_getindex at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:84
 [6] partial_mapreduce_grid at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:119
ERROR: KernelException: exception thrown during kernel execution on device GeForce GTX 1650 with Max-Q Design
Stacktrace:
 [1] check_exceptions() at /home/ian/.julia/packages/CUDA/7vLVC/src/compiler/exceptions.jl:95
 [2] prepare_cuda_call at /home/ian/.julia/packages/CUDA/7vLVC/src/state.jl:37 [inlined]
 [3] context at /home/ian/.julia/packages/CUDA/7vLVC/src/state.jl:104 [inlined]
 [4] CuArray{Float32,2}(::CuPtr{Float32}, ::Tuple{Int64,Int64}, ::Bool) at /home/ian/.julia/packages/CUDA/7vLVC/src/array.jl:19 (repeats 2 times)
 [5] CuArray{Float32,2}(::UndefInitializer, ::Tuple{Int64,Int64}) at /home/ian/.julia/packages/CUDA/7vLVC/src/array.jl:117
 [6] CuArray at /home/ian/.julia/packages/CUDA/7vLVC/src/array.jl:121 [inlined]
 [7] keepdetections(::CuArray{Float32,2}) at ./REPL[11]:7
 [8] top-level scope at REPL[16]:1

Any idea why this has broken?
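
For readers following the MWE, here is a minimal CPU-only sketch of the compaction that keepdetections appears to perform: mark columns, prefix-sum the marks to get destination indices, count the kept columns, then copy. It assumes the intent is to keep every column whose fifth-row value is positive. The name keepdetections_cpu and the sample input are hypothetical and not part of ObjectDetector.jl.

```julia
# CPU-only sketch of the mask / cumsum / count / copy steps; no GPU required.
# `keepdetections_cpu` is a hypothetical name used for illustration only.
function keepdetections_cpu(input::AbstractMatrix{Float32})
    rows, cols = size(input)
    keep = input[5, :] .> 0f0          # Bool mask, one entry per column
    idxs = cumsum(keep)                # destination column of each kept column
    n = count(keep)                    # elements are Bool, so plain count works
    output = Matrix{Float32}(undef, rows, n)
    for col in 1:cols
        if keep[col]
            output[:, idxs[col]] = input[:, col]
        end
    end
    return output
end

# Sample input: the fifth row is 0.5, 0, 2, so columns 1 and 3 are kept.
input = Float32[1 2 3; 0 0 0; 0 0 0; 0 0 0; 0.5 0 2]
kept = keepdetections_cpu(input)
```

Using a Bool mask here, rather than an Int32 one, is a choice made for the sketch; the GPU code in the MWE stores Int32 flags instead.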

@IanButterworth
Contributor Author

Seems to be a problem with count

julia> bools = CUDA.zeros(Int32, 1)
1-element CuArray{Int32,1}:
 0
julia> count(bools)
ERROR: a type error was thrown during kernel execution.
Stacktrace:
 [1] #211 at reduce.jl:843
 [2] _broadcast_getindex_evalf at broadcast.jl:648
 [3] _broadcast_getindex at broadcast.jl:621
 [4] getindex at broadcast.jl:575
 [5] _map_getindex at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:84
 [6] partial_mapreduce_grid at /home/ian/.julia/packages/CUDA/7vLVC/src/mapreduce.jl:119
0

@IanButterworth
Contributor Author

This fixes it, but I'm not sure why it started erroring.
Perhaps it's a Julia 1.5 issue.

julia> count(isone, bools)
0

@maleadt
Member

maleadt commented Aug 7, 2020

Yeah, it's probably this type error you were seeing:

julia> count(zeros(Int32, 1))
ERROR: TypeError: non-boolean (Int32) used in boolean context
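
The same behaviour can be checked on the CPU without a GPU. The sketch below uses a made-up Int32 vector of 0/1 flags: count with no predicate requires Bool elements, while count with a predicate (as in the fix above) or a plain sum works. The variable names are illustrative only.

```julia
# CPU-only sketch; the flag values are made up for illustration.
bools = Int32[0, 1, 1, 0]

# `count(itr)` requires Bool elements, so an Int32 array raises a TypeError.
threw = try
    count(bools)
    false
catch err
    err isa TypeError
end

# A predicate turns each element into a Bool before counting.
n_pred = count(isone, bools)   # 2

# Summing also works here, but only because every entry is 0 or 1.
n_sum = Int(sum(bools))        # 2
```

On the GPU, the equivalent change is the count(isone, bools) call shown earlier in the thread.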
