
Incorrect grid size in kron #2410

Closed
avik-pal opened this issue Jun 9, 2024 · 2 comments · Fixed by #2418
Labels: bug, cuda array, good first issue

Comments


avik-pal commented Jun 9, 2024

Describe the bug

kron throws an error for specific matrix sizes.

To reproduce

The Minimal Working Example (MWE) for this bug:

julia> size(kron(rand(100,1), rand(3, 1)))
(300, 1)

julia> size(kron(cu(rand(100,1)), cu(rand(3, 1))))
ERROR: Grid dimensions should be non-null
Long Error Message

ERROR: Grid dimensions should be non-null
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] diagnose_launch_failure(f::CuFunction, err::CuError; blockdim::CuDim3, threaddim::CuDim3, shmem::Int64)
    @ CUDA ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:84
  [3] launch(::CuFunction, ::CUDA.KernelState, ::CuDeviceMatrix{…}, ::CuDeviceMatrix{…}, ::CuDeviceMatrix{…}, ::Int64, ::Int64, ::Int64, ::Int64; blocks::Tuple{…}, threads::Tuple{…}, cooperative::Bool, shmem::Int64, stream::CuStream)
    @ CUDA ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:73
  [4] launch
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:52 [inlined]
  [5] #972
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:189 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:149 [inlined]
  [7] macro expansion
    @ ./none:0 [inlined]
  [8] convert_arguments
    @ ./none:0 [inlined]
  [9] #cudacall#971
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:191 [inlined]
 [10] cudacall
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:187 [inlined]
 [11] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:268 [inlined]
 [12] macro expansion
    @ ./none:0 [inlined]
 [13] call
    @ ./none:0 [inlined]
 [14] (::CUDA.HostKernel{…})(::CuArray{…}, ::CuArray{…}, ::CuArray{…}, ::Int64, ::Int64, ::Int64, ::Int64; threads::Tuple{…}, blocks::Tuple{…}, kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:390
 [15] HostKernel
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:389 [inlined]
 [16] kron!(C::CuArray{Float32, 2, CUDA.DeviceMemory}, A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/75aiI/lib/cublas/linalg.jl:761
 [17] kron(A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/75aiI/lib/cublas/linalg.jl:773
 [18] top-level scope
    @ REPL[72]:1
 [19] top-level scope
    @ none:1

caused by: CUDA error: invalid argument (code 1, ERROR_INVALID_VALUE)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/75aiI/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuLaunchKernel
    @ ~/.julia/packages/CUDA/75aiI/lib/utils/call.jl:34 [inlined]
  [4] (::CUDA.var"#966#967"{Bool, Int64, CuStream, CuFunction, CuDim3, CuDim3})(kernelParams::Vector{Ptr{Nothing}})
    @ CUDA ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:66
  [5] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:33 [inlined]
  [6] macro expansion
    @ ./none:0 [inlined]
  [7] pack_arguments(::CUDA.var"#966#967"{Bool, Int64, CuStream, CuFunction, CuDim3, CuDim3}, ::CUDA.KernelState, ::CuDeviceMatrix{Float32, 1}, ::CuDeviceMatrix{Float32, 1}, ::CuDeviceMatrix{Float32, 1}, ::Int64, ::Int64, ::Int64, ::Int64)
    @ CUDA ./none:0
  [8] launch(::CuFunction, ::CUDA.KernelState, ::CuDeviceMatrix{…}, ::CuDeviceMatrix{…}, ::CuDeviceMatrix{…}, ::Int64, ::Int64, ::Int64, ::Int64; blocks::Tuple{…}, threads::Tuple{…}, cooperative::Bool, shmem::Int64, stream::CuStream)
    @ CUDA ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:59
  [9] launch
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:52 [inlined]
 [10] #972
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:189 [inlined]
 [11] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:149 [inlined]
 [12] macro expansion
    @ ./none:0 [inlined]
 [13] convert_arguments
    @ ./none:0 [inlined]
 [14] #cudacall#971
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:191 [inlined]
 [15] cudacall
    @ ~/.julia/packages/CUDA/75aiI/lib/cudadrv/execution.jl:187 [inlined]
 [16] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:268 [inlined]
 [17] macro expansion
    @ ./none:0 [inlined]
 [18] call
    @ ./none:0 [inlined]
 [19] (::CUDA.HostKernel{…})(::CuArray{…}, ::CuArray{…}, ::CuArray{…}, ::Int64, ::Int64, ::Int64, ::Int64; threads::Tuple{…}, blocks::Tuple{…}, kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:390
 [20] HostKernel
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:389 [inlined]
 [21] kron!(C::CuArray{Float32, 2, CUDA.DeviceMemory}, A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/75aiI/lib/cublas/linalg.jl:761
 [22] kron(A::CuArray{Float32, 2, CUDA.DeviceMemory}, B::CuArray{Float32, 2, CUDA.DeviceMemory})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/75aiI/lib/cublas/linalg.jl:773
 [23] top-level scope
    @ REPL[72]:1
 [24] top-level scope
    @ none:1
Some type information was truncated. Use `show(err)` to see complete types.
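
As a side note (not part of the original report), a broadcast-based construction of the Kronecker product sidesteps the custom kron! kernel and its launch configuration entirely, so it can serve as a temporary workaround until a fix lands; kron_workaround is a hypothetical helper name used only for illustration:

# Hypothetical workaround: kron(A, B)[(i-1)*p + k, (j-1)*q + l] == A[i, j] * B[k, l],
# which with Julia's column-major reshapes is a (p, m, q, n) broadcast.
using CUDA

function kron_workaround(A::AbstractMatrix, B::AbstractMatrix)
    m, n = size(A)
    p, q = size(B)
    reshape(reshape(B, p, 1, q, 1) .* reshape(A, 1, m, 1, n), m * p, n * q)
end

julia> size(kron_workaround(cu(rand(100, 1)), cu(rand(3, 1))))
(300, 1)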

Version info

Details on Julia:

Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 4600H with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 12 default, 0 interactive, 6 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = vim

Details on CUDA:

CUDA runtime 12.5, artifact installation
CUDA driver 12.4
NVIDIA driver 550.78.0

CUDA libraries: 
- CUBLAS: 12.5.2
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.2
- CUSPARSE: 12.4.1
- CUPTI: 23.0.0
- NVML: 12.0.0+550.78

Julia packages: 
- CUDA: 5.4.2
- CUDA_Driver_jll: 0.9.0+0
- CUDA_Runtime_jll: 0.14.0+1

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce GTX 1650 (sm_75, 2.801 GiB / 4.000 GiB available)
avik-pal added the bug label on Jun 9, 2024
maleadt added the good first issue and cuda array labels on Jun 10, 2024
maleadt (Member) commented Jun 11, 2024

I can't reproduce this, so it is likely launch-configuration related (i.e., it only triggers on certain devices where specific launch configurations are used). Could you perhaps @show the following variables so we can see what's happening?

m, n = size(A)
p, q = size(B)
# Use different kernels depending on the size of the matrices
# choosing to parallelize the matrix with the largest number of elements
m*n >= p*q ? (kernel = @cuda launch=false _kron_mat_kernelA!(C, A, B, m, n, p, q)) :
(kernel = @cuda launch=false _kron_mat_kernelB!(C, A, B, m, n, p, q))
m*n >= p*q ? (sizes = (m, n)) : (sizes = (p, q))
config = launch_configuration(kernel.fun)
dim_ratio = sizes[1] / sizes[2]
max_threads_i = floor(Int, sqrt(config.threads * dim_ratio))
max_threads_j = floor(Int, sqrt(config.threads / dim_ratio))
max_blocks_i = floor(Int, sqrt(config.blocks * dim_ratio))
max_blocks_j = floor(Int, sqrt(config.blocks / dim_ratio))
threads_i = min(sizes[1], max_threads_i)
threads_j = min(sizes[2], max_threads_j)
threads = (threads_i, threads_j)
blocks_i = min(cld(sizes[1], threads_i), max_blocks_i)
blocks_j = min(cld(sizes[2], threads_j), max_blocks_j)
blocks = (blocks_i, blocks_j)

avik-pal (Author) commented:

julia> size(kron(cu(rand(100,1)), cu(rand(3, 1))))
(m, n, p, q) = (100, 1, 3, 1)
sizes = (100, 1)
config = (blocks = 14, threads = 640)
dim_ratio = 100.0
(max_threads_i, max_threads_j, max_blocks_i, max_blocks_j) = (252, 2, 37, 0)
(threads_i, threads_j, blocks_i, blocks_j) = (100, 1, 1, 0)
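
With these numbers the zero grid dimension follows directly from the quoted code: dim_ratio = 100, so max_blocks_j = floor(Int, sqrt(14 / 100)) = 0, and then blocks_j = min(cld(1, 1), 0) = 0, which CUDA rejects as a null grid dimension. A minimal sketch of one way to avoid this (the actual fix merged in #2418 may differ) is to clamp every derived limit to at least 1:

# Sketch only: clamp the derived limits so a skewed dim_ratio can never
# round a thread or block dimension down to zero.
max_threads_i = max(1, floor(Int, sqrt(config.threads * dim_ratio)))
max_threads_j = max(1, floor(Int, sqrt(config.threads / dim_ratio)))
max_blocks_i  = max(1, floor(Int, sqrt(config.blocks * dim_ratio)))
max_blocks_j  = max(1, floor(Int, sqrt(config.blocks / dim_ratio)))

threads = (min(sizes[1], max_threads_i), min(sizes[2], max_threads_j))
blocks  = (min(cld(sizes[1], threads[1]), max_blocks_i),
           min(cld(sizes[2], threads[2]), max_blocks_j))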
