GEMM of non-contiguous inputs should dispatch to fallback implementation #2412
Comments
To debug this, I would step through the execution and figure out why this doesn't dispatch to either the generic GPUArrays version or the optimized CUBLAS ones, both of which are invoked from here: Lines 285 to 330 in e1e5be2
Ok, the problem is that the
@view(prova1[1, 1:10]) # 10-element view(::CuArray{Int64, 2, CUDA.DeviceMemory}, 1, 1:10) with eltype Int64
@view(prova1[1, 1:10]) isa AbstractVector # true
@view(prova1[1, 1:10]) isa CuVector # false
Is this because the indices
Moreover, I noticed that the
my_vec = CUDA.rand(Int, 100)
@view(my_vec[1:10]) # 10-element CuArray{Int64, 1, CUDA.DeviceMemory}
Following this, the
I also noticed that making a view of the vector using a vector of indices, instead of a
mul!(A, B, C) with A a view of a Matrix
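For anyone following along, here is a small sketch that prints how the two kinds of view relate to a few CUDA.jl array types. It assumes the type aliases relevant to this thread are `CUDA.DenseCuVector`, `CUDA.StridedCuVector` and `CUDA.AnyCuVector`; that choice is my assumption, not something stated above. The array contents are stand-ins for `prova1` and `my_vec`.

```julia
using CUDA

if CUDA.functional()
    prova1 = CUDA.zeros(Int, 10, 10)   # stand-in for the matrix used in the thread
    row = @view prova1[1, 1:10]        # row slice: elements are not adjacent in memory
    my_vec = CUDA.zeros(Int, 100)      # stand-in for the vector used in the thread
    head = @view my_vec[1:10]          # contiguous slice of a vector

    for (name, x) in (("row view", row), ("contiguous view", head))
        println(name, ": ", typeof(x))
        println("  isa CuVector             = ", x isa CuVector)
        # The aliases below are assumed to be the relevant ones for dispatch.
        println("  isa CUDA.DenseCuVector   = ", x isa CUDA.DenseCuVector)
        println("  isa CUDA.StridedCuVector = ", x isa CUDA.StridedCuVector)
        println("  isa CUDA.AnyCuVector     = ", x isa CUDA.AnyCuVector)
    end
end
```

Comparing the two printouts should make it clearer which signatures each view can match.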
I don't think that's the problem. You have to check against
One would need the stacktrace of the non-working cases, to see the call path. Then we'd get an idea which dispatch is failing. And yes,
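A possible way to collect that stacktrace, as a rough sketch. `capture_trace` is a hypothetical helper written for this comment, and the arrays (shapes, `Float32` eltype) are assumptions rather than the reporter's actual data.

```julia
using CUDA, LinearAlgebra
using InteractiveUtils   # provides @which

# Hypothetical helper: run `f` and return the error message plus its
# stacktrace as a string, or `nothing` if `f` completes without throwing.
function capture_trace(f)
    try
        f()
        return nothing
    catch err
        return sprint(showerror, err, catch_backtrace())
    end
end

if CUDA.functional()
    CUDA.allowscalar(false)             # make scalar indexing throw
    out = CUDA.zeros(Float32, 10, 10)   # assumed shapes and eltype
    B = CUDA.rand(Float32, 10, 10)
    c = CUDA.rand(Float32, 10)
    A = @view out[1, 1:10]              # vector view of a matrix

    trace = capture_trace(() -> mul!(A, B, c))
    println(trace === nothing ? "no error" : trace)

    # Shows which top-level mul! method is selected for these argument types.
    println(@which mul!(A, B, c))
end
```

The printed frames should show the call path below `mul!`, which is the part needed to see which dispatch is failing.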
So, how does this work with
GPUArrays.jl' GEMM has no eltype restrictions: https://github.com/JuliaGPU/GPUArrays.jl/blob/3f8734280311e0954318ccc118b28a179bac6fff/src/host/linalg.jl#L342-L373
Aha, and the ones in here https://github.com/search?q=repo%3AJuliaGPU%2FCUDA.jl%20generic_matvecmul!&type=code don't apply because the first argument is a view, and not a straight
That signature can be relaxed. At least, the wrappers (
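To illustrate what relaxing a signature does to dispatch, here is a CPU-only analogy using Base types. The functions `which_path_strict` and `which_path_relaxed` are hypothetical and exist only for this sketch; they are not methods of CUDA.jl or GPUArrays.jl, and the real signatures in those packages may look different.

```julia
# Strict variant: only a plain Vector reaches the "fast" method.
which_path_strict(::Vector) = :fast
which_path_strict(::AbstractVector) = :fallback

# Relaxed variant: any strided vector (including strided views) reaches it.
which_path_relaxed(::StridedVector) = :fast
which_path_relaxed(::AbstractVector) = :fallback

M = zeros(4, 4)
row = @view M[1, 1:3]             # SubArray with range indices: strided, but not a Vector
gathered = @view M[1, [1, 2, 3]]  # SubArray with vector indices: not strided

println(which_path_strict(row))        # :fallback
println(which_path_relaxed(row))       # :fast
println(which_path_relaxed(gathered))  # :fallback
```

The same idea would apply to a method whose first argument is restricted to a plain GPU array: a view only reaches it once the signature also admits strided or wrapped arrays.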
BTW, that method signature is so generic, I wonder if it will be chosen here: Line 210 in e1e5be2
Describe the bug
The in-place multiplication mul!(A, B, C) fails when A is a vector view of a matrix.
To reproduce
The Minimal Working Example (MWE) for this bug:
Surprisingly, it works for matrix views of matrices
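The original snippets did not survive in this copy of the issue. The following is only a sketch of what the two cases described above might look like; the shapes, the `Float32` eltype and all variable names are assumptions, and the outcome may differ between CUDA.jl versions.

```julia
using CUDA, LinearAlgebra

if CUDA.functional()
    CUDA.allowscalar(false)              # turn scalar indexing into an error

    out = CUDA.zeros(Float32, 10, 10)    # assumed shapes and eltype
    B = CUDA.rand(Float32, 10, 10)
    c = CUDA.rand(Float32, 10)

    # Case described as failing: the destination is a vector view of a matrix.
    A_vec = @view out[1, 1:10]
    try
        mul!(A_vec, B, c)
        println("vector view: ok")
    catch err
        println("vector view: ", typeof(err))
    end

    # Case described as working: the destination is a matrix view of a matrix.
    B5 = CUDA.rand(Float32, 5, 10)
    C = CUDA.rand(Float32, 10, 10)
    A_mat = @view out[1:5, 1:10]
    try
        mul!(A_mat, B5, C)
        println("matrix view: ok")
    catch err
        println("matrix view: ", typeof(err))
    end
end
```

Both calls are wrapped in `try`/`catch` so the script reports each case separately instead of stopping at the first error.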
Manifest.toml
Expected behavior
No scalar indexing error.
Version info
Details on Julia:
Details on CUDA: