fix sort bug #2344

xaellison · 2024-04-25T01:40:30Z

No description provided.

maleadt · 2024-04-25T11:56:31Z

You can reproduce this locally by restricting the number of threads kernel1 can use while setting the block size for kernel2 higher than that limit (which shouldn't matter, since it's the block size of a kernel we won't execute):

```diff
diff --git a/src/sorting.jl b/src/sorting.jl
index 7dd563831..3568aaafe 100644
--- a/src/sorting.jl
+++ b/src/sorting.jl
@@ -909,7 +909,7 @@ function bitonic_sort!(c; by = identity, lt = isless, rev = false, dims=1)
     # compile kernels (using Int32 for indexing, if possible, yielding a 70% speedup)
     I = c_len <= typemax(Int32) ? Int32 : Int
     args1 = (c, I(c_len), one(I), one(I), one(I), by, lt, Val(rev), Val(dims))
-    kernel1 = @cuda launch=false comparator_small_kernel(args1...)
+    kernel1 = @cuda maxthreads=896 launch=false comparator_small_kernel(args1...)

     config1 = launch_configuration(kernel1.fun, shmem = threads -> bitonic_shmem(c, threads))
     args2 = (c, I(c_len), one(I), one(I), by, lt, Val(rev), Val(dims))
@@ -917,6 +917,7 @@ function bitonic_sort!(c; by = identity, lt = isless, rev = false, dims=1)
     config2 = launch_configuration(kernel2.fun, shmem = threads -> bitonic_shmem(c, threads))
     # blocksize for kernel2 MUST be a power of 2
     threads2 = prevpow(2, config2.threads)
+    threads2 = 1024 # doesn't matter since we'll pick kernel1

     # determines cutoff for when to use kernel1 vs kernel2
     log_threads = threads2 |> log2 |> Int
```

```julia-repl
julia> CUDA.bitonic_sort!(CUDA.rand(Int32, (2, 2, 50000)); dims=3)
ERROR: Number of threads per block exceeds kernel limit (1024 > 896).
```
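The error suggests that kernel1 ends up being launched with the block size computed for kernel2 (1024 threads), even though kernel1 only accepts 896. One possible remedy, sketched here under assumptions and not necessarily what the eventual fix does, is to derive a single power-of-two block size from the smaller of the two kernels' limits, so that whichever kernel runs can accept it. The helper bitonic_blocksize is hypothetical (it is not part of CUDA.jl), and its two arguments stand in for config1.threads and config2.threads:

```julia
# Hypothetical helper (not part of CUDA.jl): pick one block size that both
# bitonic kernels can accept. `max1` and `max2` stand in for the occupancy
# suggestions `config1.threads` and `config2.threads`.
function bitonic_blocksize(max1::Integer, max2::Integer)
    # a block size above either kernel's limit fails at launch,
    # so start from the smaller of the two limits
    limit = min(max1, max2)
    # prevpow requires an argument of at least 1
    limit >= 1 || throw(ArgumentError("kernel limits must be positive"))
    # kernel2 requires a power-of-two block size, so round down to one
    return prevpow(2, limit)
end

# limits from the reproduction: kernel1 capped at 896 threads, kernel2 at 1024
threads = bitonic_blocksize(896, 1024)   # 512
log_threads = threads |> log2 |> Int     # 9, the kernel1/kernel2 cutoff as in the diff
```

Taking the minimum gives up some occupancy for kernel2 whenever kernel1 is the more constrained kernel; an alternative would be to clamp only kernel1's launch and leave kernel2's block size untouched.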

maleadt · 2024-04-27T06:44:16Z

#2353

attempt to add pipeline step (5d1ae37)

xaellison changed the title from "attempt to add pipeline step" to "fix sort bug" on Apr 25, 2024

xaellison added 10 commits April 24, 2024 21:41

- rename group (87235cd)
- new key value (490bca2)
- unique label (d1f1c3c)
- depends on (bf4c19e)
- reformat cuda req (3fcc023)
- relax gpu req (2116440)
- constraints (58a7ef9)
- pin cuda in matrix (11e5cb2)
- drop gpuarrays tests (d1f2d71)
- revert ci changes (712fb9a)

maleadt closed this Apr 27, 2024
