
Intermittent load_store_tg test failure #428

Closed
maleadt opened this issue Sep 24, 2024 · 3 comments · Fixed by #432

Comments

@maleadt
Member

maleadt commented Sep 24, 2024

Error in testset device/intrinsics:
Test Failed at /Users/tim/Julia/pkg/Metal/test/device/intrinsics.jl:388
  Expression: Array(a) == Array(b)
   Evaluated: Float32[0.010908663 0.45874983 … 0.45182294 0.19928169; 0.049171805 0.6918573 … 0.7434447 0.11716604; … ; 0.72251517 0.859477 … 0.5544929 0.733749; 0.024094284 0.85571766 … 0.8398908 0.4304701] == Float32[0.0 0.0 … 0.45182294 0.19928169; 0.0 0.0 … 0.7434447 0.11716604; … ; 0.0 0.0 … 0.5544929 0.733749; 0.0 0.0 … 0.8398908 0.4304701]

Observed on:

macOS 15.0.0, Darwin 24.0.0

Toolchain:
- Julia: 1.10.5
- LLVM: 15.0.7

Julia packages:
- Metal.jl: 1.3.0
- GPUArrays: 10.3.1
- GPUCompiler: 0.27.8
- KernelAbstractions: 0.9.27
- ObjectiveC: 3.1.0
- LLVM: 9.1.2
- LLVMDowngrader_jll: 0.3.0+1

1 device:
- Apple M3 Pro (80.000 KiB allocated)
@christiangnrd
Contributor

I can reproduce on an M2 Max with Julia 1.10 and 1.11.

It takes between 350 and 23,000 iterations to reproduce using the code below.

using Metal, Test

count = 0
for _ in 1:50000
    for typ in [Float16, Float32]
        function kernel(a::MtlDeviceArray{T}, b::MtlDeviceArray{T}) where {T}
            pos = thread_position_in_threadgroup_2d()

            tg_a = MtlThreadGroupArray(T, (8, 8))
            tg_a[pos.x, pos.y] = a[pos.x, pos.y]
            sg_a = simdgroup_load(tg_a)

            tg_b = MtlThreadGroupArray(T, (8, 8))
            simdgroup_store(sg_a, tg_b)
            b[pos.x, pos.y] = tg_b[pos.x, pos.y]

            return
        end

        a = MtlArray(rand(typ, 8, 8))
        b = MtlArray(zeros(typ, 8, 8))
        @metal threads=(8, 8) kernel(a, b)
        if Array(a) != Array(b)
            @test Array(a) == Array(b)
        end
    end
    count += 1
end

@christiangnrd
Contributor

After adding a threadgroup_barrier(Metal.MemoryFlagThreadGroup) call, as shown below, this didn't fail in 1,000,000 iterations. Is this just a fluke? I still don't fully understand what barriers do, so I don't know whether this actually fixes the problem or just hides it.

using Metal, Test

count = 0
for _ in 1:50000
    for typ in [Float16, Float32]
        function kernel(a::MtlDeviceArray{T}, b::MtlDeviceArray{T}) where {T}
            pos = thread_position_in_threadgroup_2d()

            tg_a = MtlThreadGroupArray(T, (8, 8))
            tg_a[pos.x, pos.y] = a[pos.x, pos.y]
            threadgroup_barrier(Metal.MemoryFlagThreadGroup)
            sg_a = simdgroup_load(tg_a)

            tg_b = MtlThreadGroupArray(T, (8, 8))
            simdgroup_store(sg_a, tg_b)
            b[pos.x, pos.y] = tg_b[pos.x, pos.y]

            return
        end

        a = MtlArray(rand(typ, 8, 8))
        b = MtlArray(zeros(typ, 8, 8))
        @metal threads=(8, 8) kernel(a, b)
        if Array(a) != Array(b)
            @test Array(a) == Array(b)
        end
    end
    count += 1
end

@maleadt
Member Author

maleadt commented Sep 26, 2024

Yeah, that does seem correct. I guess the simdgroup_load could result in a different thread reading the threadgroup array, so we need a memory fence for the threadgroup memory (i.e. threadgroup_barrier(Metal.MemoryFlagThreadGroup)) to ensure the correct memory is read by the SIMD operation afterwards.

Possibly we also need a fence after the simdgroup_store, since the wavefront doesn't store directly to device memory but goes through the threadgroup array first. I'll add one to be sure.
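Putting both suggestions together, the fixed kernel might look like the sketch below. This is only an illustration combining the barrier from the earlier comment with the second fence proposed here; the exact placement landed in the fix PR may differ.

```julia
using Metal

function kernel(a::MtlDeviceArray{T}, b::MtlDeviceArray{T}) where {T}
    pos = thread_position_in_threadgroup_2d()

    tg_a = MtlThreadGroupArray(T, (8, 8))
    tg_a[pos.x, pos.y] = a[pos.x, pos.y]
    # Fence 1: make every thread's write to tg_a visible to the whole
    # threadgroup before the SIMD lanes cooperatively read it.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)
    sg_a = simdgroup_load(tg_a)

    tg_b = MtlThreadGroupArray(T, (8, 8))
    simdgroup_store(sg_a, tg_b)
    # Fence 2: make the cooperative store to tg_b visible before each
    # thread copies its own element out to device memory.
    threadgroup_barrier(Metal.MemoryFlagThreadGroup)
    b[pos.x, pos.y] = tg_b[pos.x, pos.y]

    return
end
```

Launched as in the reproducer above, with @metal threads=(8, 8) kernel(a, b).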
