
[LLVMGPU][ROCm] Isel error for %llvm.amdgcn.wmma.f16.16x16x16.f16 for GFX1100. #18060

Closed
EllisLambda opened this issue Jul 31, 2024 · 4 comments · Fixed by #18206
Assignees
nirvedhmeshram
Labels
bug 🐞 (Something isn't working), codegen/hip (ROCm code generation compiler backend)

Comments

@EllisLambda

What happened?

Using sdxl-scripts to compile sdxl-scripts/fp16-model/base_ir/stable_diffusion_xl_base_1_0_64_fp16_prompt_encoder.mlir. iree-compile is configured with the following flags:

--iree-hal-target-backends=rocm
--iree-input-type=torch
--iree-rocm-target-chip=gfx1100
--iree-rocm-bc-dir=/opt/rocm-6.1.2/lib/llvm/lib/clang/17/lib/amdgcn/bitcode
--iree-global-opt-propagate-transposes=true
--iree-opt-outer-dim-concat=true 
--iree-opt-const-eval=false
--iree-llvmgpu-enable-prefetch 
--iree-flow-enable-aggressive-fusion 
--iree-global-opt-enable-fuse-horizontal-contractions=true
--iree-opt-aggressively-propagate-transposes=true 
--iree-codegen-llvmgpu-use-vector-distribution=true
--iree-execution-model=async-external 
--iree-hal-dump-executable-configurations-to=configurations/clip
--iree-hal-dump-executable-sources-to=sources/clip 
--iree-hal-dump-executable-binaries-to=binaries/clip 
--iree-hal-dump-executable-benchmarks-to=benchmarks/clip

The following rocdl dialect operation can be observed in the --mlir-print-ir-before="iree-util-fuse-globals" dump output:

%303 = rocdl.wmma.f16.16x16x16.f16 %270, %302, %20, %0 : (vector<16xf16>, vector<16xf16>, vector<8xf16>, i1) -> vector<8xf16>

During HAL codegen, the LLVM backend crashes with:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.wmma.f16.16x16x16.f16
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.      Running pass 'CallGraph Pass Manager' on module 'encode_prompts$async_dispatch_8'.
1.      Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@"encode_prompts$async_dispatch_8_batch_matmul_transpose_b_12x64x64x64_f16"'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.wmma.f16.16x16x16.f16

The llvm.amdgcn.wmma.f16.16x16x16.f16.v8f16.v16f16 intrinsic appears in the --print-isel-input output dump.
There is no issue when compiling with --iree-rocm-target-chip=gfx940.

Steps to reproduce your issue

Run compile-clip.sh with the flags listed above; a sketch of the equivalent direct invocation is shown below.
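
Since compile-clip.sh itself is not included in this report, the following is a minimal sketch of the equivalent direct invocation; the input path is the model file named above, and the output file name is illustrative:

# Sketch only: the ROCm bitcode path and dump directories should match your local setup.
iree-compile \
    sdxl-scripts/fp16-model/base_ir/stable_diffusion_xl_base_1_0_64_fp16_prompt_encoder.mlir \
    --iree-hal-target-backends=rocm \
    --iree-input-type=torch \
    --iree-rocm-target-chip=gfx1100 \
    --iree-rocm-bc-dir=/opt/rocm-6.1.2/lib/llvm/lib/clang/17/lib/amdgcn/bitcode \
    --iree-global-opt-propagate-transposes=true \
    --iree-opt-outer-dim-concat=true \
    --iree-opt-const-eval=false \
    --iree-llvmgpu-enable-prefetch \
    --iree-flow-enable-aggressive-fusion \
    --iree-global-opt-enable-fuse-horizontal-contractions=true \
    --iree-opt-aggressively-propagate-transposes=true \
    --iree-codegen-llvmgpu-use-vector-distribution=true \
    --iree-execution-model=async-external \
    --iree-hal-dump-executable-configurations-to=configurations/clip \
    --iree-hal-dump-executable-sources-to=sources/clip \
    --iree-hal-dump-executable-binaries-to=binaries/clip \
    --iree-hal-dump-executable-benchmarks-to=benchmarks/clip \
    -o prompt_encoder_gfx1100.vmfb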

What component(s) does this issue relate to?

MLIR, Compiler

Version information

IREE (https://iree.dev):
IREE compiler version (#17985)
LLVM version 19.0.0git
Optimized build

Additional context

No response

@EllisLambda added the bug 🐞 label on Jul 31, 2024
@kuhar added the codegen/hip label on Jul 31, 2024
@nirvedhmeshram self-assigned this on Aug 12, 2024
@nirvedhmeshram (Contributor) commented on Aug 12, 2024

Here is a reduced repro for anyone interested. Save the following as minimal_repro.mlir:

func.func @gemm_failure(%9 : tensor<12x64x64xf16>, %10 : tensor<12x64x64xf16>) -> tensor<12x64x64xf16>  {   
    %cst = arith.constant 0.000000e+00 : f16    
    %11 = tensor.empty() : tensor<12x64x64xf16>
    %12 = linalg.fill ins(%cst : f16) outs(%11 : tensor<12x64x64xf16>) -> tensor<12x64x64xf16>
    %13 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, 
        affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"]} 
        ins(%9, %10 : tensor<12x64x64xf16>, tensor<12x64x64xf16>) outs(%12 : tensor<12x64x64xf16>) {
          ^bb0(%in: f16, %in_0: f16, %out: f16):
            %14 = arith.mulf %in, %in_0 : f16
            %15 = arith.addf %out, %14 : f16
            linalg.yield %15 : f16
          } -> tensor<12x64x64xf16>
    return %13 : tensor<12x64x64xf16>
}

Compilation fails with:

iree-compile minimal_repro.mlir \
    --iree-hal-target-backends=rocm \
    --iree-rocm-target-chip=gfx1100 \
    -o output.vmfb
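
To look at exactly what reaches the AMDGPU backend, one option (a sketch, assuming the --iree-hal-dump-executable-intermediates-to flag available in recent IREE builds) is to dump the per-dispatch LLVM bitcode alongside the failing compile:

# Sketch: dumps the dispatch-level LLVM bitcode so the failing intrinsic can be
# inspected directly; the directory name is illustrative.
iree-compile minimal_repro.mlir \
    --iree-hal-target-backends=rocm \
    --iree-rocm-target-chip=gfx1100 \
    --iree-hal-dump-executable-intermediates-to=intermediates/repro \
    -o output.vmfb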

@nirvedhmeshram (Contributor) commented on Aug 12, 2024

The IR we are generating seems reasonable to me, so I have opened an upstream LLVM issue: llvm/llvm-project#102981

@nirvedhmeshram (Contributor)

I have a lead: this intrinsic is only allowed when we pass -mattr=+wavefrontsize64, but I believe we are using wavefrontsize32 or not passing it correctly. I should be able to solve this tomorrow.
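
One way to check that hypothesis outside of IREE is a sketch like the following, assuming the LLVM IR of the failing dispatch has been saved to repro.ll (a hypothetical file name, e.g. taken from the intermediates dump above):

# Sketch: with +wavefrontsize64 the <8 x half> WMMA variant should select;
# with +wavefrontsize32 the same input should reproduce the "Cannot select" error.
llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+wavefrontsize64 repro.ll -o /dev/null
llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+wavefrontsize32 repro.ll -o /dev/null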

@EllisLambda (Author) commented on Aug 13, 2024

> I have a lead: this intrinsic is only allowed when we pass -mattr=+wavefrontsize64, but I believe we are using wavefrontsize32 or not passing it correctly. I should be able to solve this tomorrow.

Yeah, I mentioned it in #17807. I have also tried changing the preference to wavefront64, but then a shared memory out-of-bounds error occurs.
