Inconsistency between iree-compile and standalone torch-mlir-opt compile #17832

Closed
AmosLewis opened this issue Jul 9, 2024 · 3 comments
Labels
bug 🐞 Something isn't working integrations/pytorch PyTorch integration work

Comments

@AmosLewis
Contributor

What happened?

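An inconsistency shows up when lowering the Inception_v4_vaiq_int8 model (nod-ai/SHARK-TestSuite#190): the model compiles when it goes through standalone torch-mlir-opt and then iree-compile, but fails when iree-compile consumes the torch-level MLIR directly.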

  1. Passed: standalone torch-mlir-opt + iree: onnx -> torch -> linalg -> vmfb
/home/chi/src/torch-mlir/build/bin/torch-mlir-opt -pass-pipeline='builtin.module(func.func(convert-torch-onnx-to-torch),torch-lower-to-backend-contract,func.func(cse,canonicalize),torch-backend-to-linalg-on-tensors-backend-pipeline)' Inception_v4_vaiq_int8.default.torch-onnx.mlir > Inception_v4_vaiq_int8.default.onnx.linalg.mlir

/home/chi/src/iree-build/tools/iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu  Inception_v4_vaiq_int8.default.onnx.linalg.mlir > Inception_v4_vaiq_int8.default.vmfb

/home/chi/src/iree-build/tools/iree-run-module --module=Inception_v4_vaiq_int8.default.vmfb --input="32x3x224x224xf32=@inference_input.0.bin"  --output=@inference_output.0.bin  --output=@inference_output.1.bin  --output=@inference_output.2.bin  --output=@inference_output.3.bin  --output=@inference_output.4.bin  --output=@inference_output.5.bin  --output=@inference_output.6.bin  --output=@inference_output.7.bin  --output=@inference_output.8.bin  --output=@inference_output.9.bin  --output=@inference_output.10.bin  --output=@inference_output.11.bin  --output=@inference_output.12.bin  --output=@inference_output.13.bin  --output=@inference_output.14.bin  --output=@inference_output.15.bin  --output=@inference_output.16.bin  --output=@inference_output.17.bin  --output=@inference_output.18.bin  --output=@inference_output.19.bin  --output=@inference_output.20.bin  --output=@inference_output.21.bin  --output=@inference_output.22.bin  --output=@inference_output.23.bin  --output=@inference_output.24.bin  --output=@inference_output.25.bin  --output=@inference_output.26.bin  --output=@inference_output.27.bin  --output=@inference_output.28.bin  --output=@inference_output.29.bin  --output=@inference_output.30.bin  --output=@inference_output.31.bin
  2. Failed: iree: onnx -> vmfb:
/home/chi/src/iree-build/tools/iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu  Inception_v4_vaiq_int8.default.onnx.torch.mlir > Inception_v4_vaiq_int8.default.vmfb
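
The iree-run-module command in step 1 repeats --output 32 times. A minimal sketch of generating that argument list instead of typing it out, using only the flags and file names that appear in the command above; the helper name run_module_cmd is hypothetical:

```python
import shlex

# Hypothetical helper (not part of IREE): rebuilds the iree-run-module
# argument list from step 1 with one --output flag per output file.
def run_module_cmd(tool, module, input_spec, num_outputs):
    cmd = [tool, f"--module={module}", f"--input={input_spec}"]
    cmd += [f"--output=@inference_output.{i}.bin" for i in range(num_outputs)]
    return cmd

cmd = run_module_cmd(
    "/home/chi/src/iree-build/tools/iree-run-module",
    "Inception_v4_vaiq_int8.default.vmfb",
    "32x3x224x224xf32=@inference_input.0.bin",
    32,
)
# Print a copy-pasteable command line; to execute it, pass `cmd` to
# subprocess.run(cmd, check=True) instead.
print(shlex.join(cmd))
```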

Failed log:

failed to translate executables
Inception_v4_vaiq_int8.default.onnx.torch.mlir:403:12: error: One or more operations with large vector sizes (8192 bytes) were found:

    %399 = torch.operator "onnx.Relu"(%398) : (!torch.vtensor<[32,192,25,25],f32>) -> !torch.vtensor<[32,192,25,25],f32> 
           ^
<unknown>:0: note:   %cst_0 = arith.constant dense<1.250000e-01> : vector<1x192x3x5xf32>

Inception_v4_vaiq_int8.default.onnx.torch.mlir:397:12: note:   %21 = arith.extsi %20 : vector<1x192x3x5xi8> to vector<1x192x3x5xi32>

    %393 = torch.operator "onnx.DequantizeLinear"(%392, %303, %301) : (!torch.vtensor<[32,192,52,52],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[32,192,52,52],f32> 
           ^
Inception_v4_vaiq_int8.default.onnx.torch.mlir:397:12: note:   %22 = arith.sitofp %21 : vector<1x192x3x5xi32> to vector<1x192x3x5xf32>

Inception_v4_vaiq_int8.default.onnx.torch.mlir:397:12: note:   %23 = arith.mulf %22, %cst_0 : vector<1x192x3x5xf32>

Inception_v4_vaiq_int8.default.onnx.torch.mlir:397:12: note:   %24 = vector.transfer_write %23, %18[%c0, %c0, %c0, %c0], %19 {in_bounds = [true, true, true, true]} : vector<1x192x3x5xf32>, tensor<1x192x3x?xf32>

Inception_v4_vaiq_int8.default.onnx.torch.mlir:403:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
    %399 = torch.operator "onnx.Relu"(%398) : (!torch.vtensor<[32,192,25,25],f32>) -> !torch.vtensor<[32,192,25,25],f32> 
           ^
Inception_v4_vaiq_int8.default.onnx.torch.mlir:403:12: note: see current operation: 
"hal.executable.variant"() ({
  "hal.executable.export"() ({
  ^bb0(%arg18: !hal.device):
    %72 = "arith.constant"() <{value = 12 : index}> : () -> index
    %73 = "arith.constant"() <{value = 1 : index}> : () -> index
    "hal.return"(%72, %73, %73) : (index, index, index) -> ()
  }) {hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>, #hal.interface.binding<0, 2>], layout = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>]>, ordinal = 0 : index, sym_name = "torch_jit$async_dispatch_19_conv_2d_nchw_fchw_32x192x25x25x192x3x3_f32"} : () -> ()
  "builtin.module"() ({
    "func.func"() <{function_type = () -> (), sym_name = "torch_jit$async_dispatch_19_conv_2d_nchw_fchw_32x192x25x25x192x3x3_f32"}> ({
      %0 = "arith.constant"() <{value = dense<0.000000e+00> : vector<1x2x1x2xf32>}> : () -> vector<1x2x1x2xf32>
      %1 = "arith.constant"() <{value = dense<1.250000e-01> : vector<1x192x3x5xf32>}> : () -> vector<1x192x3x5xf32>
      %2 = "arith.constant"() <{value = 0 : i8}> : () -> i8
      %3 = "arith.constant"() <{value = 8 : index}> : () -> index
      %4 = "arith.constant"() <{value = 3 : index}> : () -> index
      %5 = "arith.constant"() <{value = 25 : index}> : () -> index
      %6 = "arith.constant"() <{value = 2 : index}> : () -> index
      %7 = "arith.constant"() <{value = 16 : index}> : () -> index
      %8 = "arith.constant"() <{value = 1 : index}> : () -> index
      %9 = "arith.constant"() <{value = 32 : index}> : () -> index
      %10 = "arith.constant"() <{value = 192 : index}> : () -> index
      %11 = "arith.constant"() <{value = 0.000000e+00 : f32}> : () -> f32
      %12 = "arith.constant"() <{value = 0 : index}> : () -> index
      %13 = "arith.constant"() <{value = 145961472 : index}> : () -> index
      %14 = "arith.constant"() <{value = 147288576 : index}> : () -> index
      %15 = "arith.constant"() <{value = 16613376 : index}> : () -> index
      %16 = "hal.interface.binding.subspan"(%12) {alignment = 64 : index, binding = 0 : index, descriptor_flags = 1 : i32, descriptor_type = #hal.descriptor_type<storage_buffer>, operandSegmentSizes = array<i32: 1, 0>, set = 0 : index} : (index) -> !flow.dispatch.tensor<readonly:tensor<32x192x52x52xi8>>
      %17 = "hal.interface.binding.subspan"(%13) {alignment = 64 : index, binding = 1 : index, descriptor_flags = 1 : i32, descriptor_type = #hal.descriptor_type<storage_buffer>, operandSegmentSizes = array<i32: 1, 0>, set = 0 : index} : (index) -> !flow.dispatch.tensor<readonly:tensor<192x192x3x3xf32>>
      %18 = "hal.interface.binding.subspan"(%14) {alignment = 64 : index, binding = 1 : index, descriptor_flags = 1 : i32, descriptor_type = #hal.descriptor_type<storage_buffer>, operandSegmentSizes = array<i32: 1, 0>, set = 0 : index} : (index) -> !flow.dispatch.tensor<readonly:tensor<192xf32>>
      %19 = "hal.interface.binding.subspan"(%15) {alignment = 64 : index, binding = 2 : index, descriptor_type = #hal.descriptor_type<storage_buffer>, operandSegmentSizes = array<i32: 1, 0>, set = 0 : index} : (index) -> !flow.dispatch.tensor<writeonly:tensor<32x192x25x25xf32>>
      %20 = "hal.interface.workgroup.id"() {dimension = 0 : index} : () -> index
      %21 = "hal.interface.workgroup.count"() {dimension = 0 : index} : () -> index
      %22 = "affine.apply"(%20) <{map = affine_map<()[s0] -> (s0 * 16)>}> : (index) -> index
      %23 = "affine.apply"(%21) <{map = affine_map<()[s0] -> (s0 * 16)>}> : (index) -> index
      %24 = "flow.dispatch.tensor.load"(%16) <{operandSegmentSizes = array<i32: 1, 0, 0, 0, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 32, 192, 51, 51>, static_strides = array<i64: 1, 1, 1, 1>}> : (!flow.dispatch.tensor<readonly:tensor<32x192x52x52xi8>>) -> tensor<32x192x51x51xi8>
      "scf.for"(%22, %10, %23) ({
      ^bb0(%arg0: index):
        %25 = "flow.dispatch.tensor.load"(%19, %arg0) <{operandSegmentSizes = array<i32: 1, 0, 1, 0, 0>, static_offsets = array<i64: 0, -9223372036854775808, 0, 0>, static_sizes = array<i64: 32, 16, 25, 25>, static_strides = array<i64: 1, 1, 1, 1>}> : (!flow.dispatch.tensor<writeonly:tensor<32x192x25x25xf32>>, index) -> tensor<32x16x25x25xf32>
        %26 = "flow.dispatch.tensor.load"(%17, %arg0) <{operandSegmentSizes = array<i32: 1, 0, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 16, 192, 3, 3>, static_strides = array<i64: 1, 1, 1, 1>}> : (!flow.dispatch.tensor<readonly:tensor<192x192x3x3xf32>>, index) -> tensor<16x192x3x3xf32>
        %27 = "flow.dispatch.tensor.load"(%18, %arg0) <{operandSegmentSizes = array<i32: 1, 0, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808>, static_sizes = array<i64: 16>, static_strides = array<i64: 1>}> : (!flow.dispatch.tensor<readonly:tensor<192xf32>>, index) -> tensor<16xf32>
        %28 = "scf.for"(%12, %9, %8, %25) ({
        ^bb0(%arg1: index, %arg2: tensor<32x16x25x25xf32>):
          %29 = "scf.for"(%12, %7, %6, %arg2) ({
          ^bb0(%arg3: index, %arg4: tensor<32x16x25x25xf32>):
            %30 = "tensor.extract_slice"(%26, %arg3) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 2, 192, 3, 3>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<16x192x3x3xf32>, index) -> tensor<2x192x3x3xf32>
            %31 = "scf.for"(%12, %5, %8, %arg4) ({
            ^bb0(%arg5: index, %arg6: tensor<32x16x25x25xf32>):
              %32 = "affine.apply"(%arg5) <{map = affine_map<(d0) -> (d0 * 2)>}> : (index) -> index
              %33 = "scf.for"(%12, %5, %6, %arg6) ({
              ^bb0(%arg7: index, %arg8: tensor<32x16x25x25xf32>):
                %34 = "affine.min"(%arg7) <{map = affine_map<(d0) -> (-d0 + 25, 2)>}> : (index) -> index
                %35 = "affine.apply"(%arg7) <{map = affine_map<(d0) -> (d0 * 2)>}> : (index) -> index
                %36 = "affine.apply"(%34) <{map = affine_map<(d0) -> (d0 * 2 + 1)>}> : (index) -> index
                %37 = "tensor.extract_slice"(%24, %arg1, %32, %35, %36) <{operandSegmentSizes = array<i32: 1, 3, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0, -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: 1, 192, 3, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<32x192x51x51xi8>, index, index, index, index) -> tensor<1x192x3x?xi8>
                %38 = "tensor.empty"(%36) : (index) -> tensor<1x192x3x?xf32>
                %39 = "vector.create_mask"(%8, %10, %4, %36) : (index, index, index, index) -> vector<1x192x3x5xi1>
                %40 = "vector.transfer_read"(%37, %12, %12, %12, %12, %2, %39) <{in_bounds = [true, true, true, true], operandSegmentSizes = array<i32: 1, 4, 1, 1>, permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>}> : (tensor<1x192x3x?xi8>, index, index, index, index, i8, vector<1x192x3x5xi1>) -> vector<1x192x3x5xi8>
                %41 = "arith.extsi"(%40) : (vector<1x192x3x5xi8>) -> vector<1x192x3x5xi32>
                %42 = "arith.sitofp"(%41) : (vector<1x192x3x5xi32>) -> vector<1x192x3x5xf32>
                %43 = "arith.mulf"(%42, %1) <{fastmath = #arith.fastmath<none>}> : (vector<1x192x3x5xf32>, vector<1x192x3x5xf32>) -> vector<1x192x3x5xf32>
                %44 = "vector.transfer_write"(%43, %38, %12, %12, %12, %12, %39) <{in_bounds = [true, true, true, true], operandSegmentSizes = array<i32: 1, 1, 4, 1>, permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>}> : (vector<1x192x3x5xf32>, tensor<1x192x3x?xf32>, index, index, index, index, vector<1x192x3x5xi1>) -> tensor<1x192x3x?xf32>
                %45 = "tensor.extract_slice"(%arg8, %arg1, %arg3, %arg5, %arg7, %34) <{operandSegmentSizes = array<i32: 1, 4, 1, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: 1, 2, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<32x16x25x25xf32>, index, index, index, index, index) -> tensor<1x2x1x?xf32>
                %46 = "vector.create_mask"(%8, %6, %8, %34) : (index, index, index, index) -> vector<1x2x1x2xi1>
                %47 = "vector.transfer_write"(%0, %45, %12, %12, %12, %12, %46) <{in_bounds = [true, true, true, true], operandSegmentSizes = array<i32: 1, 1, 4, 1>, permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>}> : (vector<1x2x1x2xf32>, tensor<1x2x1x?xf32>, index, index, index, index, vector<1x2x1x2xi1>) -> tensor<1x2x1x?xf32>
                %48 = "affine.apply"(%34) <{map = affine_map<(d0) -> (d0 * 2 - 1)>}> : (index) -> index
                %49 = "tensor.extract_slice"(%47, %34) <{operandSegmentSizes = array<i32: 1, 0, 1, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 1, 2, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x2x1x?xf32>, index) -> tensor<1x2x1x?xf32>
                %50 = "tensor.extract_slice"(%49, %34) <{operandSegmentSizes = array<i32: 1, 0, 1, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 1, 2, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x2x1x?xf32>, index) -> tensor<1x2x?xf32>
                %51 = "scf.for"(%12, %10, %3, %50) ({
                ^bb0(%arg9: index, %arg10: tensor<1x2x?xf32>):
                  %63 = "scf.for"(%12, %4, %8, %arg10) ({
                  ^bb0(%arg11: index, %arg12: tensor<1x2x?xf32>):
                    %64 = "scf.for"(%12, %4, %8, %arg12) ({
                    ^bb0(%arg13: index, %arg14: tensor<1x2x?xf32>):
                      %65 = "tensor.extract_slice"(%44, %arg9, %arg11, %arg13, %48) <{operandSegmentSizes = array<i32: 1, 3, 1, 0>, static_offsets = array<i64: 0, -9223372036854775808, -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: 1, 8, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x192x3x?xf32>, index, index, index, index) -> tensor<1x8x1x?xf32>
                      %66 = "tensor.extract_slice"(%30, %arg9, %arg11, %arg13) <{operandSegmentSizes = array<i32: 1, 3, 0, 0>, static_offsets = array<i64: 0, -9223372036854775808, -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: 2, 8, 1, 1>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x192x3x3xf32>, index, index, index) -> tensor<2x8x1x1xf32>
                      %67 = "tensor.extract_slice"(%65, %48) <{operandSegmentSizes = array<i32: 1, 0, 1, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 1, 8, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x8x1x?xf32>, index) -> tensor<1x8x?xf32>
                      %68 = "tensor.extract_slice"(%66) <{operandSegmentSizes = array<i32: 1, 0, 0, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 2, 8, 1, 1>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x8x1x1xf32>) -> tensor<2x8x1xf32>
                      %69 = "linalg.conv_1d_ncw_fcw"(%67, %68, %arg14) <{dilations = dense<1> : vector<1xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : vector<1xi64>}> ({
                      ^bb0(%arg15: f32, %arg16: f32, %arg17: f32):
                        %70 = "arith.mulf"(%arg15, %arg16) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
                        %71 = "arith.addf"(%arg17, %70) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
                        "linalg.yield"(%71) : (f32) -> ()
                      }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4) -> (d0, d3, d2 * 2 + d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d1, d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2)>]} : (tensor<1x8x?xf32>, tensor<2x8x1xf32>, tensor<1x2x?xf32>) -> tensor<1x2x?xf32>
                      "scf.yield"(%69) : (tensor<1x2x?xf32>) -> ()
                    }) : (index, index, index, tensor<1x2x?xf32>) -> tensor<1x2x?xf32>
                    "scf.yield"(%64) : (tensor<1x2x?xf32>) -> ()
                  }) : (index, index, index, tensor<1x2x?xf32>) -> tensor<1x2x?xf32>
                  "scf.yield"(%63) : (tensor<1x2x?xf32>) -> ()
                }) : (index, index, index, tensor<1x2x?xf32>) -> tensor<1x2x?xf32>
                %52 = "tensor.insert_slice"(%51, %49, %34) <{operandSegmentSizes = array<i32: 1, 1, 0, 1, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 1, 2, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x2x?xf32>, tensor<1x2x1x?xf32>, index) -> tensor<1x2x1x?xf32>
                %53 = "tensor.insert_slice"(%52, %47, %34) <{operandSegmentSizes = array<i32: 1, 1, 0, 1, 0>, static_offsets = array<i64: 0, 0, 0, 0>, static_sizes = array<i64: 1, 2, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x2x1x?xf32>, tensor<1x2x1x?xf32>, index) -> tensor<1x2x1x?xf32>
                %54 = "vector.transfer_read"(%27, %arg3, %11) <{in_bounds = [true], operandSegmentSizes = array<i32: 1, 1, 1, 0>, permutation_map = affine_map<(d0) -> (d0)>}> : (tensor<16xf32>, index, f32) -> vector<2xf32>
                %55 = "vector.broadcast"(%54) : (vector<2xf32>) -> vector<1x1x2x2xf32>
                %56 = "vector.transpose"(%55) <{permutation = array<i64: 0, 3, 1, 2>}> : (vector<1x1x2x2xf32>) -> vector<1x2x1x2xf32>
                %57 = "vector.transfer_read"(%53, %12, %12, %12, %12, %11, %46) <{in_bounds = [true, true, true, true], operandSegmentSizes = array<i32: 1, 4, 1, 1>, permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>}> : (tensor<1x2x1x?xf32>, index, index, index, index, f32, vector<1x2x1x2xi1>) -> vector<1x2x1x2xf32>
                %58 = "arith.addf"(%57, %56) <{fastmath = #arith.fastmath<none>}> : (vector<1x2x1x2xf32>, vector<1x2x1x2xf32>) -> vector<1x2x1x2xf32>
                %59 = "arith.cmpf"(%58, %0) <{fastmath = #arith.fastmath<none>, predicate = 9 : i64}> : (vector<1x2x1x2xf32>, vector<1x2x1x2xf32>) -> vector<1x2x1x2xi1>
                %60 = "arith.select"(%59, %58, %0) : (vector<1x2x1x2xi1>, vector<1x2x1x2xf32>, vector<1x2x1x2xf32>) -> vector<1x2x1x2xf32>
                %61 = "vector.transfer_write"(%60, %53, %12, %12, %12, %12, %46) <{in_bounds = [true, true, true, true], operandSegmentSizes = array<i32: 1, 1, 4, 1>, permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>}> : (vector<1x2x1x2xf32>, tensor<1x2x1x?xf32>, index, index, index, index, vector<1x2x1x2xi1>) -> tensor<1x2x1x?xf32>
                %62 = "tensor.insert_slice"(%61, %arg8, %arg1, %arg3, %arg5, %arg7, %34) <{operandSegmentSizes = array<i32: 1, 1, 4, 1, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808, -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: 1, 2, 1, -9223372036854775808>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x2x1x?xf32>, tensor<32x16x25x25xf32>, index, index, index, index, index) -> tensor<32x16x25x25xf32>
                "scf.yield"(%62) : (tensor<32x16x25x25xf32>) -> ()
              }) : (index, index, index, tensor<32x16x25x25xf32>) -> tensor<32x16x25x25xf32>
              "scf.yield"(%33) : (tensor<32x16x25x25xf32>) -> ()
            }) : (index, index, index, tensor<32x16x25x25xf32>) -> tensor<32x16x25x25xf32>
            "scf.yield"(%31) : (tensor<32x16x25x25xf32>) -> ()
          }) : (index, index, index, tensor<32x16x25x25xf32>) -> tensor<32x16x25x25xf32>
          "scf.yield"(%29) : (tensor<32x16x25x25xf32>) -> ()
        }) : (index, index, index, tensor<32x16x25x25xf32>) -> tensor<32x16x25x25xf32>
        "flow.dispatch.tensor.store"(%28, %19, %arg0) <{operandSegmentSizes = array<i32: 1, 1, 0, 1, 0, 0>, static_offsets = array<i64: 0, -9223372036854775808, 0, 0>, static_sizes = array<i64: 32, 16, 25, 25>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<32x16x25x25xf32>, !flow.dispatch.tensor<writeonly:tensor<32x192x25x25xf32>>, index) -> ()
        "scf.yield"() : () -> ()
      }) : (index, index, index) -> ()
      "func.return"() : () -> ()
    }) {translation_info = #iree_codegen.translation_info<CPUConvTileAndDecomposeExpert>} : () -> ()
  }) : () -> ()
  "hal.executable.variant_end"() : () -> ()
}) {sym_name = "embedded_elf_x86_64", target = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>} : () -> ()
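
For reference, the sizes of the vectors named in the log can be checked by hand. The sketch below assumes that the 8192 bytes in the error message is the maximum allowed vector size rather than a measured size; that reading is an assumption, not something the log states. Under it, the 1x192x3x5 vectors of i32 and f32 (the constant, arith.extsi, arith.sitofp, arith.mulf and vector.transfer_write in the notes) are over the limit, while the i8 vector produced by vector.transfer_read is not, which would explain why that op is absent from the notes.

```python
import re
from math import prod

# Element widths in bytes for the element types that appear in the log.
ELEM_BYTES = {"i8": 1, "i32": 4, "f32": 4}

def vector_bytes(vector_type):
    """Return the size in bytes of a static type such as 'vector<1x192x3x5xf32>'."""
    m = re.fullmatch(r"vector<((?:\d+x)+)(\w+)>", vector_type)
    if m is None:
        raise ValueError(f"unsupported vector type: {vector_type}")
    dims = [int(d) for d in m.group(1).rstrip("x").split("x")]
    return prod(dims) * ELEM_BYTES[m.group(2)]

# Assumption: the byte count printed in the error message is the allowed maximum.
LIMIT = 8192

sizes = {
    t: vector_bytes(t)
    for t in (
        "vector<1x192x3x5xi8>",
        "vector<1x192x3x5xi32>",
        "vector<1x192x3x5xf32>",
    )
}
for t, size in sizes.items():
    print(f"{t}: {size} bytes, over limit: {size > LIMIT}")
```

Each vector holds 1 * 192 * 3 * 5 = 2880 elements, so the 4-byte element types come to 11520 bytes and the i8 vector to 2880 bytes.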

Steps to reproduce your issue

git clone https://github.com/nod-ai/SHARK-TestSuite
cd SHARK-TestSuite/e2eshark/

  1. Passed: standalone torch-mlir-opt + iree: onnx -> torch -> linalg -> vmfb
    python ./run.py --torchmlirbuild ../../torch-mlir/build --tolerance 0.001 0.001 --cachedir ./huggingface_cache --ireebuild ../../iree-build -f onnx -g models --mode onnx --report --tests onnx/models/Inception_v4_vaiq_int8 --torchtolinalg

  2. Failed: iree: onnx -> vmfb:
    python ./run.py --tolerance 0.001 0.001 --cachedir ./huggingface_cache --ireebuild ../../iree-build -f onnx -g models --mode onnx --report --tests onnx/models/Inception_v4_vaiq_int8

You can find the Inception_v4_vaiq_int8.default.torch-onnx.mlir file in SHARK-TestSuite/e2eshark/test-run/onnx/models/Inception_v4_vaiq_int8 after running either command.

What component(s) does this issue relate to?

Compiler

Version information

iree: candidate-20240704.944
torch-mlir : ca0e9066755b35c0889c6ab792265b0886325f50

Additional context

No response

@AmosLewis AmosLewis added the bug 🐞 Something isn't working label Jul 9, 2024
@ScottTodd ScottTodd added the integrations/pytorch PyTorch integration work label Jul 10, 2024
@stellaraccident
Collaborator

Those versions are pinned to the same torch-mlir commit, so I'm not sure anything can be missing. What there may be, though, is a difference between running the full iree pipeline and the torch-mlir pipeline.

To triage that, you can run iree-compile --compile-to=input to see how IREE pre-processes the input, and check whether and how the result differs from what torch-mlir-opt produces.
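
A minimal sketch of that triage step, assuming both tools print their result to stdout when no output file is given (as the shell redirections in the issue suggest). The pass pipeline and flags are copied from the commands in the issue; the helper names and the model.mlir path are hypothetical. A plain text diff is likely to be noisy because SSA value names and op ordering can differ, so treat it as a starting point only.

```python
import difflib
import subprocess

# Pass pipeline copied from the torch-mlir-opt command in the issue.
TORCH_PIPELINE = (
    "builtin.module(func.func(convert-torch-onnx-to-torch),"
    "torch-lower-to-backend-contract,func.func(cse,canonicalize),"
    "torch-backend-to-linalg-on-tensors-backend-pipeline)"
)

def iree_input_cmd(iree_compile, src):
    """Command that stops IREE after its input-conversion phase."""
    return [
        iree_compile,
        "--compile-to=input",
        "--iree-input-demote-i64-to-i32",
        "--iree-hal-target-backends=llvm-cpu",
        src,
    ]

def torch_mlir_cmd(torch_mlir_opt, src):
    """Command for the standalone torch-mlir lowering to linalg."""
    return [torch_mlir_opt, f"-pass-pipeline={TORCH_PIPELINE}", src]

def run_to_text(cmd):
    """Run a tool and return its stdout; raises if the tool fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def diff_ir(a, b, a_name="iree-compile", b_name="torch-mlir-opt"):
    """Return a unified diff of two IR dumps as a single string."""
    return "".join(
        difflib.unified_diff(
            a.splitlines(keepends=True),
            b.splitlines(keepends=True),
            fromfile=a_name,
            tofile=b_name,
        )
    )

# Tiny self-check on inline strings, so the sketch runs without the tools.
demo = diff_ir("a\nb\n", "a\nc\n")
print(demo)

# With the tools built, usage would look like this (paths are placeholders):
#   iree_ir = run_to_text(iree_input_cmd("iree-compile", "model.mlir"))
#   torch_ir = run_to_text(torch_mlir_cmd("torch-mlir-opt", "model.mlir"))
#   print(diff_ir(iree_ir, torch_ir))
```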

@stellaraccident
Collaborator

The usual suspect is the createTorchToIREEPipeline having drifted from the upstream version (https://github.com/llvm/torch-mlir/blob/main/lib/Dialect/TorchConversion/Transforms/Passes.cpp#L65).

I've been wanting to extract the torch-mlir code for that into something more modular that can be shared for a while... but time...

@AmosLewis
Contributor Author

Just tested with candidate-20240809.980; the inconsistency for Inception_v4_vaiq_int8 has disappeared.
