
[MLIR][TORCH] Add TorchToTosa lowering for aten.where.self op #1454

Merged: 1 commit merged into llvm:main from tosa-aten-where on Oct 18, 2022

Conversation

@AmosLewis (Collaborator) commented Oct 3, 2022:

Found this issue when lowering GPT-2 to the TOSA dialect in SHARK with Python: nod-ai/SHARK#338

If you are new to torch-mlir (boot camp), here are some useful comments for understanding the debugging steps: #961 (comment)

Here is the link to torch_mlir_debug_command.txt, which is helpful for boot camp and for later reference when developing torch-mlir.

After you have read all the following comments, refer to commit dc9ea08 for the corresponding ONNX work.
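
For boot camp readers, below is a minimal, hypothetical sketch of what a TorchToTosa conversion pattern for aten.where.self generally looks like: aten.where.self(condition, self, other) maps naturally onto tosa.select. The class name and the omission of rank/shape equalization are simplifications for illustration, not the exact code added in this PR.

    // Illustrative sketch only; not the PR's actual pattern.
    #include "mlir/Dialect/Tosa/IR/TosaOps.h"
    #include "mlir/Transforms/DialectConversion.h"
    #include "torch-mlir/Dialect/Torch/IR/TorchOps.h"

    namespace {
    class ConvertAtenWhereSelfOp
        : public mlir::OpConversionPattern<mlir::torch::Torch::AtenWhereSelfOp> {
    public:
      using OpConversionPattern::OpConversionPattern;

      mlir::LogicalResult
      matchAndRewrite(mlir::torch::Torch::AtenWhereSelfOp op, OpAdaptor adaptor,
                      mlir::ConversionPatternRewriter &rewriter) const override {
        // aten.where.self(condition, self, other) selects elementwise between
        // self and other based on condition, which is what tosa.select does.
        auto outType = getTypeConverter()->convertType(op.getType());
        if (!outType)
          return rewriter.notifyMatchFailure(op, "failed to convert result type");
        // NOTE: a real lowering also reshapes the operands so their ranks match,
        // because TOSA elementwise ops expect equal-rank, broadcastable operands.
        rewriter.replaceOpWithNewOp<mlir::tosa::SelectOp>(
            op, outType, adaptor.getOperands()[0], adaptor.getOperands()[1],
            adaptor.getOperands()[2]);
        return mlir::success();
      }
    };
    } // namespace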

@AmosLewis AmosLewis force-pushed the tosa-aten-where branch 2 times, most recently from a3a605d to 2446ece on October 3, 2022 21:41
@AmosLewis (Collaborator Author) commented Oct 3, 2022

@AmosLewis AmosLewis marked this pull request as ready for review October 3, 2022 22:31
@AmosLewis AmosLewis force-pushed the tosa-aten-where branch 4 times, most recently from e89a71f to 3c5faa6 on October 4, 2022 15:57
@AmosLewis (Collaborator Author) commented:

In the test, Torch converts to TOSA successfully. But why is TOSA then converted to Linalg after that? It is this TOSA-to-Linalg conversion that raises the bug. @sjarus @eric-k256

> (mlir_venv) nod% python -m e2e_testing.main -f 'ElementwiseAtenWhereSelf' --config=tosa -v 
> Compiling ElementwiseAtenWhereSelfModule_basic...
> FAIL - "ElementwiseAtenWhereSelfModule_basic"
> 
> Unexpected outcome summary:
> 
> ****** Failed tests - 1 tests
>     FAIL - "ElementwiseAtenWhereSelfModule_basic"
>         Compilation error: Traceback (most recent call last):
>           File "/home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/framework.py", line 290, in compile_and_run_test
>             compiled = config.compile(test.program_factory())
>           File "/home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/tosa_backend.py", line 35, in compile
>             return self.backend.compile(module)
>           File "/home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/tosa_backends/linalg_on_tensors.py", line 57, in compile
>             run_pipeline_with_repro_report(
>           File "/home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/compiler_utils.py", line 73, in run_pipeline_with_repro_report
>             raise TorchMlirCompilerError(trimmed_message) from None
>         torch_mlir.compiler_utils.TorchMlirCompilerError: Lowering TOSA to Linalg-on-Tensors failed with the following diagnostics:
>         error: 'linalg.generic' op expected indexing_map #2 to have 4 dim(s) to match the number of loops
>         note: see current operation: 
>         %2 = "linalg.generic"(%1, %arg1, %arg2, %0) ({
>         ^bb0(%arg3: i1, %arg4: f32, %arg5: f32, %arg6: f32):
>         %3 = "arith.select"(%arg3, %arg4, %arg5) : (i1, f32, f32) -> f32
>         "linalg.yield"(%3) : (f32) -> ()
>         }) {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<() -> ()>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"], operand_segment_sizes = array<i32: 3, 1>} : (tensor<1x5x5xi1>, tensor<1x12x5x5xf32>, tensor<f32>, tensor<1x12x5x5xf32>) -> tensor<1x12x5x5xf32>

@ramiro050 (Collaborator) commented:

I think the code that generates the indexing maps for the linalg.generic op is not handling zero-rank tensors correctly.

@ramiro050 (Collaborator) commented:

I think this line should use the result rank variable rank rather than the operand's type.getRank():

https://github.com/llvm/llvm-project/blob/86bf43d2ab1334af1ca7cb10d407b5afe19fc65f/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp#L617

@AmosLewis AmosLewis requested a review from sjarus October 4, 2022 18:21
@AmosLewis (Collaborator Author) commented Oct 4, 2022:

Yes. After replacing it with rank, everything looks good:

    indexingMaps.push_back(AffineMap::get(
        /*dimCount=*/rank, /*symbolCount=*/0, affineExprs,
        rewriter.getContext()));
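
For context, here is a hypothetical, self-contained sketch of the broadcast-map construction this fix touches; the helper name and the exact broadcasting policy are assumptions for illustration, not the upstream TosaToLinalg code. The point it demonstrates is that the map's dimension count has to be the result rank (the number of linalg.generic loops), so even a zero-rank operand such as the tensor<f32> above gets a map of the form (d0, d1, d2, d3) -> () rather than () -> ().

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/SmallVector.h"
    #include "mlir/IR/AffineExpr.h"
    #include "mlir/IR/AffineMap.h"
    #include "mlir/IR/MLIRContext.h"

    // Hypothetical helper: build the indexing map for one operand of a
    // linalg.generic whose loop nest has `resultRank` loops. Size-1 operand
    // dims broadcast to the constant 0; the remaining dims line up with the
    // trailing loop dims.
    static mlir::AffineMap buildBroadcastMap(llvm::ArrayRef<int64_t> operandShape,
                                             unsigned resultRank,
                                             mlir::MLIRContext *ctx) {
      unsigned operandRank = operandShape.size();
      llvm::SmallVector<mlir::AffineExpr> exprs;
      for (unsigned i = 0; i < operandRank; ++i) {
        unsigned loopDim = resultRank - operandRank + i; // align trailing dims
        exprs.push_back(operandShape[i] == 1
                            ? mlir::getAffineConstantExpr(0, ctx)
                            : mlir::getAffineDimExpr(loopDim, ctx));
      }
      // dimCount must be the loop count of the generic op (the result rank),
      // not the operand's own rank; a zero-rank operand then yields the empty
      // result map over resultRank dims, matching the 4-loop iteration space.
      return mlir::AffineMap::get(/*dimCount=*/resultRank, /*symbolCount=*/0,
                                  exprs, ctx);
    }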

@AmosLewis AmosLewis force-pushed the tosa-aten-where branch 2 times, most recently from b0a1c48 to b854e20 on October 4, 2022 20:37
@AmosLewis (Collaborator Author) commented Oct 4, 2022:

LLVM branch to use: llvm/llvm-project@main...AmosLewis:llvm-project:tosa-to-linalg

LLVM patch waiting for review and merge: https://reviews.llvm.org/D135343

@AmosLewis (Collaborator Author) commented Oct 11, 2022:

One of these needs to be merged first: build: update llvm tag to 438e5918 (#1475) or build: update llvm tag to d325d2b (#1483).

@ramiro050 Could you help uplift LLVM to d325d2b?

@AmosLewis (Collaborator Author) commented Oct 18, 2022:

Finished the rebase and uplifted LLVM via #1502, but I got a new build-test out-of-tree bug that is not from my patch:

  pip installing Pytorch..
  ERROR: Invalid requirement: '/main_checkout/torch-mlir/build_tools/../build_tools/python_deploy/wheelhouse/*'
  Hint: It looks like a path. File '/main_checkout/torch-mlir/build_tools/../build_tools/python_deploy/wheelhouse/*' does not exist.
  CMake Error at python/CMakeLists.txt:36 (message):
    Failed to run `build_libtorch.sh`

@ramiro050 (Collaborator) commented:

> Finished the rebase and uplifted LLVM via #1502, but I got a new build-test out-of-tree bug that is not from my patch:
>
>   pip installing Pytorch..
>   ERROR: Invalid requirement: '/main_checkout/torch-mlir/build_tools/../build_tools/python_deploy/wheelhouse/*'
>   Hint: It looks like a path. File '/main_checkout/torch-mlir/build_tools/../build_tools/python_deploy/wheelhouse/*' does not exist.
>   CMake Error at python/CMakeLists.txt:36 (message):
>     Failed to run `build_libtorch.sh`

Is this happening locally on your machine? The CI seems to be building fine.

@AmosLewis AmosLewis requested a review from sjarus October 18, 2022 16:34
@ashay (Collaborator) commented Oct 18, 2022:

> Is this happening locally on your machine? The CI seems to be building fine.

It was happening in CI (because of a bogus cache entry) so I deleted that cache entry and restarted the build, which then passed. All is fine now, but I'll keep an eye out for more failures.

@ramiro050 (Collaborator) commented:

> It was happening in CI (because of a bogus cache entry) so I deleted that cache entry and restarted the build, which then passed. All is fine now, but I'll keep an eye out for more failures.

Nice! Thanks for the quick fix!

@AmosLewis AmosLewis merged commit ad6f584 into llvm:main Oct 18, 2022
@AmosLewis AmosLewis deleted the tosa-aten-where branch February 8, 2023 16:15
@AmosLewis (Collaborator Author) commented:

Here is the link to torch_mlir_debug_command.txt, which is helpful for boot camp and for later reference when developing torch-mlir.
