examples/torchscript_resnet18.py doesn't work because incompatible function arguments #2298

yinrun · 2023-07-11T12:51:42Z

I was trying to compile torch-mlir and run the test case examples/torchscript_resnet18.py. Has anyone else come across this kind of problem?

Traceback (most recent call last):
File "/home/yinrun/hp_workspace/torch-mlir/examples/torchscript_resnet18.py", line 70, in <module>
module = torch_mlir.compile(resnet18, torch.ones(1, 3, 224, 224), output_type="linalg-on-tensors")
File "/home/yinrun/hp_workspace/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/__init__.py", line 359, in compile
class_annotator.exportNone(scripted._c._type())
TypeError: exportNone(): incompatible function arguments. The following argument types are supported:
1. (self: torch_mlir._mlir_libs._jit_ir_importer.ClassAnnotator, arg0: c10::ClassType) -> None

Invoked with: ClassAnnotator {
}
, torch.torchvision.models.resnet.ResNet

yinrun · 2023-08-17T10:34:38Z

I don't know the root cause, but I tried building PyTorch from source and it works.

bilibiliGO283 · 2023-11-22T06:36:42Z

I got the same error when running the unit tests:

...
TypeError: exportNone(): incompatible function arguments. The following argument types are supported:
...

I did not recompile torch. Instead, I updated the versions in pytorch-requirements.txt and torchvision-requirements.txt:

torchvision==0.17.0.dev20231121
torch==2.2.0.dev20231121

Then I recompiled torch-mlir and it worked.

stellaraccident · 2023-11-22T07:04:33Z

I got an initial report of this this morning on nod-ai/SHARK-Studio#1989, but we hadn't concluded whether it was isolated or something wrong with that use case.

We did determine that this was not broken as of: https://github.com/llvm/torch-mlir/releases/tag/snapshot-20231119.1027

Given the timing of the issue, we suspect that whatever it is broke as a side effect of this patch (#2582) but don't have a working theory yet as to why this has started flaking.

It appears to be a permutation of a very old issue from the early days that perhaps was never truly squashed. If memory serves, the problem comes from the PyTorch and torch-mlir Python extensions seeing different subsets of certain key class identity symbols. The only thing that changed in the above patch is the mechanism of determining PyTorch compilation flags, and that is precisely the thing that was causing this old issue (it is necessary to "massage" pybind11 to agree on the C++ ABI when dealing with binary packages). That at least gives me a lead to follow for tomorrow.
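To make the "different class identity" idea concrete, here is a toy model in plain Python. It is only a sketch of the general mechanism (extensions that disagree on ABI tags end up with separate type registries), not pybind11's actual implementation. The `Extension` class, the `internals_key` helper, and the second ABI tag value are all made up for illustration.

```python
# Toy model only: per-ABI type registries in plain Python.
# Names and tag values here are illustrative assumptions.

_REGISTRIES = {}  # one shared table per ABI key, standing in for process-wide state


def internals_key(compiler_type, build_abi):
    """Build a registry key from the ABI tags an extension was compiled with."""
    return "internals" + compiler_type + build_abi


class Extension:
    def __init__(self, name, compiler_type, build_abi):
        self.name = name
        # Extensions with the same key share a registry; others get their own.
        self.types = _REGISTRIES.setdefault(internals_key(compiler_type, build_abi), {})

    def register(self, cpp_type):
        self.types[cpp_type] = self.name

    def accepts(self, cpp_type):
        # A bound function can only convert arguments whose type it can look up.
        return cpp_type in self.types


torch_ext = Extension("torch._C", "_gcc", "_cxxabi1011")
torch_ext.register("c10::ClassType")

matched = Extension("importer_matched", "_gcc", "_cxxabi1011")
mismatched = Extension("importer_mismatched", "_gcc", "_cxxabi1016")  # illustrative tag

print(matched.accepts("c10::ClassType"))
print(mismatched.accepts("c10::ClassType"))
```

In this model the mismatched extension cannot find the type that torch registered, which is analogous to the "incompatible function arguments" error even though the printed type name looks right.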

For the moment, I would recommend pinning to the above release or syncing to prior to the above patch if you are experiencing this issue.

stellaraccident · 2023-11-22T07:06:11Z

Can you also confirm for me what compiler you are using and see if you can get CMake logs like these from your invocation:

-- Checking PyTorch ABI settings...
-- PyTorch C++ Dual ABI setting: "0"
-- PyTorch C++ ABI version: "11"
-- libtorch_python CXXFLAGS is ...-D_GLIBCXX_USE_CXX11_ABI=0 -U__GXX_ABI_VERSION -D__GXX_ABI_VERSION=1011 '-DPYBIND11_COMPILER_TYPE="_gcc"'
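For comparison with the CMake output, the installed PyTorch can also be asked directly which ABI it was built with. The snippet below is a minimal sketch: `describe_torch_abi` is a hypothetical helper name, `torch.compiled_with_cxx11_abi()` is a public call, and the two `torch._C` attributes are private and may be absent in some PyTorch versions, so they are read defensively.

```python
def describe_torch_abi():
    """Return ABI-related settings of the installed PyTorch, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch_version": torch.__version__,
        # Corresponds to the _GLIBCXX_USE_CXX11_ABI setting PyTorch was built with.
        "cxx11_abi": torch.compiled_with_cxx11_abi(),
        # Private attributes (assumption: present in this PyTorch build); None if not.
        "pybind11_compiler_type": getattr(torch._C, "_PYBIND11_COMPILER_TYPE", None),
        "pybind11_build_abi": getattr(torch._C, "_PYBIND11_BUILD_ABI", None),
    }


if __name__ == "__main__":
    print(describe_torch_abi())
```

Run it with the same Python interpreter that CMake reports in its "Checking for PyTorch using ..." line, so both views describe the same installation.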

monorimet · 2023-11-22T20:53:08Z

> Can you also confirm for me what compiler you are using and see if you can get CMake logs like these from your invocation:
> -- Checking PyTorch ABI settings...
> -- PyTorch C++ Dual ABI setting: "0"
> -- PyTorch C++ ABI version: "11"
> -- libtorch_python CXXFLAGS is ...-D_GLIBCXX_USE_CXX11_ABI=0 -U__GXX_ABI_VERSION -D__GXX_ABI_VERSION=1011 '-DPYBIND11_COMPILER_TYPE="_gcc"'

This is what I get on a source build of torch-mlir (via python setup.py bdist_wheel), which, when invoked through Python, has the same issue with ClassAnnotator:

-- Enabling PyTorch C++ dep (features depend on it)
-- Checking for PyTorch using /home/ean/SHARK/shark.venv/bin/python ...
-- Found PyTorch installation at /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/share/cmake
-- Attempting to locate libtorch as a sibling to the project: /home/ean/torch-mlir/../libtorch/share/cmake/Torch
CMake Warning at /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  /home/ean/torch-mlir/projects/CMakeLists.txt:17 (find_package)


-- Found Torch: /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/lib/libtorch.so (Required is at least version "1.11")
-- libtorch_python CXXFLAGS is ...
-- TORCH_LIBRARIES = torch;torch_library;/home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/lib/libc10.so
-- Linking TorchMLIRJITImporter with torch;torch_library;/home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/lib/libc10.so
-- TORCH_CXXFLAGS= -Wno-pedantic
-- Building PyTorch1 compatibility project
-- LTC Backend build is enabled
-- TORCH_CXXFLAGS= -Wno-pedantic

Is this what you were referring to?

Also, compiler:

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0

Isn't the torch libc10.so mismatched against the compiler and cmake flags?

stellaraccident · 2023-11-22T21:01:43Z

Yes, I expect that is the issue. And thanks for confirming this is with GCC.

The "-- libtorch_python CXXFLAGS is ..." line being empty is likely the problem. It should encode both the compiler being used and the CXXABI level, which for PyTorch is often pinned to a different version than the system default.

That means that the torch-mlir Python extensions are not being compiled to be compatible with the PyTorch version with which they need to mate, and the result will be that native Python types defined in PyTorch proper will appear to be distinct types from those in the torch-mlir extensions. And that will result in the signature mismatch errors you see. Without detecting the right ABI flags, it is a coin toss whether your system defaults line up.
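A quick way to spot this condition is to scan the configure output for that status line and check whether the ABI defines made it in. The following is a rough sketch, assuming the line format quoted in this thread; `missing_abi_defines` and `EXPECTED_DEFINES` are hypothetical names, and the list of expected defines is taken from the healthy log above rather than from any documented contract.

```python
import re

# Matches the status line quoted in this thread, e.g.
#   -- libtorch_python CXXFLAGS is ...-D_GLIBCXX_USE_CXX11_ABI=0 ...
FLAGS_LINE = re.compile(r"^-- libtorch_python CXXFLAGS is \.\.\.(?P<flags>.*)$")

# Defines seen in the healthy log quoted earlier (an assumption, not a spec).
EXPECTED_DEFINES = (
    "_GLIBCXX_USE_CXX11_ABI",
    "__GXX_ABI_VERSION",
    "PYBIND11_COMPILER_TYPE",
)


def missing_abi_defines(cmake_log):
    """Return expected defines absent from the first CXXFLAGS line.

    Returns None when the log contains no such line at all.
    """
    for line in cmake_log.splitlines():
        match = FLAGS_LINE.match(line.strip())
        if match:
            flags = match.group("flags")
            return [d for d in EXPECTED_DEFINES if "-D" + d + "=" not in flags]
    return None


if __name__ == "__main__":
    import sys

    missing = missing_abi_defines(sys.stdin.read())
    if missing is None:
        print("no 'libtorch_python CXXFLAGS' line found")
    elif missing:
        print("ABI flags not detected; missing:", ", ".join(missing))
    else:
        print("ABI flags look populated")
```

Piping the output of the first configure in a fresh build directory through this script would report all three defines as missing for a log like the one posted above.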

This is useful. I need to repro this setup and find a fix. I'm building with a different setup which is likely why I'm not seeing it.

stellaraccident · 2023-11-23T01:42:43Z

The plot thickens. I repro'd this situation, but only on the very first cmake invocation in a build directory. In subsequent configures, it detects the flags properly. A bad theory is forming in my mind. In the prior arrangement, we were configuring PyTorch multiple times, once in each directory that needed it. I think this (somehow) amounted to the flags being mis-computed once but then latching correctly for the others. It is probably not observably fatal if most of the places that were doing this got it wrong, so coin flips.

stellaraccident closed this as completed in 66b73ed Nov 23, 2023
