
examples/torchscript_resnet18.py doesn't work because incompatible function arguments #2298

Closed
yinrun opened this issue Jul 11, 2023 · 7 comments

Comments

@yinrun
Contributor

yinrun commented Jul 11, 2023

I was trying to compile torch-mlir and run the test case examples/torchscript_resnet18.py. Has anyone else come across this kind of problem?

Traceback (most recent call last):
  File "/home/yinrun/hp_workspace/torch-mlir/examples/torchscript_resnet18.py", line 70, in <module>
    module = torch_mlir.compile(resnet18, torch.ones(1, 3, 224, 224), output_type="linalg-on-tensors")
  File "/home/yinrun/hp_workspace/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/__init__.py", line 359, in compile
    class_annotator.exportNone(scripted._c._type())
TypeError: exportNone(): incompatible function arguments. The following argument types are supported:
    1. (self: torch_mlir._mlir_libs._jit_ir_importer.ClassAnnotator, arg0: c10::ClassType) -> None

Invoked with: ClassAnnotator {
}
, __torch__.torchvision.models.resnet.ResNet

@yinrun
Contributor Author

yinrun commented Aug 17, 2023

I don't know the reason, but building PyTorch from source made it work.

@bilibiliGO283

bilibiliGO283 commented Nov 22, 2023

I got the same error as you when running the unit tests:

...
TypeError: exportNone(): incompatible function arguments. The following argument types are supported:
...

I did not recompile torch. Instead, I updated the versions in pytorch-requirements.txt and torchvision-requirements.txt:

torchvision==0.17.0.dev20231121
torch==2.2.0.dev20231121

Then I recompiled torch-mlir and it worked.
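Since the fix here was bringing the installed torch/torchvision in line with the pinned versions before rebuilding, a quick pre-build check can confirm the environment actually matches those pins. This is a minimal sketch of a hypothetical helper (not part of torch-mlir); it assumes the requirements files contain plain name==version lines (pip options such as -f or --pre are skipped) and ignores local version tags such as +cpu on installed wheels.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Collect name==version pins, skipping comments, blank lines and pip options (-f, --pre, ...)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def installed_version(name):
    """Version string of an installed distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def mismatched_pins(text, lookup=installed_version):
    """Map each unsatisfied pin to (pinned, installed); a local tag like '+cpu' is ignored."""
    mismatches = {}
    for name, pinned in parse_pins(text).items():
        found = lookup(name)
        if found is None or found.split("+", 1)[0] != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

For example, mismatched_pins(open("pytorch-requirements.txt").read()) should return an empty dict when the installed torch matches the pin; anything else means torch-mlir would be built against a different PyTorch than the one pinned.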

@stellaraccident
Collaborator

I got an initial report of this this morning in nod-ai/SHARK-Studio#1989, but we hadn't concluded whether it was isolated or something wrong with that use case.

We did determine that this was not broken as of: https://github.com/llvm/torch-mlir/releases/tag/snapshot-20231119.1027

Given the timing of the issue, we suspect that whatever it is broke as a side effect of this patch (#2582), but we don't yet have a working theory as to why it has started flaking.

It appears to be a variant of a very old issue from the early days that was perhaps never truly squashed. If memory serves, the problem comes from the PyTorch and torch-mlir Python extensions seeing different subsets of certain key class identity symbols. The only thing that changed in the above patch is the mechanism for determining PyTorch compilation flags, and that is precisely what was causing this old issue (pybind11 has to be "massaged" into agreeing on the C++ ABI when dealing with binary packages). That at least gives me a lead to follow tomorrow.

For the moment, if you are experiencing this issue, I would recommend pinning to the above release or syncing to a commit from before the above patch.

@stellaraccident
Collaborator

Can you also confirm for me what compiler you are using and see if you can get CMake logs like these from your invocation:

-- Checking PyTorch ABI settings...
-- PyTorch C++ Dual ABI setting: "0"
-- PyTorch C++ ABI version: "11"
-- libtorch_python CXXFLAGS is ...-D_GLIBCXX_USE_CXX11_ABI=0 -U__GXX_ABI_VERSION -D__GXX_ABI_VERSION=1011 '-DPYBIND11_COMPILER_TYPE="_gcc"'
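To see where each flag in that last log line could come from, here is a minimal sketch of a hypothetical helper (not torch-mlir's actual CMake logic) that assembles the same string from three values PyTorch exposes on torch._C: _GLIBCXX_USE_CXX11_ABI, _PYBIND11_BUILD_ABI and _PYBIND11_COMPILER_TYPE. These are private PyTorch attributes, so their presence and format (for example a build ABI string like "_cxxabi1011") are assumptions that may not hold for every PyTorch version.

```python
import shlex


def libtorch_python_cxxflags(dual_abi, build_abi, compiler_type):
    """Hypothetical helper: rebuild flags shaped like the CMake log line.

    dual_abi:      torch._C._GLIBCXX_USE_CXX11_ABI (False/True or 0/1)
    build_abi:     torch._C._PYBIND11_BUILD_ABI, assumed to look like "_cxxabi1011"
    compiler_type: torch._C._PYBIND11_COMPILER_TYPE, e.g. "_gcc"
    """
    prefix = "_cxxabi"
    if not build_abi or not build_abi.startswith(prefix):
        raise ValueError(f"unexpected pybind11 build ABI string: {build_abi!r}")
    gxx_abi_version = build_abi[len(prefix):]  # "_cxxabi1011" -> "1011"
    flags = [
        f"-D_GLIBCXX_USE_CXX11_ABI={int(dual_abi)}",
        "-U__GXX_ABI_VERSION",  # drop the compiler's own default first
        f"-D__GXX_ABI_VERSION={gxx_abi_version}",
        f'-DPYBIND11_COMPILER_TYPE="{compiler_type}"',
    ]
    # shlex.join single-quotes the last flag because it contains double quotes.
    return shlex.join(flags)
```

With PyTorch installed, calling libtorch_python_cxxflags(torch._C._GLIBCXX_USE_CXX11_ABI, torch._C._PYBIND11_BUILD_ABI, torch._C._PYBIND11_COMPILER_TYPE) gives a string to compare against what your configure step printed; an empty value in the log means none of these were applied.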

@monorimet
Contributor

monorimet commented Nov 22, 2023


This is what I get on a source build of torch-mlir (via python setup.py bdist_wheel), which has the same issue with ClassAnnotator when invoked through Python:

-- Enabling PyTorch C++ dep (features depend on it)
-- Checking for PyTorch using /home/ean/SHARK/shark.venv/bin/python ...
-- Found PyTorch installation at /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/share/cmake
-- Attempting to locate libtorch as a sibling to the project: /home/ean/torch-mlir/../libtorch/share/cmake/Torch
CMake Warning at /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  /home/ean/torch-mlir/projects/CMakeLists.txt:17 (find_package)

-- Found Torch: /home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/lib/libtorch.so (Required is at least version "1.11")
-- libtorch_python CXXFLAGS is ...
-- TORCH_LIBRARIES = torch;torch_library;/home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/lib/libc10.so
-- Linking TorchMLIRJITImporter with torch;torch_library;/home/ean/SHARK/shark.venv/lib/python3.11/site-packages/torch/lib/libc10.so
-- TORCH_CXXFLAGS= -Wno-pedantic
-- Building PyTorch1 compatibility project
-- LTC Backend build is enabled
-- TORCH_CXXFLAGS= -Wno-pedantic

Is this what you were referring to?

Also, the compiler:

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0

Isn't the torch libc10.so mismatched against the compiler and CMake flags?

@stellaraccident
Collaborator

Yes, I expect that is the issue. And thanks for confirming this is with GCC.

The "-- libtorch_python CXXFLAGS is ..." line being empty is likely a problem. Those flags should encode both the compiler being used and the C++ ABI level, and PyTorch is often pinned to a different ABI version than the system default.

That means that the torch-mlir Python extensions are not being compiled to be compatible with the PyTorch version with which they need to mate, and the result will be that native Python types defined in PyTorch proper will appear to be distinct types from those in the torch-mlir extensions. And that will result in the signature mismatch errors you see. Without detecting the right ABI flags, it is a coin toss whether your system defaults line up.

This is useful. I need to repro this setup and find a fix. I'm building with a different setup, which is likely why I'm not seeing it.
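The failure mode described in this comment (PyTorch's native types appearing distinct from those the torch-mlir extension expects) can be illustrated with a toy Python model. This is not pybind11's actual code: it only assumes, as in pybind11 2.x, that extension modules share C++-to-Python type registrations through interpreter-wide state found under an ID that includes the compiler type, standard library and build ABI strings. The key format, ABI strings and registry contents below are illustrative.

```python
# Toy model only: a dict stands in for the interpreter-wide storage that pybind11
# extension modules use to share their C++ <-> Python type registrations.
interpreter_state = {}


def internals_key(compiler_type, stdlib, build_abi):
    # Illustrative key; the real pybind11 ID also carries version and build-type fields.
    return f"__pybind11_internals{compiler_type}{stdlib}{build_abi}__"


def type_registry(abi):
    # Modules whose ABI strings agree get the same registry; others get a fresh one.
    return interpreter_state.setdefault(internals_key(*abi), {})


def can_accept(cpp_type, abi):
    # A binding can only convert an argument whose C++ type is in *its* registry.
    return cpp_type in type_registry(abi)


# "PyTorch" registers its ClassType binding under the ABI it was built with.
TORCH_ABI = ("_gcc", "_libstdcpp", "_cxxabi1011")
type_registry(TORCH_ABI)["c10::ClassType"] = "torch._C.ClassType"

# An extension built with matching flags sees the registration...
matching = can_accept("c10::ClassType", TORCH_ABI)
# ...while one built with a different ABI version string does not, so an
# overload taking c10::ClassType rejects the argument.
mismatched = can_accept("c10::ClassType", ("_gcc", "_libstdcpp", "_cxxabi1016"))
print(matching, mismatched)  # True False
```

In this model nothing is "wrong" with either module on its own; the mismatch only shows up when an object created by one is handed to a function bound by the other, which matches the exportNone() error in this issue.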

@stellaraccident
Collaborator

The plot thickens. I repro'd this situation, but only on the very first CMake invocation in a build directory; on subsequent configures, it detects the flags properly. A bad theory is forming in my mind. In the prior arrangement, we were configuring PyTorch multiple times, once in each directory that needed it. I think this (somehow) meant it was mis-computed once but then latched correctly for the others. It is probably not observably fatal if most of the places that were doing this got it wrong, so coin flips.
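Because the misdetection reportedly happens only on the first configure of a build directory, it can help to check each configure log for the flags line. This is a minimal sketch of a hypothetical helper (not part of torch-mlir); it assumes the "..." in the "libtorch_python CXXFLAGS is ..." line is printed literally by the CMake message, as in both logs in this thread, and treats the flags as detected only if they include a -D_GLIBCXX_USE_CXX11_ABI setting.

```python
import re

# Matches the status line seen in the logs above; everything after the literal "..." is the flags.
FLAGS_LINE = re.compile(r"libtorch_python CXXFLAGS is \.\.\.(.*)$")


def detected_cxxflags(configure_log):
    """Return the flags CMake reported for libtorch_python, or None if the line is absent."""
    for line in configure_log.splitlines():
        match = FLAGS_LINE.search(line.rstrip())
        if match:
            return match.group(1).strip()
    return None


def abi_flags_detected(configure_log):
    """True only if the flags line is present, non-empty and sets the dual ABI macro."""
    flags = detected_cxxflags(configure_log)
    return bool(flags) and "-D_GLIBCXX_USE_CXX11_ABI=" in flags
```

For example, capture the configure output with cmake <your usual arguments> 2>&1 | tee configure.log and pass the file's contents to abi_flags_detected; a False result corresponds to the empty-flags case reported above.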

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants