[tracking] E2EShark Model Tests Onnx Mode #566
Comments
Please add the path to the logs directory to make it clear where to look for them. For model <model_name> they are located here, assuming SHARK-TestSuite/e2eshark/test-onnx is the test run directory:
Working on the "Add" issue. Please assign: #586
A regression on 2024-04-30: https://github.com/nod-ai/e2eshark-reports/blob/main/2024-04-30/onnx_reports/statusreport.md
I'm not sure what the cause of the discrepancy with the current list of issues is, but with an up-to-date torch-mlir (plus a few minor edits to the recent fuse-quantized-ops work), here's a triage list for torch-mlir failures when running.
List of failures (and a brief triage):
Ty! On it. Will dive deeper tomorrow. Also, when posting commands, would so love it if you could write them with home-relative paths. I think a sizeable chunk of us have each repo cloned directly to our home dir, so something like the sketch below would be directly runnable.
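A minimal sketch of such a directly-runnable command, assuming torch-mlir, iree-build, SHARK-TestSuite, and a model cache all live under the home directory (these paths are assumptions, not taken from this thread; the flags mirror the commands listed later in this issue):

```shell
# Hypothetical home-relative invocation of the e2eshark runner for a single model.
# Adjust the build and cache paths to match your local checkouts.
cd ~/SHARK-TestSuite/e2eshark
python run.py \
  --torchmlirbuild ~/torch-mlir/build \
  --ireebuild ~/iree-build \
  --cachedir ~/model-cache \
  -r test-onnx --tolerance .001 .001 --mode onnx --report \
  --tests onnx/models/AlexNet_vaiq_int8
```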
The immediate issue is the dynamic dims in the input torch.vtensor<[?,256,?,?,?],f32> causing the compile error. Next would be the 3-d input.
…ering (#3351) Addresses [Shark-Turbine #196](nod-ai/SHARK-TestSuite#196) Related tracker [Shark-Turbine #566](nod-ai/SHARK-ModelDev#566) Related onnx.Resize issues [Shark-Turbine #616](nod-ai/SHARK-ModelDev#616)
3 pytorch models failed again from 2024-05-29 to 2024-05-30 in e2eshark-reports.
Any idea what they are failing on?
Not sure; working with @saienduri to figure it out. I just tested locally with the 0530 torch-mlir d7b8f00 and iree candidate-20240530.909, and they passed. It's kind of weird. Sai thinks it might pass with the latest iree; let's see what's going on with the 0531 report.
We have root-caused the 3-model regression (down to 40 passes in https://github.com/nod-ai/e2eshark-reports/tree/main/2024-05-31) to the convert-torch-onnx-to-torch pass being outdated in iree (it generates different MLIR compared to torch-mlir ToM). So, once torch-mlir gets bumped in iree, they should pass again :)
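For anyone wanting to check locally which torch-mlir commit their iree checkout carries, a minimal sketch (the submodule path is an assumption about the iree source layout, not something stated in this thread):

```shell
# Hypothetical check of the torch-mlir commit pinned by a local iree checkout.
git -C ~/iree submodule status third_party/torch-mlir
```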
This addresses 7 of the model failures I'm seeing in the test suite. See [Shark-Turbine issue #566](nod-ai/SHARK-ModelDev#566). Need the op `linalg.conv_2d_ngchw_gfchw_q` to be added upstream before merging this. See [llvm-project PR #92136](llvm/llvm-project#92136). A small additional expansion to operand quantization is included in this patch to address a model failure that occurs when unblocking the quantized group convolutions in one of these onnx models.
Below is the list of issues we are hitting when running vision int8 models end to end using onnx mode (onnx export/import -> torch-mlir -> iree-compile -> iree-runtime). You can find the models that lead to each issue in the issue description.
To reproduce the error, please set up SHARK-TestSuite and then run run.py with the respective command line flags (more instructions can be found here).
To fix the issue, you need to either modify the OnnxToTorch lowering of the corresponding op or add the missing support in the TorchToLinalg lowering. You can find more information in either model-run.log or iree-compile.log after running the test. This can help you create a smaller repro; try to fix that first, then check whether it fixes the model.
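A minimal sketch of driving such a smaller repro by hand, assuming the test run leaves the ONNX-imported MLIR next to the logs (the file names below are placeholders, and the exact flags can differ across torch-mlir versions):

```shell
# Hypothetical manual lowering of the ONNX-imported module, useful for isolating
# whether a failure comes from the OnnxToTorch or the TorchToLinalg step.
torch-mlir-opt --convert-torch-onnx-to-torch --torch-lower-to-backend-contract \
    <model_name>.onnx.torch.mlir -o <model_name>.torch.mlir

# Then lower the torch-dialect module toward linalg to surface TorchToLinalg errors.
torch-mlir-opt --torch-backend-to-linalg-on-tensors-backend-pipeline \
    <model_name>.torch.mlir -o <model_name>.linalg.mlir
```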
You can find the specific logs on what is failing in these locations for <model_name>, where SHARK-TestSuite/e2eshark/test-onnx is the test run directory:
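A hedged sketch of those locations, using the log names mentioned above (the exact directory layout under the run directory is an assumption, not copied from this issue):

```shell
# Hypothetical log locations for one model; substitute the real model name.
cat SHARK-TestSuite/e2eshark/test-onnx/onnx/models/<model_name>/model-run.log     # export/import and torch-mlir output
cat SHARK-TestSuite/e2eshark/test-onnx/onnx/models/<model_name>/iree-compile.log  # iree-compile errors
```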
Issues:
torch-to-linalg:
-> Add transpose before ExtractSliceOp iree-org/iree#17574 @IanWood1
-> Add preprocessing TransposeExtractConcat pass iree-org/iree#17692 @IanWood1
-> [TorchToLinalg] remove `extract_slice` grid_sample lowering llvm/torch-mlir#3483 (onnx/RAFT_vaiq_int8)
iree:
ConvNeXt_vaiq_int8 onnx model compile failed #809
dpn68_vaiq / dpn92_vaiq / dpn98_vaiq / dpn107_vaiq / dpn131_vaiq
skresnet34_vaiq / skresnet18_vaiq
DeepLabV3_resnet50_vaiq_int8 / RAFT_vaiq_int8 / U-2-Net_vaiq_int8
pytorch/bart-large
onnx/opt-125M-awq
Onnx VAIQ Models
To run all tests:
python run.py --torchmlirbuild /path-to/torch-mlir/build --ireebuild /path-to/iree-build --cachedir /path-to/model-cache-dir -r test-onnx --tolerance .001 .001 --mode onnx --report -f onnx -g models
To run a specific test (e.g. onnx/models/AlexNet_vaiq_int8):
python run.py --torchmlirbuild /path-to/torch-mlir/build --ireebuild /path-to/iree-build --cachedir /path-to/model-cache-dir -r test-onnx --tolerance .001 .001 --mode onnx --report --tests onnx/models/AlexNet_vaiq_int8
Versions:
torch-mlir
- main @ a7302a68
iree
- main @ 40f25334d2
Status:
Check the latest run report in e2eshark-reports:
e2eshark-reports/<DATE>/onnx_reports/statusreport.md
onnx models, Pass (with --torchtolinalg): 28/34, Day: 08/08
pytorch models, Pass (with --torchtolinalg): 4/17/28, Day: 08/08