-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run all framework sanity check tests and organize jobs. #18420
Conversation
.github/workflows/pkgci.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file and pkgci_test_onnx.yml
could be separated, but it made more sense to me to group them together. fx_importer_test and onnx_importer_test weren't running before because they didn't have a natural place to run. This gives them that place by moving around the other workflow jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super happy from a naming perspective with having test like pkgci_test_amd_mi300.yml where the platform is what stands in the foreground and now having tests were a framework stands in the foreground and that runs on multiple hw platforms. Maybe we can rework this someday.
Anyway, definitely a step into the right direction 👍
jobs: | ||
cross_compile: | ||
android_arm64: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed that the Android and RISCV cross compile jobs are sharing a cache (named "cross_compile") when they shouldn't. Pushed another commit that gives each a unique cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super happy from a naming perspective with having test like pkgci_test_amd_mi300.yml where the platform is what stands in the foreground and now having tests were a framework stands in the foreground and that runs on multiple hw platforms. Maybe we can rework this someday.
Anyway, definitely a step into the right direction 👍
Agreed. Currently, some large chunks of code are copy/pasted, and grouping by platform would make that worse. I had ideas for making that easier on nod-ai/SHARK-TestSuite#288.
Fixes iree-org#16624 by running the existing ONNX and PyTorch importer tests _with the packages they need installed_. Sample logs when a test fails: https://github.com/iree-org/iree/actions/runs/10691656074/job/29638920091?pr=18420#step:9:19 ``` Traceback (most recent call last): File "/home/runner/work/iree/iree/compiler/bindings/python/test/extras/fx_importer_test.py", line 8, in <module> from iree.compiler.extras import fx_importer File "/home/runner/work/iree/iree/.venv/lib/python3.11/site-packages/iree/compiler/extras/fx_importer.py", line 138, in <module> from .._mlir_libs._torchMlir import get_int64_max, get_int64_min ModuleNotFoundError: No module named 'iree.compiler._mlir_libs._torchMlir' The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/runner/work/iree/iree/compiler/bindings/python/test/extras/fx_importer_test.py", line [19](https://github.com/iree-org/iree/actions/runs/10691656074/job/29638920091?pr=18420#step:9:20), in <module> raise ModuleNotFoundError( ModuleNotFoundError: Failed to import the fx_importer (for a reason other than torch not being found) Error: Process completed with exit code 1. ``` --- I'm not really satisfied with how these tests are distributed across jobs either before or after these changes, but I think this is a step in a good direction at least. * These tests depend on optional packages (torch, onnx, tensorflow) and disable themselves if those optional packages are not present. * The core project build (CMake/CTest, Python, packaging builds) strives to be modular and not require the entire kitchen sink to function. * Test workflows should make sense for both local development _and_ CI usage. The local development flows here are relatively convoluted and could use some work.
Fixes iree-org#16624 by running the existing ONNX and PyTorch importer tests _with the packages they need installed_. Sample logs when a test fails: https://github.com/iree-org/iree/actions/runs/10691656074/job/29638920091?pr=18420#step:9:19 ``` Traceback (most recent call last): File "/home/runner/work/iree/iree/compiler/bindings/python/test/extras/fx_importer_test.py", line 8, in <module> from iree.compiler.extras import fx_importer File "/home/runner/work/iree/iree/.venv/lib/python3.11/site-packages/iree/compiler/extras/fx_importer.py", line 138, in <module> from .._mlir_libs._torchMlir import get_int64_max, get_int64_min ModuleNotFoundError: No module named 'iree.compiler._mlir_libs._torchMlir' The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/runner/work/iree/iree/compiler/bindings/python/test/extras/fx_importer_test.py", line [19](https://github.com/iree-org/iree/actions/runs/10691656074/job/29638920091?pr=18420#step:9:20), in <module> raise ModuleNotFoundError( ModuleNotFoundError: Failed to import the fx_importer (for a reason other than torch not being found) Error: Process completed with exit code 1. ``` --- I'm not really satisfied with how these tests are distributed across jobs either before or after these changes, but I think this is a step in a good direction at least. * These tests depend on optional packages (torch, onnx, tensorflow) and disable themselves if those optional packages are not present. * The core project build (CMake/CTest, Python, packaging builds) strives to be modular and not require the entire kitchen sink to function. * Test workflows should make sense for both local development _and_ CI usage. The local development flows here are relatively convoluted and could use some work.
Fixes #16624 by running the existing ONNX and PyTorch importer tests with the packages they need installed.
Sample logs when a test fails: https://github.com/iree-org/iree/actions/runs/10691656074/job/29638920091?pr=18420#step:9:19
I'm not really satisfied with how these tests are distributed across jobs either before or after these changes, but I think this is a step in a good direction at least.