Switch from w7900 to using any persistent cache runner for CPU. #18322
Conversation
Signed-off-by: saienduri <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
propagate descriptive change Co-authored-by: Scott Todd <[email protected]>
@@ -234,7 +238,11 @@ jobs:
  - name: cpu_llvm_task
    models-config-file: models_cpu_llvm_task.json
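The diff above only shows the job's config-file entry, but the substance of the change is which runner picks up the job. A minimal sketch of that kind of change is below; the `runs-on` labels are assumptions inferred from the PR description, not the exact labels used in the IREE workflow:

```yaml
jobs:
  models_cpu_llvm_task:
    # Hypothetical labels: previously pinned to the w7900 machine,
    # now any self-hosted runner carrying the persistent-cache label.
    # runs-on: [self-hosted, w7900]            # before
    runs-on: [self-hosted, persistent-cache]   # after (sketch)
    steps:
      - name: cpu_llvm_task
        run: echo "models-config-file=models_cpu_llvm_task.json"
```

With a label-based `runs-on`, any machine registered with that label (here, the 16 mi250/mi300 runners) can service the job, rather than serializing everything through one host.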
mi250:
============================== slowest durations ===============================
597.91s call shark-test-suite-models/sdxl/test_unet.py::test_run_unet_cpu
292.26s call shark-test-suite-models/sdxl/test_vae.py::test_run_vae_cpu
38.20s call shark-test-suite-models/sdxl/test_unet.py::test_compile_unet_cpu
17.20s call shark-test-suite-models/sdxl/test_clip.py::test_compile_clip_cpu
https://github.com/iree-org/iree/actions/runs/10498668335/job/29084321533#step:7:126
😬
might be an improvement over the w7900 if we at least have enough CPU cores / runners to run jobs in parallel
Yeah, the mi250 is a bit slower than the w7900. Not sure how the mi300 fares, but CPU performance was initially why I picked the w7900. Specifically, the w7900 host has an AMD Ryzen Threadripper PRO 7975WX 32-Core Processor, while the mi250 host has an AMD EPYC 7713 64-Core Processor. The mi300 host has an AMD EPYC 9454 48-Core Processor, so I don't think that will be any better. We should get another CI machine up in the lab with just the Threadripper CPU :)
LGTM if the job times are reasonable and the runners are picking up the jobs as expected.
The relevant tests worked as intended with the persistent-cache label, so merging.
This commit switches from the w7900 to using any persistent-cache runner for CPU model testing.
I also removed the persistent-cache label from the w7900 runners because we don't have many of them and want to alleviate their load.
These jobs should now run on the 16 mi250/mi300 runners we have.
ci-exactly: build_packages,regression_test