
[CPU] Enable 'iree-llvmcpu-reassociate-fp-reductions' by default #13685

Merged
merged 2 commits into iree-org:main on May 19, 2023

Conversation

dcaballe
Contributor

This PR is mostly to start a discussion about enabling fp reduction reassociation by default. When this flag is disabled, we basically don't vectorize the reduction dimension at all, which results in extra unrolling of scalar instructions. It's hard for an external user to really understand the implications of this flag, or that it has to be enabled to get reasonable performance on fp reductions. In my case, even though I'm aware of it, I always forget to use it and waste time wondering "why is this random sequence of instructions scalar?" when looking at profiles. Also, in terms of accuracy, we are not paying the same attention to other operations that we approximate by default (e.g., math ops), so I think enabling this by default and providing a flag to disable it is reasonable. In the future, we may think about introducing a compilation mode that optimizes for accuracy more generally. WDYT?
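To illustrate the mechanics, here is a minimal standalone C++ sketch (not IREE code; the function names are made up). A sequential fp reduction is one serial chain of adds, so each step depends on the previous one and the loop cannot be vectorized without changing results. The reassociated form keeps one partial sum per vector lane and combines them at the end, which is exactly the rewrite that is only legal if fp addition is treated as associative:

```cpp
#include <cstddef>

// Fixed left-to-right chain: each add depends on the previous one.
// This is the order an unvectorized (flag-off) reduction computes.
float reduce_sequential(const float* v, size_t n) {
  float acc = 0.0f;
  for (size_t i = 0; i < n; ++i) acc += v[i];
  return acc;
}

// Reassociated shape: four independent partial sums, as a 4-lane
// vectorized reduction would keep, combined once at the end.
// Only legal if fp addition may be treated as associative.
float reduce_reassociated(const float* v, size_t n) {
  float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    for (int k = 0; k < 4; ++k) acc[k] += v[i + k];  // lane-parallel adds
  }
  for (; i < n; ++i) acc[0] += v[i];  // scalar tail
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

The two functions can return different values whenever rounding matters, which is the accuracy trade-off discussed below.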

dcaballe added the benchmarks:x86_64, benchmarks:comp-stats, and benchmarks:android-cpu labels on May 18, 2023
@MaheshRavishankar
Contributor

I am fine with enabling this by default. For the other math ops that have similar precision issues, we should guard them with the same flag...

@MaheshRavishankar
Contributor

The test failures seem reasonable... I am OK with enabling this flag, but just beware: we might get a lot of precision-related bugs, because we are now changing the precision.
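To make the precision concern concrete, here is a small self-contained example (mine, not from the PR) where the two summation orders disagree in float:

```cpp
#include <cstdio>

int main() {
  // float carries ~24 bits of mantissa, so 1.0e8f + 1.0f rounds back
  // to 1.0e8f: fp addition is not associative.
  const float v[4] = {1.0e8f, 1.0f, -1.0e8f, 1.0f};

  // Left-to-right, as an unvectorized reduction computes it:
  // ((1e8 + 1) + -1e8) + 1 == (1e8 + -1e8) + 1 == 1.0f
  float seq = ((v[0] + v[1]) + v[2]) + v[3];

  // Two-accumulator order, as a 2-lane vectorized reduction computes it:
  // (1e8 + -1e8) + (1 + 1) == 2.0f
  float reassoc = (v[0] + v[2]) + (v[1] + v[3]);

  printf("sequential=%g reassociated=%g\n", seq, reassoc);
  return 0;
}
```

Neither answer is "wrong"; they are different roundings of the same mathematical sum, which is why tests with tight fp tolerances can start failing once the default changes.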

@github-actions

github-actions bot commented May 18, 2023

Abbreviated Benchmark Summary

@ commit e1b9c7a6c9f56dda4e1aeb5cdf383c749bf04da1 (vs. base 4795765b831d56d326fdc6de07e73e74dc7e405c)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[little-core] | 25.574 (vs. 22.033, 16.07%↑) | 25.598 | 0.277 |
| MobileSSD_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[little-core] | 152.524 (vs. 138.154, 10.40%↑) | 152.373 | 0.665 |

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 957.449 (vs. 1209.381, 20.83%↓) | 957.428 | 16.119 |
| MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 4752.208 (vs. 5843.818, 18.68%↓) | 4805.548 | 115.906 |
| MobileNetV2_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,mmt4d] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 20.715 (vs. 22.722, 8.83%↓) | 20.874 | 0.306 |

[Top 3 out of 17 results shown]

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

dcaballe enabled auto-merge (squash) on May 19, 2023 06:29
hanhanW (Contributor) left a comment


I'm fine with this, but please keep in mind that there could be numerical issues. (It would be awesome if we could control these kinds of options under a single flag.)

)

iree_check_single_backend_test_suite(
    name = "check_regression_llvm-cpu",
hanhanW
Contributor


Do we exclude the test in this test suite?

dcaballe merged commit 55a2780 into iree-org:main on May 19, 2023
@hanhanW
Contributor

hanhanW commented May 19, 2023

I did not notice that auto-merge was enabled... Sorry about that if you intended to land it later.

@dcaballe
Contributor Author

Sorry, my fault. I sometimes enable it when I think the review is mostly done or the change is trivial, to minimize the turnaround if the PR is approved. Let me exclude the test.

@hanhanW
Contributor

hanhanW commented May 19, 2023

I'm surprised that the test passes. When I read the file name, I thought it would fail.

EDIT: I'm good with the current state, I'm just surprised.

@dcaballe
Contributor Author

The test would fail, but it's not part of the srcs or BACKEND_TESTS lists, so it doesn't run in the suite where the reassociation flag is now enabled by default.

dcaballe added a commit that referenced this pull request May 20, 2023
dcaballe added a commit that referenced this pull request May 20, 2023
hanhanW pushed a commit to hanhanW/iree that referenced this pull request May 26, 2023
hanhanW pushed a commit to hanhanW/iree that referenced this pull request May 26, 2023
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this pull request Jul 6, 2023
hanhanW pushed a commit to hanhanW/iree that referenced this pull request Jul 18, 2023
Labels

benchmarks:android-cpu (Run default Android CPU benchmarks)
benchmarks:comp-stats (Run default compilation statistics benchmarks)
benchmarks:x86_64 (Run default x86_64 benchmarks)