[CPU] Enable 'iree-llvmcpu-reassociate-fp-reductions' by default #13685
This PR is mostly meant to start a discussion about enabling fp reduction reassociation by default. When this flag is disabled, we basically don't vectorize the reduction dimension at all, which results in extra unrolling of scalar instructions. I think it's difficult for an external user to really understand the implications of this flag and that it has to be enabled to get reasonable performance on fp reductions. In my case, even though I'm aware of it, I always forget to use it and waste time looking into "why is this random sequence of instructions scalar?" in the profiles. Also, in terms of accuracy, I think we are not paying the same attention to other operations that we approximate by default (e.g., Math ops), so I think enabling this by default and providing a flag to disable it is reasonable. In the future, we may think about introducing a compilation mode that optimizes for accuracy more generally. WDYT?
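For readers less familiar with why reassociation matters here: floating-point addition is not associative, so a vectorized reduction that keeps several partial sums can produce a different result than a strict left-to-right scalar loop. The following is a minimal Python sketch of that effect, not IREE code; the function names and the `lanes` parameter are hypothetical and only stand in for vector lanes.

```python
def sequential_sum(values):
    """Strict left-to-right reduction, the order a scalar loop preserves."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def reassociated_sum(values, lanes=2):
    """Reduction with `lanes` independent partial sums (hypothetical stand-in
    for vector lanes), combined only at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


values = [1e16, 1.0, -1e16, 1.0]

# In order, the first 1.0 is absorbed by 1e16 and lost.
in_order = sequential_sum(values)            # 1.0
# With two lanes, 1e16 and -1e16 cancel in one lane; the ones add in the other.
regrouped = reassociated_sum(values, lanes=2)  # 2.0

print(in_order, regrouped)
```

Neither ordering is inherently "the accurate one"; the point is only that the two orderings can disagree, which is why the compiler needs explicit permission to regroup the operations.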
I am fine with enabling it by default. For the other math ops that have similar precision issues, we should guard them with the same flag.
The test failures seem reasonable. I am OK with enabling this flag, but beware: we might get a lot of precision-related bugs, because we are now changing the precision.
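One common way to make tests robust to this kind of precision change is to compare against a reference with a tolerance instead of exact equality. The sketch below is illustrative only and assumes nothing about how IREE's check tests are written; `close_enough` and its tolerance values are hypothetical.

```python
import math


def close_enough(actual, expected, rel_tol=1e-6, abs_tol=1e-9):
    """Tolerance-based comparison (hypothetical helper; tolerances are arbitrary)."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


values = [0.1] * 10

# Plain left-to-right accumulation picks up rounding error at each step.
in_order = 0.0
for v in values:
    in_order += v

# math.fsum tracks exact partial sums and returns a correctly rounded result.
reference = math.fsum(values)

print(in_order == reference)              # exact comparison fails
print(close_enough(in_order, reference))  # tolerance comparison passes
```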
I'm fine with this, but please keep in mind that there could be numerical issues. (It would be awesome if we could control these kinds of options under a single flag.)
)

iree_check_single_backend_test_suite(
    name = "check_regression_llvm-cpu",
Do we exclude the test in this test suite?
I did not notice that auto-merge was enabled. Sorry about that if you intended to land it later.
Sorry, my fault. I sometimes enable it when I think the review is mostly done or it's a trivial change, to minimize the turnaround once the PR is approved. Let me exclude the test.
I'm surprised that we can pass the test. I thought it should fail when I read the file name. EDIT: I'm good with the current state, I'm just surprised.
The test would fail but it's not part of the
…e-org#13685) This PR enables fp reduction reassociation by default. When this flag is disabled, we basically don't vectorize the reduction dimension at all, which results in extra unrolling of scalar instructions. It's difficult for an external user to really understand the implications of this flag and that it has to be enabled to get reasonable performance on fp reductions.