-
Notifications
You must be signed in to change notification settings - Fork 576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLVMCPU] Add an option for tiling reduction only to LLVMCPUTile. #13821
Conversation
If the option is true, only tile the ops that has reduction loops. It is useful because it allows us to tile on reduction ops firstly and tileAndFuse on other operations later. We can greedily apply tileAndFuse on consumers because the reduction op will no longer be pulled in. There is a scf.for as barrier to stop fusion on reductions.
@@ -447,7 +447,9 @@ void addMultiTilingExpertPassPipeline(OpPassManager &passManager, | |||
nestedModulePM.addNestedPass<func::FuncOp>( | |||
createLLVMCPUSplitReductionPass(clEnableReassociateFpReductions)); | |||
nestedModulePM.addNestedPass<func::FuncOp>( | |||
createLLVMCPUTilePass(numLevels - 1)); | |||
createLLVMCPUTilePass(numLevels - 1, /*reductionOnly=*/true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ordering might be problematic..... Once you tile the reduction, there is no real opportunity for tile and fuse... Why do we need the tile and fuse after this layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order is intended. This is for the last level of tiling, i.e., the tiling level right before vectorization. We want to tile ops individually. We firstly tile the reduction ops, and then handle the consumer ops. There are no differences between tile and TileAndFuse if we have a single consumer op. They will just tile the consumer op. But it's important if there is a consumer ops chain, .g., reduction + broadcast + tensor.pack/pad ops. I want to tile and fuse broadcast + pack
op in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(maybe I should add a comment, that would help others to understand that it's intended)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still not getting why we need this order. Couldn't we just use a loop and pass the right enum values for each dim as we are doing in other pipelines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Now that I get it) THis is what I meant https://github.com/hanhanW/iree/blob/multi-lowering-config/compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp#LL438C1-L438C1
Instead of invoking the pass for all the dimensions, couldn't we just add a loop that takes care of the parallel/reduction dimensions using the enums?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the idea.. I thought using enums will be done in your multi-level tiling PR.
What I did here is creating different passes for the last level tiling (which intends to be vector level). What do you mean by "invoking the pass for all the dimensions"? Am I doing it now?
The reduction tiling still take the same config (e.g., [0, 0, 16]) for tiling. It only tile the reduction loop.
TileAndFuse for consumer ops are tricky. They reuse the same configuration. I'm still working on adding multi configs support. I see this is a transition state and it is a incremental change towards multi lowering configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we only have enums for workgroups right now it seems? https://github.com/openxla/iree/blob/main/compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.h#L22
I was hoping we could do the dimension selection/filtering on the caller side, as we do in hanhanW/iree@multi-lowering-config/compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp#LL438C1-L438C1, so that LLVMCPUTile/TileAndFuse only have to worry about the tiling and not the filtering. If that is not possible, please go ahead with this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we only have enums for workgroups right now. The passes themselves only worry about the tiling. The filtering is handled by us, i.e., when we create the pass.
@@ -69,7 +69,7 @@ std::unique_ptr<OperationPass<func::FuncOp>> createLLVMCPUTileAndFusePass( | |||
|
|||
/// Pass to tile TilingInterface ops with given tilingLevel. | |||
std::unique_ptr<OperationPass<func::FuncOp>> createLLVMCPUTilePass( | |||
int64_t tilingLevel = -1); | |||
int64_t tilingLevel = -1, bool reductionOnly = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused. Why do we need this flag? Couldn't we use the tilingLevel
to pass the specific reduction level we want to tile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to tile and fuse consumer ops for vector level. Let's take reduction + broadcast + pack as an example. What I want is
scf.for ... // Tiling on reduction loop
reduction op
scf.for ... // Tile and fuse for broadcast + pack
broadcast
pack
If the reductionOnly
flag is not passed, the pass will tile all the ops individually, which results in
scf.for ... // Tiling on reduction loop
reduction op
scf.for ... // Tiling on broadcast op
broadcast
scf.for ... // Tiling on pack op
pack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks! Got it now
@@ -447,7 +447,9 @@ void addMultiTilingExpertPassPipeline(OpPassManager &passManager, | |||
nestedModulePM.addNestedPass<func::FuncOp>( | |||
createLLVMCPUSplitReductionPass(clEnableReassociateFpReductions)); | |||
nestedModulePM.addNestedPass<func::FuncOp>( | |||
createLLVMCPUTilePass(numLevels - 1)); | |||
createLLVMCPUTilePass(numLevels - 1, /*reductionOnly=*/true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still not getting why we need this order. Couldn't we just use a loop and pass the right enum values for each dim as we are doing in other pipelines?
…ee-org#13821) If the option is true, only tile the ops that has reduction loops. It is useful because it allows us to tile on reduction ops firstly and tileAndFuse on other operations later. We can greedily apply tileAndFuse on consumers because the reduction op will no longer be pulled in. There is a scf.for as barrier to stop fusion on reductions. The changes to LLVMTileAndFuse is needed together because we follow the same pipeline behavior. Now we need to use TileAndFuse in last level of tiling for consumers. If there are no consumers, it will not be applied on reduction ops. It is a step toward iree-org#13706 and iree-org#13474
…ile." (iree-org#13867) Reverts iree-org#13821 It introduces definition issue about third tile size list. Revert the commit and we will land it in another way, which should have concrete definition for each tile list.
…ee-org#13821) If the option is true, only tile the ops that has reduction loops. It is useful because it allows us to tile on reduction ops firstly and tileAndFuse on other operations later. We can greedily apply tileAndFuse on consumers because the reduction op will no longer be pulled in. There is a scf.for as barrier to stop fusion on reductions. The changes to LLVMTileAndFuse is needed together because we follow the same pipeline behavior. Now we need to use TileAndFuse in last level of tiling for consumers. If there are no consumers, it will not be applied on reduction ops. It is a step toward iree-org#13706 and iree-org#13474
…ile." (iree-org#13867) Reverts iree-org#13821 It introduces definition issue about third tile size list. Revert the commit and we will land it in another way, which should have concrete definition for each tile list.
…ee-org#13821) If the option is true, only tile the ops that has reduction loops. It is useful because it allows us to tile on reduction ops firstly and tileAndFuse on other operations later. We can greedily apply tileAndFuse on consumers because the reduction op will no longer be pulled in. There is a scf.for as barrier to stop fusion on reductions. The changes to LLVMTileAndFuse is needed together because we follow the same pipeline behavior. Now we need to use TileAndFuse in last level of tiling for consumers. If there are no consumers, it will not be applied on reduction ops. It is a step toward iree-org#13706 and iree-org#13474
…ile." (iree-org#13867) Reverts iree-org#13821 It introduces definition issue about third tile size list. Revert the commit and we will land it in another way, which should have concrete definition for each tile list.
If the option is true, only tile the ops that has reduction loops. It is useful because it allows us to tile on reduction ops firstly and tileAndFuse on other operations later. We can greedily apply tileAndFuse on consumers because the reduction op will no longer be pulled in. There is a scf.for as barrier to stop fusion on reductions.
The changes to LLVMTileAndFuse is needed together because we follow the same pipeline behavior. Now we need to use TileAndFuse in last level of tiling for consumers. If there are no consumers, it should not be applied to reduction ops.
It is a step toward #13706 and #13474