[LLVMCPU] Add an option for tiling reduction only to LLVMCPUTile. #13821

Merged
merged 3 commits into from
May 30, 2023


hanhanW
Contributor

@hanhanW hanhanW commented May 26, 2023

If the option is true, only tile the ops that have reduction loops. This is useful because it allows us to tile the reduction ops first and tileAndFuse the other operations later. We can greedily apply tileAndFuse on consumers because the reduction op will no longer be pulled in: the scf.for created by tiling the reduction acts as a barrier that stops fusion into it.

The changes to LLVMTileAndFuse are needed together because we follow the same pipeline behavior. We now need to use TileAndFuse in the last level of tiling for consumers. If there are no consumers, it should not be applied to reduction ops.

It is a step toward #13706 and #13474
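
The grouping described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyOp`, `LoopNest`, and `groupIntoLoopNests` are hypothetical names and are not IREE or MLIR APIs, and the ops are assumed to form a simple linear producer-to-consumer chain. It shows which ops would end up sharing a loop nest when reduction ops are tiled first and the remaining ops are fused greedily.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an op in a dispatch; not an IREE or MLIR type.
struct ToyOp {
  std::string name;
  bool hasReductionLoop;
};

// Names of the ops that share one loop nest after tiling.
using LoopNest = std::vector<std::string>;

// Walks a linear chain of ops in program order. Every op with a reduction
// loop gets its own loop nest (the "reduction only" tiling step) and closes
// the group that was being built, which models the scf.for barrier. Runs of
// consecutive non-reduction ops are collected into a single fused loop nest.
std::vector<LoopNest> groupIntoLoopNests(const std::vector<ToyOp>& ops) {
  std::vector<LoopNest> nests;
  LoopNest fused;
  for (const ToyOp& op : ops) {
    if (op.hasReductionLoop) {
      if (!fused.empty()) {
        nests.push_back(fused);
        fused.clear();
      }
      nests.push_back({op.name});
    } else {
      fused.push_back(op.name);
    }
  }
  if (!fused.empty()) nests.push_back(fused);
  assert(!ops.empty() || nests.empty());
  return nests;
}
```

For a chain of reduction, broadcast, and pack, this model yields two loop nests: one holding only the reduction, and one holding broadcast and pack together.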

@hanhanW hanhanW changed the title from "[LLVMCPU] Add an option to tile reduction only to LLVMCPUTile." to "[LLVMCPU] Add an option for tiling reduction only to LLVMCPUTile." May 26, 2023
@hanhanW hanhanW added the labels benchmarks:x86_64 (Run default x86_64 benchmarks), benchmarks:comp-stats (Run default compilation statistics benchmarks), and benchmarks:android-cpu (Run default Android CPU benchmarks) May 26, 2023
@github-actions

github-actions bot commented May 27, 2023

@@ -447,7 +447,9 @@ void addMultiTilingExpertPassPipeline(OpPassManager &passManager,
   nestedModulePM.addNestedPass<func::FuncOp>(
       createLLVMCPUSplitReductionPass(clEnableReassociateFpReductions));
   nestedModulePM.addNestedPass<func::FuncOp>(
-      createLLVMCPUTilePass(numLevels - 1));
+      createLLVMCPUTilePass(numLevels - 1, /*reductionOnly=*/true));
Contributor

This ordering might be problematic... Once you tile the reduction, there is no real opportunity for tile and fuse. Why do we need the tile and fuse after this layer?

Contributor Author

The order is intended. This is the last level of tiling, i.e., the tiling level right before vectorization. We want to tile ops individually: we first tile the reduction ops, and then handle the consumer ops. There is no difference between Tile and TileAndFuse if we have a single consumer op; both will just tile the consumer op. But it matters if there is a chain of consumer ops, e.g., reduction + broadcast + tensor.pack/pad ops. I want to tile and fuse the broadcast + pack ops in this case.

Contributor Author

(maybe I should add a comment, that would help others to understand that it's intended)

Contributor

Sorry, I'm still not getting why we need this order. Couldn't we just use a loop and pass the right enum values for each dim as we are doing in other pipelines?

Contributor

(Now that I get it) This is what I meant: https://github.com/hanhanW/iree/blob/multi-lowering-config/compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp#LL438C1-L438C1

Instead of invoking the pass for all the dimensions, couldn't we just add a loop that takes care of the parallel/reduction dimensions using the enums?

Contributor Author

I don't get the idea. I thought using enums would be done in your multi-level tiling PR.

What I did here is create different passes for the last level of tiling (which is intended to be the vector level). What do you mean by "invoking the pass for all the dimensions"? Am I doing it now?

The reduction tiling still takes the same config (e.g., [0, 0, 16]) for tiling. It only tiles the reduction loop.
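
As a rough illustration of how a config such as [0, 0, 16] selects only the reduction loop, here is a standalone sketch. It assumes a matmul-like op whose loops are ordered (M, N, K) with K as the reduction dimension, and it relies on the usual tiling convention that a tile size of 0 leaves a loop untiled. The helper names `tiledLoopIndices` and `numTiles` are hypothetical and do not exist in IREE.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the indices of the loops that a tile-size list asks to tile.
// A tile size of 0 means "leave this loop untiled".
std::vector<int> tiledLoopIndices(const std::vector<int64_t>& tileSizes) {
  std::vector<int> dims;
  for (int i = 0; i < static_cast<int>(tileSizes.size()); ++i) {
    assert(tileSizes[i] >= 0 && "tile sizes are expected to be non-negative");
    if (tileSizes[i] != 0) dims.push_back(i);
  }
  return dims;
}

// Number of iterations of the generated tile loop for one dimension,
// computed as a ceiling division of the loop extent by the tile size.
int64_t numTiles(int64_t extent, int64_t tileSize) {
  assert(extent >= 0 && tileSize > 0);
  return (extent + tileSize - 1) / tileSize;
}
```

With [0, 0, 16], only loop index 2 (K in this assumed ordering) is tiled, so a single loop over K-tiles is generated and the parallel loops stay as they are.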

TileAndFuse for consumer ops is tricky. They reuse the same configuration. I'm still working on adding multi-config support. I see this as a transition state, and it is an incremental change towards multi lowering configs.

Contributor

Oh, we only have enums for workgroups right now it seems? https://github.com/openxla/iree/blob/main/compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.h#L22

I was hoping we could do the dimension selection/filtering on the caller side, as we do in hanhanW/iree@multi-lowering-config/compiler/src/iree/compiler/Codegen/LLVMCPU/Passes.cpp#LL438C1-L438C1, so that LLVMCPUTile/TileAndFuse only have to worry about the tiling and not the filtering. If that is not possible, please go ahead with this as is.

Contributor Author

Yes, we only have enums for workgroups right now. The passes themselves only worry about the tiling. The filtering is handled by us, i.e., when we create the pass.

@hanhanW hanhanW enabled auto-merge (squash) May 30, 2023 18:29
@@ -69,7 +69,7 @@ std::unique_ptr<OperationPass<func::FuncOp>> createLLVMCPUTileAndFusePass(

 /// Pass to tile TilingInterface ops with given tilingLevel.
 std::unique_ptr<OperationPass<func::FuncOp>> createLLVMCPUTilePass(
-    int64_t tilingLevel = -1);
+    int64_t tilingLevel = -1, bool reductionOnly = false);
Contributor

I'm a bit confused. Why do we need this flag? Couldn't we use the tilingLevel to pass the specific reduction level we want to tile?

Contributor Author

I want to tile and fuse consumer ops for vector level. Let's take reduction + broadcast + pack as an example. What I want is

scf.for ... // Tiling on reduction loop
  reduction op
scf.for ... // Tile and fuse for broadcast + pack
  broadcast
  pack

If the reductionOnly flag is not passed, the pass will tile all the ops individually, which results in

scf.for ... // Tiling on reduction loop
  reduction op
scf.for ... // Tiling on broadcast op
  broadcast
scf.for ... // Tiling on pack op
  pack
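
To make the difference between the two loop structures concrete, here is a standalone numeric sketch. It is not IREE code: the function names are hypothetical, the reduction is modeled as a row sum over K, and the pack is loosely modeled on tensor.pack as a relayout of an M x N buffer into (M / T) x N x T, assuming M is divisible by T. The fused version does the broadcast and the pack in one loop nest and never materializes the broadcast result.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Buffer = std::vector<float>;

// Reduction tiled along K only: one outer loop over K-tiles, mirroring the
// first scf.for in the desired structure.
Buffer tiledRowSum(const Buffer& a, int M, int K, int tileK) {
  assert(tileK > 0 && static_cast<int>(a.size()) == M * K);
  Buffer acc(M, 0.0f);
  for (int k0 = 0; k0 < K; k0 += tileK) {
    int kEnd = std::min(k0 + tileK, K);
    for (int i = 0; i < M; ++i)
      for (int k = k0; k < kEnd; ++k) acc[i] += a[i * K + k];
  }
  return acc;
}

// Unfused consumers: the broadcast materializes an M x N buffer, then a
// second loop nest packs it into (M / T) x N x T layout.
Buffer broadcastThenPack(const Buffer& v, int N, int T) {
  int M = static_cast<int>(v.size());
  assert(T > 0 && M % T == 0);
  Buffer bcast(M * N);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) bcast[i * N + j] = v[i];
  Buffer packed(M * N);
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j)
      packed[((i / T) * N + j) * T + i % T] = bcast[i * N + j];
  return packed;
}

// Fused consumers: one loop over row tiles reads the broadcast value and
// writes it straight into the packed layout, with no intermediate buffer.
Buffer fusedBroadcastPack(const Buffer& v, int N, int T) {
  int M = static_cast<int>(v.size());
  assert(T > 0 && M % T == 0);
  Buffer packed(M * N);
  for (int i0 = 0; i0 < M; i0 += T)
    for (int j = 0; j < N; ++j)
      for (int ii = 0; ii < T; ++ii)
        packed[((i0 / T) * N + j) * T + ii] = v[i0 + ii];
  return packed;
}
```

Both consumer variants produce the same packed buffer; the fused one corresponds to the single "tile and fuse for broadcast + pack" loop, while the unfused one corresponds to tiling each consumer op separately.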

Contributor

Ok, thanks! Got it now

@hanhanW hanhanW disabled auto-merge May 30, 2023 18:47
@hanhanW hanhanW merged commit c9c2e83 into iree-org:main May 30, 2023
@hanhanW hanhanW deleted the multi-lowering-config branch May 30, 2023 20:39
hanhanW added a commit that referenced this pull request May 31, 2023
…ile." (#13867)

Reverts #13821

It introduces a definition issue with the third tile size list. Revert the commit; we will land it in another way, which should have a concrete definition for each tile size list.