Tile and distribute linalg.generic in DispatchLinalgOnTensor #5159
Conversation
Looks awesome! I am super excited about this landing! It addresses a lot of hidden issues by enabling this path.
Just a nit on the comments for setting the launch config. Once that is addressed, it is good to go.
ShapedType outputShape = op.getOutputShapedType(0);

SmallVector<int64_t, 4> sizes;
// When vectorization is not enabled the convertToGPU pass assumes that
Misleading comment: the ConvertToGPU pass does not assume a tile size, right? I am confused because the launch config isn't even called in the ConvertToGPU pass.
Good catch, you're right, I had the comment backward. The problem is that when vectorization is off, the second level of tiling is skipped and we end up mapping one element to one thread. So the real problem is that if we picked something different in LaunchConfig, the number of workgroups would be wrong. I updated the comment; hopefully it is clear now.
sizes.append({4 * subgroupSize, 2 * subgroupSize});
}
sizes.push_back(subgroupSize);
int64_t lowerTs = config.workgroupSize[0];
Could you add comments on what's happening here? It's a bit hard to parse.
if (isa<linalg::ConvInputNHWCFilterHWCFOp,
        linalg::DepthwiseConvInputNHWCFilterHWCOp>(op)) {
  count.erase(count.begin());
size_t numParrallelLoops = getNumOuterParallelLoops(op);
Oh nice! This is exactly what I was trying to do. Should work well with Hanhan's PR #5136.
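The `count.erase(count.begin())` in the snippet above drops the outermost entry of the workgroup counts for convolution-like ops, i.e. the batch dimension is not distributed across workgroups. A minimal sketch of that behavior (the function name and the bool flag are illustrative, not IREE API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring count.erase(count.begin()): for
// convolution-like ops the outermost (batch) parallel loop is left
// undistributed, so its entry is removed from the workgroup counts.
std::vector<int64_t> dropBatchDim(std::vector<int64_t> counts,
                                  bool isConvLike) {
  if (isConvLike && !counts.empty())
    counts.erase(counts.begin());
  return counts;
}
```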
// As a second step mark all the element-wise linalg ops not fused as roots
// so that they get tiled and distributed.
for (linalg::LinalgOp linalgOp : linalgOps) {
So the only reason we have the MakeDispatchWorkgroupsOp pattern is for the fallback, right? In other words, in the ConvertToGPU pass, the mapping of iterations of a linalg operation to global invocation IDs is not used on the Linalg-on-tensors path. That would be great, because then we could move all that loop distribution logic all the way into Flow.
Correct. There are still a few cases that don't tile; I can look at those at some point. Yes, for all those cases the ConvertToGPU global invocation ID path is not reached. It would indeed be good to do the loop distribution in Flow.
The main remaining case I have seen is when the linalg affine maps are not projected permutations.
Thanks Mahesh, please take another look.
sizes.push_back(subgroupSize);
// Use the first tile size that can divide the shape. If the shape is not
// aligned on any of the tile sizes pick the smallest tile of one element per
// thread.
Please add a TODO here. For the Linalg-on-tensors path this could actually be anything other than 1. It could even be on the old path with a few changes, but why bother when it's going away. For now, leave a TODO and come back to it when we deprecate the old path.
for (linalg::LinalgOp linalgOp : linalgOps) {
  if (auto op = dyn_cast<linalg::GenericOp>(linalgOp.getOperation())) {
    if (getNumOuterParallelLoops(linalgOp) == 0 ||
        llvm::any_of(linalgOp.getIndexingMaps(), [](AffineMap &map) {
It might just be getting mixed up with the old path. This should not actually be needed.
I still need this check; otherwise some linalg ops with unusual affine maps crash during tiling. I can look into why this happens afterwards if you think it is not expected.
Yeah, this is not expected. I don't see why it would crash (at least it would be good to know the affine map). Is this because of reverse? If so, it is worth triaging upstream.
Mahesh, yes, the problem is with reverse; I'll work on fixing this part in a second step.
if (!rootOperation) {
  for (linalg::LinalgOp linalgOp : linalgOps) {
    if (auto op = dyn_cast<linalg::GenericOp>(linalgOp.getOperation())) {
      if (getNumOuterParallelLoops(linalgOp) == 0 ||
When we drop the old path, this will have to go through the new path.
We still need this even in the new path, right?
Sorry, bad comment. I was saying that this basically means "don't do anything with 0-rank operations". But those need to be handled as well (i.e., lowered to loops, albeit as no-ops) at some point. I am guessing this isn't tested on the new path.
One thing to do here is just to test what happens when a linalg.generic has only 0-rank operations.
Good point, I can add a test for that.
I'll merge this PR; if there are still concerns with your comments I can address them afterwards.
This seems to break compiling mobilenet_v2 in iree-android-benchmark. To repro: the mobilenet_v2.mlir can be downloaded from https://storage.googleapis.com/iree-model-artifacts/mobilenet-v2.tar.gz. It fails in ConvertToSPIRVPass; the input IR of ConvertToSPIRVPass is at https://gist.github.com/hanhanW/1f1c5cb70ff1486c04783ea052e2703d. The error message is:
Oh, a bunch of size-1 vectors are generated. Those aren't supported by SPIR-V (they should just be scalars). @hanhanW: is this still exercising the existing path? I think we should switch the benchmark tests to go via the new path. It would also be nice to check in a fake-weights MobileNet v2; discovering breakage via benchmark tests is not that good. :) @ThomasRaoux: can we revert this for now and have the issues fixed before landing it?
Yes, let me revert it and I'll fix it offline.
…iree-org#5159)" This reverts commit 156f0bb.
No description provided.