
Tile and distribute linalg.generic in DispatchLinalgOnTensor #5159

Merged: 3 commits into iree-org:main on Mar 19, 2021

Conversation

ThomasRaoux (Contributor):

No description provided.

@MaheshRavishankar (Contributor) left a comment:

Looks awesome! I am super excited about this landing! It addresses a lot of hidden issues by enabling this path.

Just a nit on comments for setting the launch config. Once it is addressed, it is good to go.

ShapedType outputShape = op.getOutputShapedType(0);

SmallVector<int64_t, 4> sizes;
// When vectorization is not enabled the ConvertToGPU pass assumes that
MaheshRavishankar (Contributor):

Misleading comment: the ConvertToGPU pass does not assume a tile size, right? I am confused because the launch config isn't even called in the ConvertToGPU pass.

ThomasRaoux (Contributor Author):

Good catch, you're right, I had the comment backward. The problem is that when vectorization is off, the second level of tiling is skipped and we end up mapping one element to one thread. So the real problem is that if we picked something different in LaunchConfig, the number of workgroups would be wrong. I updated the comment; hopefully it is clear now.
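(To illustrate the constraint being discussed: when the second tiling level is skipped, each thread handles one element, so the workgroup count derived from the first-level tile sizes only lines up if those tile sizes equal the workgroup dimensions. A minimal sketch of that relationship, with hypothetical helper names rather than the actual LaunchConfig code:)

#include <cstdint>
#include <vector>

// Number of workgroups along one tiled dimension, given the loop range and
// the first-level (workgroup) tile size: ceildiv(range, tileSize).
static int64_t numWorkgroups(int64_t range, int64_t tileSize) {
  return (range + tileSize - 1) / tileSize;
}

// Without a second tiling level, every thread in the workgroup processes a
// single element, so the tile size must match the workgroup size; otherwise
// the grid computed from the tile sizes no longer covers the iteration space
// with one thread per element.
static bool launchConfigIsConsistent(const std::vector<int64_t> &tileSizes,
                                     const std::vector<int64_t> &workgroupSize) {
  for (size_t i = 0, e = tileSizes.size(); i < e; ++i)
    if (tileSizes[i] != workgroupSize[i]) return false;
  return true;
}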

sizes.append({4 * subgroupSize, 2 * subgroupSize});
}
sizes.push_back(subgroupSize);
int64_t lowerTs = config.workgroupSize[0];
MaheshRavishankar (Contributor):

Could you add comments on what's happening here? It is a bit hard to parse.

if (isa<linalg::ConvInputNHWCFilterHWCFOp,
linalg::DepthwiseConvInputNHWCFilterHWCOp>(op)) {
count.erase(count.begin());
size_t numParrallelLoops = getNumOuterParallelLoops(op);
MaheshRavishankar (Contributor):

Oh nice! This is exactly what I was trying to do. It should work well with Hanhan's PR #5136.


// As a second step mark all the element-wise linalg ops not fused as roots
// so that they get tiled and distributed.
for (linalg::LinalgOp linalgOp : linalgOps) {
MaheshRavishankar (Contributor):

So the only reason we have the MakeDispatchWorkgroupsOp pattern is for the fallback, right? In other words, in the ConvertToGPU pass, the mapping of iterations of the linalg operation to the global invocation ID is not used on the Linalg-on-tensors path. That would be great, because then we can move all of that loop distribution logic all the way into Flow.

ThomasRaoux (Contributor Author):

Correct. There are still a few cases that don't tile; I can look at those at some point. Yes, for all those cases the ConvertToGPU global invocation ID path is not reached. It would indeed be good to do the loop distribution in Flow.
The main remaining case I have seen is when the linalg affine maps are not projected permutations.
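(For reference, the ConvertToGPU fallback being discussed maps each iteration of the linalg op to a global invocation ID, roughly like the cyclic distribution sketched below. This is an illustrative approximation, not the actual pass.)

#include <cstdint>
#include <functional>

// Cyclic distribution of a parallel loop across GPU invocations: invocation
// `id` out of `count` total handles iterations id, id + count, id + 2*count, ...
static void distributeToGlobalInvocations(
    int64_t tripCount, int64_t id, int64_t count,
    const std::function<void(int64_t)> &body) {
  for (int64_t iv = id; iv < tripCount; iv += count) body(iv);
}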

ThomasRaoux (Contributor Author):

Thanks Mahesh, please take another look.

sizes.push_back(subgroupSize);
// Use the first tile size that can divide the shape. If the shape is not
// aligned on any of the tile sizes pick the smallest tile of one element per
// thread.
MaheshRavishankar (Contributor):

Please add a TODO here. For the Linalg-on-tensors path this could actually be anything other than 1. It could even be changed on the old path with a few changes, but why bother when it's going away. For now, leave a TODO and come back to it when we deprecate the old path.
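(The selection strategy described in the code comment above can be sketched as follows; the names are illustrative, not the actual implementation.)

#include <cstdint>
#include "llvm/ADT/ArrayRef.h"

// Pick the first candidate tile size that evenly divides the problem size.
// If none divides it, fall back to 1, i.e. one element per thread.
// TODO: on the Linalg-on-tensors path the fallback does not have to be 1.
static int64_t pickTileSize(int64_t problemSize, llvm::ArrayRef<int64_t> candidates) {
  for (int64_t candidate : candidates)
    if (problemSize % candidate == 0) return candidate;
  return 1;
}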

for (linalg::LinalgOp linalgOp : linalgOps) {
if (auto op = dyn_cast<linalg::GenericOp>(linalgOp.getOperation())) {
if (getNumOuterParallelLoops(linalgOp) == 0 ||
llvm::any_of(linalgOp.getIndexingMaps(), [](AffineMap &map) {
MaheshRavishankar (Contributor):

It might just be getting mixed up with the old path. This check should not actually be needed.

ThomasRaoux (Contributor Author):

I still need this check, otherwise some linalg ops with unusual affine maps crash during tiling. I can look into why this happens afterwards if you think it is not expected.

MaheshRavishankar (Contributor):

Yeah, this is not expected. I don't see why it would crash (at least it would be good to know the affine map). Is this because of reverse? If so, it is worth triaging upstream.

ThomasRaoux (Contributor Author):

Mahesh, yes, the problem is with reverse. I'll work on fixing this part in a second step.
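(For illustration, the check being discussed can be written with MLIR's AffineMap::isProjectedPermutation; the sketch below is an approximation, not the exact code in the pass.)

#include "mlir/IR/AffineMap.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"

// Returns true if any indexing map of the op is not a projected permutation,
// i.e. its results are not simply a subset of the loop dimensions. Ops like
// reverse, whose maps involve subtraction expressions, fall in this category
// and are skipped because tiling them currently crashes.
static bool hasNonProjectedPermutationMap(llvm::ArrayRef<mlir::AffineMap> indexingMaps) {
  return llvm::any_of(indexingMaps, [](mlir::AffineMap map) {
    return !map.isProjectedPermutation();
  });
}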

if (!rootOperation) {
for (linalg::LinalgOp linalgOp : linalgOps) {
if (auto op = dyn_cast<linalg::GenericOp>(linalgOp.getOperation())) {
if (getNumOuterParallelLoops(linalgOp) == 0 ||
MaheshRavishankar (Contributor):

When we drop the old path, this will have to go through the new path.

ThomasRaoux (Contributor Author):

We still need this even on the new path, right?

MaheshRavishankar (Contributor):

Sorry, bad comment. I was saying that this is basically saying don't do anything with 0-rank operations. But those need to be handled as well (i.e. lowered to loops, albeit no-ops) at some point. I am guessing this isn't tested on the new path.
One thing to do here is just test what happens when a linalg.generic operates only on 0-rank operands.

ThomasRaoux (Contributor Author):

Good point, I can add a test for that.

ThomasRaoux (Contributor Author):

I'll merge this PR; if there are still concerns with your comments I can address them afterwards.

ThomasRaoux merged commit 156f0bb into iree-org:main on Mar 19, 2021.
hanhanW (Contributor) commented on Mar 19, 2021:

This seems to break compiling mobilenet_v2 in iree-android-benchmark.

To repro:

iree-translate --iree-mlir-to-vm-bytecode-module --iree-hal-target-backends=vulkan-spirv ~/mobilenet_v2.mlir -o a.vmfb --iree-spirv-enable-vectorization --iree-vulkan-target-triple=valhall-g77-unknown-android10

The mobilenet_v2.mlir can be downloaded from https://storage.googleapis.com/iree-model-artifacts/mobilenet-v2.tar.gz

It fails in ConvertToSPIRVPass. The input IR of ConvertToSPIRVPass: https://gist.github.com/hanhanW/1f1c5cb70ff1486c04783ea052e2703d

The error message is: failed to legalize operation 'vector.broadcast'

antiagainst (Contributor):

Oh, a bunch of size-1 vectors are generated. Those aren't supported by SPIR-V. (They should just be scalars.)

@hanhanW: This is still checking the existing path? I think we should switch the benchmark tests to go via the new path.

Also, it would be nice to check in a fake-weights MobileNet v2. Discovering breakage via benchmark tests is not great. :)

@ThomasRaoux: can we revert this for now and have the issues fixed before landing it?
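(Background on why size-1 vectors are a problem: core SPIR-V only allows vectors of 2, 3, or 4 components, with larger sizes gated behind extra capabilities, so a 1-element vector type has no direct SPIR-V counterpart and must be lowered to a scalar. A hypothetical legality check, just for illustration and not IREE's actual type converter:)

#include "mlir/IR/BuiltinTypes.h"

// Core SPIR-V only supports vectors of 2, 3, or 4 components, so a
// vector<1xT> has no direct equivalent and should be converted to the
// scalar T instead.
static bool hasDirectSPIRVEquivalent(mlir::VectorType type) {
  if (type.getRank() != 1) return false;
  int64_t numElements = type.getNumElements();
  return numElements == 2 || numElements == 3 || numElements == 4;
}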

ThomasRaoux (Contributor Author):


Yes, let me revert it and I'll fix it offline.
