
Adjust the universal flow-level padding for narrow static-sized dimensions #14206

Closed · wants to merge 1 commit

Conversation

@bjacob commented Jun 23, 2023

This is part solution, part reframing of #11632. The immediate motivation is that nod-ai/SHARK#1581 is about a model that is all vector-times-matrix matmuls, and at the moment we pad everything to the next multiple of 16 in Flow (which is the topic of #11632). To get good performance on matvec, we need to stop doing that.
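For concreteness, here is a minimal sketch of what that universal padding does to the LHS of a vector-times-matrix product (the 1x256 shape and zero pad value are hypothetical, just for illustration, not taken from the model above):

%cst = arith.constant 0.0 : f32
%padded = tensor.pad %lhs low[0, 0] high[15, 0] {
^bb0(%i: index, %j: index):
  // Out-of-bounds elements are filled with the pad value.
  tensor.yield %cst : f32
} : tensor<1x256xf32> to tensor<16x256xf32>

15 of the 16 rows in the result are padding, which is the overhead we need to eliminate for matvec.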

There was some existing logic in MaterializeEncoding to adjust tile sizes to narrow static dimensions (adjustTileSizesToNarrowStaticShape), but it was framed as a local implementation detail. That prevented Flow from taking advantage of it to say: "Since MaterializeEncoding will never generate tiles greater than this along this narrow dimension, I also don't need to pad more than this".
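To illustrate the adjustment (a sketch only, assuming it simply clamps each tile size down to the extent of a narrow static dimension; the shapes and tile sizes are hypothetical, using tensor.pack notation just for illustration): for an M=1 LHS, packing with 16x16 tiles yields a result that is 15/16 padding,

%cst = arith.constant 0.0 : f32
%packed_wide = tensor.pack %lhs padding_value(%cst : f32)
    inner_dims_pos = [0, 1] inner_tiles = [16, 16]
    into %dest_wide : tensor<1x256xf32> -> tensor<1x16x16x16xf32>

whereas with the M tile clamped to 1, no padding is needed along M at all:

%packed_narrow = tensor.pack %lhs
    inner_dims_pos = [0, 1] inner_tiles = [1, 16]
    into %dest_narrow : tensor<1x256xf32> -> tensor<1x16x1x16xf32>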

This PR changes that into a real contract between Flow (SetEncoding) and HAL (MaterializeEncoding).

At the moment, there is an e2e matmul test failure specific to the VMVX backend with ukernels, with 1x1 matrices: now that they are no longer padded to 16x16, the ukernel is being called with the same data pointer for the LHS and RHS matrices. (See the printfs intentionally left in for now.) @benvanik @stellaraccident any idea?

@MaheshRavishankar commented

I know this is a draft, but this is probably not the right path. The easiest thing to do would be to add a new operation

%hi:2 = iree_linalg_ext.encoding_padding_value <encoding Attr>

and use %hi#0, %hi#1 as the high padding values in the tensor.pad operation created.

%pad = tensor.pad %unpacked_tensor low[0, 0] high[%hi#0, %hi#1]

During MaterializeEncoding we have all the information we need to convert %hi#0 and %hi#1 into scalar values, and then we will not overpad.
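Putting those two snippets together, a fleshed-out sketch of the suggestion might look like this (the op name comes from the comment above and does not exist yet; the dynamic tensor types and zero pad value are hypothetical):

%hi:2 = iree_linalg_ext.encoding_padding_value <encoding attr> : index, index
%cst = arith.constant 0.0 : f32
%pad = tensor.pad %unpacked_tensor low[0, 0] high[%hi#0, %hi#1] {
^bb0(%i: index, %j: index):
  tensor.yield %cst : f32
} : tensor<?x?xf32> to tensor<?x?xf32>

Once MaterializeEncoding folds %hi#0 and %hi#1 to constants matching the actual tile sizes, the pad amounts shrink to exactly what the chosen tiles require.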

@MaheshRavishankar left a comment

(see comments)

@bjacob commented Jul 10, 2023


Thanks @MaheshRavishankar for the suggestion. Implementing it in #14349 . Closing this as superseded by that.

@bjacob closed this Jul 10, 2023