Adjust the universal flow-level padding for narrow static-sized dimensions #14206
This is part-solution, part-reframing for #11632. The immediate motivation is that nod-ai/SHARK#1581 concerns a model consisting entirely of vector-times-matrix matmuls, and at the moment we pad everything to the next multiple of 16 in Flow (which is the topic of #11632). To get good performance on mat-vec, we need to stop doing that.
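To make the cost concrete, here is a minimal sketch of the current blanket-padding behavior; `padToMultipleOf` is a hypothetical helper for illustration, not IREE's actual API:

```cpp
#include <cassert>
#include <cstdint>

// Round `dim` up to the next multiple of `align` (hypothetical helper).
int64_t padToMultipleOf(int64_t dim, int64_t align) {
  return ((dim + align - 1) / align) * align;
}

int main() {
  // For a vector-times-matrix, the LHS is 1xK. Padding the static M=1
  // dimension up to 16 multiplies the work along M by 16x, computing
  // 15 garbage rows for every useful one.
  assert(padToMultipleOf(/*dim=*/1, /*align=*/16) == 16);
  return 0;
}
```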
There was some existing logic in MaterializeEncoding to adjust tile sizes to narrow static dimensions (`adjustTileSizesToNarrowStaticShape`), but it was framed as a local implementation detail. That framing prevented Flow from taking advantage of it to say: "Since MaterializeEncoding will never generate tiles greater than this along this narrow dimension, I don't need to pad more than this either." This PR turns that into a real contract between Flow (SetEncoding) and HAL (MaterializeEncoding).
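As a sketch of what such a contract could look like, here is a hypothetical shared helper that both sides consult, so Flow (SetEncoding) never pads beyond what HAL (MaterializeEncoding) could tile. The names, and the power-of-two rounding for narrow dimensions, are assumptions for illustration, not the exact API in this PR:

```cpp
#include <algorithm>
#include <cstdint>

constexpr int64_t kDynamic = -1;  // Placeholder marker for a dynamic dim.

// Next power of two >= x, for x >= 1.
int64_t powerOf2Ceil(int64_t x) {
  int64_t p = 1;
  while (p < x) p <<= 1;
  return p;
}

// HAL side of the contract: an upper bound on the tile size that
// MaterializeEncoding may pick along a dimension of the given static size.
int64_t maxTileSizeForDim(int64_t staticDimSize, int64_t defaultMaxTile) {
  if (staticDimSize == kDynamic) return defaultMaxTile;
  // Narrow static dimension: never tile past the next power of two
  // (the rounding rule here is an assumption for illustration).
  return std::min(defaultMaxTile, powerOf2Ceil(staticDimSize));
}

// Flow side of the contract: pad a static dimension only up to the bound
// that MaterializeEncoding is contractually held to.
int64_t paddedDimSize(int64_t staticDimSize, int64_t defaultMaxTile) {
  int64_t bound = maxTileSizeForDim(staticDimSize, defaultMaxTile);
  return ((staticDimSize + bound - 1) / bound) * bound;
}

// paddedDimSize(/*staticDimSize=*/1, /*defaultMaxTile=*/16) == 1, not 16.
```

With such a bound, a static M=1 dimension pads to 1 rather than 16, which is exactly what the mat-vec case above needs.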
At the moment, there is an e2e matmul test failure, only with the VMVX backend with ukernels and only with 1x1 matrices: now that those matrices are no longer padded to 16x16, the ukernel is called with the same data pointer for the LHS and RHS matrices. (See the printfs intentionally left in for now.) @benvanik @stellaraccident any idea?