Adjust the universal flow-level padding for narrow static-sized dimensions #14206
This is part-solution, part-reframing for #11632. The immediate motivation is that nod-ai/SHARK#1581 concerns a model consisting entirely of vector-times-matrix matmuls, and at the moment we pad everything to the next multiple of 16 in Flow (which is the topic of #11632). To get good performance on mat-vec, we need to stop doing that.
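To make the cost concrete, here is a minimal sketch of the current blanket-padding behavior; `padToMultipleOf` is a hypothetical helper for illustration, not IREE's actual API:

```cpp
#include <cassert>
#include <cstdint>

// Round `dim` up to the next multiple of `align` (hypothetical helper).
int64_t padToMultipleOf(int64_t dim, int64_t align) {
  return ((dim + align - 1) / align) * align;
}

int main() {
  // For a vector-times-matrix, the LHS is 1xK. Padding the static M=1
  // dimension up to 16 multiplies the work along M by 16x, computing
  // 15 garbage rows for every useful one.
  assert(padToMultipleOf(/*dim=*/1, /*align=*/16) == 16);
  return 0;
}
```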
There was some existing logic in MaterializeEncoding to adjust tile sizes to narrow static dimensions (`adjustTileSizesToNarrowStaticShape`), but it was framed as a local implementation detail. That framing prevented Flow from taking advantage of it to say: "Since MaterializeEncoding will never generate tiles greater than this along this narrow dimension, I don't need to pad more than this either." This PR turns that into a real contract between Flow (SetEncoding) and HAL (MaterializeEncoding).
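As a sketch of what such a contract could look like, here is a hypothetical shared helper that both sides consult, so Flow (SetEncoding) never pads beyond what HAL (MaterializeEncoding) could tile. The names, and the power-of-two rounding for narrow dimensions, are assumptions for illustration, not the exact API in this PR:

```cpp
#include <algorithm>
#include <cstdint>

constexpr int64_t kDynamic = -1;  // Placeholder marker for a dynamic dim.

// Next power of two >= x, for x >= 1.
int64_t powerOf2Ceil(int64_t x) {
  int64_t p = 1;
  while (p < x) p <<= 1;
  return p;
}

// HAL side of the contract: an upper bound on the tile size that
// MaterializeEncoding may pick along a dimension of the given static size.
int64_t maxTileSizeForDim(int64_t staticDimSize, int64_t defaultMaxTile) {
  if (staticDimSize == kDynamic) return defaultMaxTile;
  // Narrow static dimension: never tile past the next power of two
  // (the rounding rule here is an assumption for illustration).
  return std::min(defaultMaxTile, powerOf2Ceil(staticDimSize));
}

// Flow side of the contract: pad a static dimension only up to the bound
// that MaterializeEncoding is contractually held to.
int64_t paddedDimSize(int64_t staticDimSize, int64_t defaultMaxTile) {
  int64_t bound = maxTileSizeForDim(staticDimSize, defaultMaxTile);
  return ((staticDimSize + bound - 1) / bound) * bound;
}

// paddedDimSize(/*staticDimSize=*/1, /*defaultMaxTile=*/16) == 1, not 16.
```

With such a bound, a static M=1 dimension pads to 1 rather than 16, which is exactly what the mat-vec case above needs.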
At the moment, there is an e2e matmul test failure, only with the VMVX backend with ukernels and only with 1x1 matrices: now that those matrices are no longer padded to 16x16, the ukernel is called with the same data pointer for the LHS and RHS matrices. (See the printfs intentionally left in for now.) @benvanik @stellaraccident any idea?