
[GlobalOptimization] 1x1 filter convolutions not converted to matmul #18710

Open · Max191 opened this issue Oct 7, 2024 · 3 comments · May be fixed by #18736

Max191 (Contributor) commented Oct 7, 2024

The Convert1x1FilterConvToMatmul pass currently fails to apply when there is a non-unit batch (N) dimension. In such cases the transformation is still possible: the N dimension should be folded into the M dimension of the matmul. For example:

```mlir
// RUN: iree-opt --split-input-file -iree-global-opt-convert-1x1-filter-conv2d-to-matmul %s
util.func public @main(%arg0: tensor<2x128x128x960xf16>, %arg1: tensor<1x1x960x320xf16>) -> tensor<2x128x128x320xf32> {
  %cst = arith.constant 0.0 : f32
  %9 = tensor.empty() : tensor<2x128x128x320xf32>
  %10 = linalg.fill ins(%cst : f32) outs(%9 : tensor<2x128x128x320xf32>) -> tensor<2x128x128x320xf32>
  %11 = linalg.conv_2d_nhwc_hwcf {dilations = dense<1> : vector<2xi64>, strides = dense<1> : vector<2xi64>} ins(%arg0, %arg1 : tensor<2x128x128x960xf16>, tensor<1x1x960x320xf16>) outs(%10 : tensor<2x128x128x320xf32>) -> tensor<2x128x128x320xf32>
  util.return %11 : tensor<2x128x128x320xf32>
}
```

The N dimension of 2 can be collapsed into the M dimension along with the H and W dimensions, yielding:

```mlir
%7 = linalg.matmul ins(%3, %4 : tensor<32768x960xf16>, tensor<960x320xf16>) outs(%6 : tensor<32768x320xf32>) -> tensor<32768x320xf32>
```
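For reference, a minimal sketch of the full rewrite being proposed (the SSA names and collapse/expand steps are illustrative, not the pass's actual output):

```mlir
// Collapse N, H, and W of the input into a single M dimension (2*128*128 = 32768).
%in = tensor.collapse_shape %arg0 [[0, 1, 2], [3]]
    : tensor<2x128x128x960xf16> into tensor<32768x960xf16>
// Drop the unit filter dimensions.
%filter = tensor.collapse_shape %arg1 [[0, 1, 2], [3]]
    : tensor<1x1x960x320xf16> into tensor<960x320xf16>
// Collapse the fill the same way as the input.
%init = tensor.collapse_shape %10 [[0, 1, 2], [3]]
    : tensor<2x128x128x320xf32> into tensor<32768x320xf32>
%mm = linalg.matmul ins(%in, %filter : tensor<32768x960xf16>, tensor<960x320xf16>)
    outs(%init : tensor<32768x320xf32>) -> tensor<32768x320xf32>
// Expand the result back to the original NHWC shape.
%result = tensor.expand_shape %mm [[0, 1, 2], [3]] output_shape [2, 128, 128, 320]
    : tensor<32768x320xf32> into tensor<2x128x128x320xf32>
```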

Similarly, for NCHW convolutions, the N dimension of the convolution can be folded into the N dimension of the matmul.

This addresses the TODO here:

```cpp
// TODO(ataei): Support conversion to linalg.batch_matmul.
```

### Dynamic N

The current implementation also bails on dynamic H or W dimensions:

```cpp
// We cannot merge the width and height if they are both dynamic as we
// cannot expand them back to their dynamic values.
if (isInputHWDynamic)
  return failure();
```

This should be possible now that tensor.expand_shape carries its result shape: the dynamic output sizes can be recovered from the expand_shape's output_shape values. It would also be good to support this dynamic case while adding support for non-unit N dimensions.
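For example, a sketch of the dynamic case (assuming the tensor.expand_shape syntax that carries output_shape operands; %mm stands for the collapsed matmul result):

```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index
// Query the dynamic N, H, and W extents of the conv input before collapsing.
%n = tensor.dim %arg0, %c0 : tensor<?x?x?x960xf16>
%h = tensor.dim %arg0, %c1 : tensor<?x?x?x960xf16>
%w = tensor.dim %arg0, %c2 : tensor<?x?x?x960xf16>
// output_shape carries the dynamic sizes, so the collapsed M dimension can
// always be expanded back to N x H x W.
%result = tensor.expand_shape %mm [[0, 1, 2], [3]] output_shape [%n, %h, %w, 320]
    : tensor<?x320xf32> into tensor<?x?x?x320xf32>
```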

IanWood1 linked a pull request Oct 10, 2024 that will close this issue

IanWood1 (Contributor) commented:
For nchw ops, I'm not sure how this would get converted to a matmul because the N dimension of the convolution is not contiguous with the N dimension of the matmul. Does this need a reshape to transpose dims 0 <-> 1? Or (as the TODO comment suggests) does %filter need to be broadcast to match the N dimension of %input and then perform a batch_matmul?

```mlir
%1 = linalg.conv_2d_nchw_fchw {
    dilations = dense<1> : tensor<2xi64>,
    strides = dense<1> : tensor<2xi64>}
    ins(%input, %filter : tensor<5x2x4x5xf32>, tensor<7x2x1x1xf32>) outs(%0 : tensor<5x7x4x5xf32>) -> tensor<5x7x4x5xf32>
```

Max191 (Contributor, Author) commented Oct 11, 2024

> For nchw ops, I'm not sure how this would get converted to a matmul because the N dimension of the convolution is not contiguous with the N dimension of the matmul. Does this need a reshape to transpose dims 0 <-> 1? Or (as the TODO comment suggests) does %filter need to be broadcast to match the N dimension of %input and then perform a batch_matmul?

In this case, I think it is best to create a linalg.generic with multiple N contraction dimensions. Here is an example of something similar:
https://github.com/Max191/iree/blob/a7fd3a33c1b80a599d75e2bdac413f1883f8608d/compiler/src/iree/compiler/Dialect/LinalgExt/Transforms/ConvertConv2DToIm2ColOp.cpp#L282-L308
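One possible reading, as a minimal sketch for the NCHW example above (%in, %flt, and %out are hypothetical collapsed operands, not the pass's actual output): after collapsing H and W and folding away the unit filter dims, n and hw both index only the input and the output, i.e. they are two N dimensions of the contraction, so no transpose or filter broadcast is needed.

```mlir
// Input collapsed from tensor<5x2x4x5xf32>  (N, C, H, W) to (N, C, H*W).
// Filter collapsed from tensor<7x2x1x1xf32> (F, C, 1, 1) to (F, C).
// Output collapsed from tensor<5x7x4x5xf32> (N, F, H, W) to (N, F, H*W).
#in  = affine_map<(n, f, hw, c) -> (n, c, hw)>
#flt = affine_map<(n, f, hw, c) -> (f, c)>
#out = affine_map<(n, f, hw, c) -> (n, f, hw)>
%res = linalg.generic {
    indexing_maps = [#in, #flt, #out],
    iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
    ins(%in, %flt : tensor<5x2x20xf32>, tensor<7x2xf32>)
    outs(%out : tensor<5x7x20xf32>) {
^bb0(%a: f32, %b: f32, %acc: f32):
  %m = arith.mulf %a, %b : f32
  %s = arith.addf %acc, %m : f32
  linalg.yield %s : f32
} -> tensor<5x7x20xf32>
```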

Max191 linked a pull request Oct 11, 2024 that will close this issue

qedawkins (Contributor) commented:

What the conv_* to matmul pass should do is not collapse dimensions; it should simply rewrite the indexing maps. In particular, we have an indexing map of the form

```mlir
affine_map<(d0, d1) -> (<dilation> * d0 + <stride> * d1)>
```

All we are doing is looking at the unit kernel loops (in this case d1) and folding d1 to 0 in the indexing map. Collapsing the op to a matmul should be a separate, best-effort pattern (basically just CollapseDimensions).
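Concretely, a sketch of the map rewrite for conv_2d_nhwc_hwcf (simplified, with strides and dilations of 1; kh and kw are the unit kernel loops):

```mlir
// Before: the input access of the convolution.
affine_map<(n, oh, ow, f, kh, kw, c) -> (n, oh + kh, ow + kw, c)>
// After folding kh = kw = 0, the unit loops become trivial and can be
// dropped, leaving a plain contraction over c on the original shapes.
affine_map<(n, oh, ow, f, c) -> (n, oh, ow, c)>
```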
