[RFC] Frontend layout transformation #2519
When possible, I think we should keep the original format in the frontend importer, and implement an automatic layout conversion pass that can be shared across frontends. The layout conversion pass should try to eliminate as many intermediate layout changes as possible.
@tqchen I plan to support the TFLite NHWC data layout after my quantization part is upstreamed. However, NCHW has its advantages as described. We could have two options:
Besides the TFLite frontend work, we also need some work in AutoTVM (supporting NHWC convolution tuning). I wish to hear some comments from you.
I vote for using the TFLite original layout as it is. The internal conversion logic makes it complex when adding new features.
I prefer not to do any layout conversion in the frontend.
@srkreddy1238 @yzhliu Thanks for the comments! If all of you agree, I will change the TFLite frontend support from NCHW to NHWC. @yzhliu Yes, the quantization part has not been upstreamed yet; it has many changes, and I plan to upstream it in dev 0.6. My original plan was to support TFLite NHWC after the quantization part is upstreamed, because then we could leverage the existing NCHW auto tuning and see the performance of the quantized model. In our initial work, the quantized model is 30% faster than FP32 on MobileNet V1 using spatial pack. We also find this is not the limit of the quantized model; we could tensorize …
As a side note, we can also provide a layout transformation pass to do the layout transformation (from NHWC->NCHW, or to NCHW4) in Relay.
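For reference, a minimal sketch of what invoking such a pass could look like, assuming a TVM build that ships `relay.transform.ConvertLayout` (added after this discussion); `mod` is assumed to be a Relay module imported from an NHWC model:

```python
import tvm
from tvm import relay

def convert_to_nchw(mod):
    # Ask the pass to rewrite conv2d to NCHW; layout_transform ops are inserted
    # around affected ops and folded together where possible.
    desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
    seq = tvm.transform.Sequential([
        relay.transform.RemoveUnusedFunctions(),
        relay.transform.ConvertLayout(desired_layouts),
    ])
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```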
I tried to compile …
The same issue exists with arm_cpu, e.g. …
This thread is concluded and we shall move layout transformation into passes in Relay.
Currently, frontend models have two different input layouts: NHWC and NCHW. TensorFlow and TFLite use NHWC layout, while frontends like CoreML use NCHW layout.
For converting a model with NHWC input layout, there is currently no unified way. Some frameworks convert NHWC into NCHW input layout, for example Intel OpenVINO and the Tensorflow-CoreML converter (https://github.com/tf-coreml/tf-coreml). This has some advantages, for example on GPU. And TVM supports NCHW very well, for example:
auto tuning currently only supports NCHW (https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L132)
conv2d_transpose only supports NCHW (https://github.com/dmlc/tvm/blob/master/nnvm/python/nnvm/top/nn.py#L287)
many other places...
This is the way our TVM TensorFlow Lite frontend takes (converting TFLite's NHWC into NCHW). However, it also has disadvantages. When we handle shape transformations (like Reshape, Squeeze, Concat and so on), we have to be very careful; see the illustration below. For example, the Tensorflow-CoreML converter has complex logic to handle reshape: https://github.com/tf-coreml/tf-coreml/blob/master/tfcoreml/_shape_sensitive_layers.py#L222.
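As a small illustration of why shape ops need care (plain NumPy, not TVM code; shapes are made up): a Reshape recorded against an NHWC tensor produces a different element order if it is replayed unchanged on the layout-converted NCHW tensor:

```python
import numpy as np

nhwc = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)      # N, H, W, C
flat_ref = nhwc.reshape(2, 48)                           # what the original model computes

nchw = nhwc.transpose(0, 3, 1, 2)                        # layout-converted tensor
flat_naive = nchw.reshape(2, 48)                         # reuse the original Reshape params as-is
flat_fixed = nchw.transpose(0, 2, 3, 1).reshape(2, 48)   # go back to NHWC first, then reshape

print(np.array_equal(flat_naive, flat_ref))  # False: element order differs
print(np.array_equal(flat_fixed, flat_ref))  # True
```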
Another way is to keep the same input layout as the original model, i.e. NHWC. This is the way our TVM TensorFlow frontend takes. However, for performance, we extend NCHW layout support when we want to run on GPU. But the way we do this is to insert transpose ops before / after convolution. This costs a noticeable fraction of the runtime when the convolution itself executes fast; we even found it occupies half of the total running time in one of our model tests.
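Concretely, the pattern looks roughly like the following Relay sketch (variable names and shapes are illustrative, not taken from the frontend code); the transposes around every convolution are what end up costing a noticeable fraction of runtime:

```python
from tvm import relay

# Illustrative shapes: NHWC input, OIHW weights.
data = relay.var("data", shape=(1, 224, 224, 3), dtype="float32")
weight = relay.var("weight", shape=(32, 3, 3, 3), dtype="float32")

x = relay.transpose(data, axes=(0, 3, 1, 2))   # NHWC -> NCHW before the conv
x = relay.nn.conv2d(x, weight, kernel_size=(3, 3), channels=32,
                    data_layout="NCHW", kernel_layout="OIHW")
x = relay.transpose(x, axes=(0, 2, 3, 1))      # NCHW -> NHWC after the conv
func = relay.Function([data, weight], x)
print(func)
```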
To avoid this issue, maybe we should do it in a graph pass and eliminate layout transposes as much as possible.
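One way to picture such a pass (framework-agnostic Python, purely illustrative; the op list and attribute format are made up for the sketch) is a peephole rule that cancels a layout transform immediately followed by its inverse:

```python
def cancel_inverse_transforms(ops):
    """ops is a list of (op_name, attrs) tuples; attrs for layout transforms
    is a (src_layout, dst_layout) pair. Purely illustrative."""
    result = []
    for op in ops:
        if (result
                and op[0] == "layout_transform"
                and result[-1][0] == "layout_transform"
                and result[-1][1] == (op[1][1], op[1][0])):
            # e.g. NCHW->NHWC immediately followed by NHWC->NCHW: drop both.
            result.pop()
        else:
            result.append(op)
    return result

graph = [
    ("layout_transform", ("NCHW", "NHWC")),
    ("layout_transform", ("NHWC", "NCHW")),
    ("conv2d", ("NCHW",)),
]
print(cancel_inverse_transforms(graph))  # [('conv2d', ('NCHW',))]
```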
This RFC is opened just to raise this concern and let us discuss how best to do it.