
[RFC] Frontend layout transformation #2519

Closed
FrozenGene opened this issue Jan 29, 2019 · 9 comments

Comments

@FrozenGene
Member

FrozenGene commented Jan 29, 2019

Currently, frontend models have two different input layouts: NHWC and NCHW. TensorFlow and TFLite use NHWC, while frontends such as CoreML use NCHW.

For converting models with NHWC input layout there is currently no unified approach. Some frameworks convert NHWC into NCHW input layout, for example Intel OpenVINO and the TensorFlow-CoreML converter (https://github.com/tf-coreml/tf-coreml). This has some advantages, for example on GPU, and TVM supports NCHW very well.

This is the way our TVM TensorFlow Lite frontend takes (it converts TFLite's NHWC into NCHW). However, it also has disadvantages: when we handle shape transformations (like Reshape, Squeeze, Concat and so on), we must be very careful. For example, the TensorFlow-CoreML converter has complex logic to handle Reshape: https://github.com/tf-coreml/tf-coreml/blob/master/tfcoreml/_shape_sensitive_layers.py#L222.

Another way is to keep the original model's input layout, i.e. NHWC. This is the way our TVM TensorFlow frontend takes. However, for performance we extend NCHW layout support when we want to run on GPU, and the way we do it is to insert transpose ops before / after each convolution. These transposes cost a noticeable fraction of the runtime when the convolution itself executes quickly; in one of our model tests we found they occupied half of the total running time.
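
To make the transpose-insertion approach concrete, here is a minimal Relay sketch (illustrative only; the input name, shapes and the 32-channel weight are made-up placeholders) of an NHWC graph whose convolution runs in NCHW by wrapping it in transposes:

  from tvm import relay

  # NHWC input as it comes from the frontend; OIHW weights.
  data = relay.var("data", shape=(1, 224, 224, 3))
  weight = relay.var("weight", shape=(32, 3, 3, 3))

  nchw = relay.transpose(data, axes=[0, 3, 1, 2])          # NHWC -> NCHW
  conv = relay.nn.conv2d(nchw, weight, kernel_size=(3, 3),
                         channels=32, padding=(1, 1),
                         data_layout="NCHW", kernel_layout="OIHW")
  back = relay.transpose(conv, axes=[0, 2, 3, 1])          # NCHW -> back to NHWC
  func = relay.Function([data, weight], back)

Both transposes run for every convolution, which is where the overhead described above comes from.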

To avoid this issue, maybe we should do the layout conversion in a graph pass and eliminate transposes as far as possible.

I am opening this RFC to raise this concern and to discuss which approach would be better.

@tqchen
Member

tqchen commented Jan 31, 2019

When possible, I think we should keep the original format in the frontend importer, and implement an automatic layout conversion pass that can be shared across frontends.

The layout conversion pass should try to eliminate as many intermediate layout changes as possible.
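
As an illustration (a sketch of the target pattern, not an actual pass implementation), a pair of back-to-back transposes that undo each other is exactly the kind of intermediate layout change such a pass should remove:

  from tvm import relay

  x = relay.var("x", shape=(1, 3, 224, 224))       # NCHW
  y = relay.transpose(x, axes=[0, 2, 3, 1])        # NCHW -> NHWC
  z = relay.transpose(y, axes=[0, 3, 1, 2])        # NHWC -> NCHW, undoes the first
  func = relay.Function([x], z)
  # An effective layout conversion pass would rewrite func to return x directly.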

@FrozenGene
Member Author

@tqchen I plan to support the TFLite NHWC data layout after my quantization part is upstreamed. However, NCHW has its advantages as described above. We have two options:

  • Keep NCHW for TFLite and add one parameter named layout to from_tflite.

layout could be NCHW or NHWC; the default value could be discussed. This means we support two data layouts in the TFLite frontend and leave the decision to the user. For example, users who want to use an NCHW[x]c schedule, or whose model has conv2d_transpose, may want to use NCHW layout (see the usage sketch after this list).

  • Drop NCHW for TFLite and keep only the original TFLite NHWC layout.
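
For option 1, usage could look roughly like this (the layout argument is the proposed parameter and does not exist in from_tflite today; input name and shapes are placeholders):

  from tvm import relay

  # tflite_model is an already-parsed tflite.Model object.
  mod, params = relay.frontend.from_tflite(
      tflite_model,
      shape_dict={"input": (1, 224, 224, 3)},
      dtype_dict={"input": "float32"},
      layout="NCHW",  # proposed knob: "NCHW" or "NHWC", default to be discussed
  )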

Besides the TFLite frontend work, we also need some work in AutoTVM (NHWC convolution tuning support) and a spatial-pack NHWC schedule on ARM CPU.

I would like to hear your comments.

@srkreddy1238
Contributor

I vote for using the original TFLite layout as it is. The internal conversion logic makes adding new features more complex.

@yzhliu
Member

yzhliu commented Apr 18, 2019

I prefer not to do any layout conversion in the frontend.
Just to clarify, tflite quantization has not been upstreamed, right? @FrozenGene

@FrozenGene
Member Author

FrozenGene commented Apr 19, 2019

@srkreddy1238 @yzhliu Thanks for the comments!

If all of you agree, I will change the TFLite frontend from NCHW to NHWC.

@yzhliu Yes, the quantization support has not been upstreamed yet. It involves many changes, and I plan to upstream it in the 0.6 development cycle. My original plan was to support TFLite NHWC after the quantization part was upstreamed, so that we could leverage the existing NCHW auto tuning and measure the performance of the quantized model. In our initial work the quantized Mobilenet V1 was 30% faster than FP32 using spatial pack, and that is not the limit: we could tensorize q_conv2d to get better performance. However, if we change the layout from NCHW to NHWC, we have additional work to do, for example NHWC support in auto tuning (including conv2d and depthwise convolution). All right, I can start with the TFLite NHWC support first and upstream it before the quantization part, because it is much easier.

@tqchen
Member

tqchen commented May 8, 2019

As a side note, we can also provide a layout transformation pass in Relay to do the layout transformation (from NHWC to NCHW, or to NCHW4).
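
For reference, a minimal sketch of how such a Relay-level pass could be driven; the ConvertLayout pass shown here is the one that later landed in TVM, so treat the exact API as an assumption:

  import tvm
  from tvm import relay

  # mod is a relay.IRModule imported from an NHWC frontend such as TFLite.
  desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
  seq = tvm.transform.Sequential([
      relay.transform.ConvertLayout(desired_layouts),
      relay.transform.FoldConstant(),   # folds layout transforms applied to weights
  ])
  with tvm.transform.PassContext(opt_level=3):
      mod = seq(mod)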

@apivovarov
Contributor

apivovarov commented Jun 7, 2019

I tried to compile the mobilenet_v1_1.0_224.tflite model for ARM Mali GPU using the latest code from the tutorial. The compilation failed in topi/arm_cpu/conv2d.py: _decl_spatial_pack only supports NCHW:

  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/mali/conv2d.py", line 72, in conv2d_mali
    num_tile=3)
  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/arm_cpu/conv2d.py", line 134, in _decl_spatial_pack
    assert layout == "NCHW", "Only support NCHW"
AssertionError: Only support NCHW

@apivovarov
Contributor

The same issue exists with arm_cpu, e.g. tvm.target.arm_cpu('rk3399'):

  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/arm_cpu/conv2d.py", line 76, in conv2d_arm_cpu
    num_tile=2)
  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/arm_cpu/conv2d.py", line 134, in _decl_spatial_pack
    assert layout == "NCHW", "Only support NCHW"
AssertionError: Only support NCHW

@tqchen tqchen closed this as completed Jun 13, 2019
@tqchen
Member

tqchen commented Jun 13, 2019

This thread is concluded; we shall implement layout transformation as passes in Relay.
