[RFC] Frontend layout transformation #2519
When possible, I think we should keep the original format in the frontend importer, and implement an automatic layout conversion pass that can be shared across frontends. The layout conversion pass should try to eliminate as many intermediate layout changes as possible.
@tqchen I plan to support the TFLite NHWC data layout after my quantization part is upstreamed. However, NCHW has its advantages as described. We could have two options:
Besides the TFLite frontend work, we also need some work in AutoTVM (supporting NHWC convolution tuning). I wish to hear some comments from you.
I vote for using the TFLite original layout as it is. The internal conversion logic makes it complex when adding new features.
I prefer not to do any layout conversion in the frontend.
@srkreddy1238 @yzhliu Thanks for the comments! If all of you agree, I will change the TFLite frontend support from NCHW to NHWC. @yzhliu Yes, the quantization part has not been upstreamed yet; it has many changes, and I plan to upstream it in dev 0.6. My original plan was to support TFLite NHWC after the quantization part is upstreamed, because then we could leverage the existing NCHW auto tuning and see the performance of the quantized model. In our initial work, the quantized model is 30% faster than FP32 on MobileNet V1 using spatial pack. We also find this is not the limit of the quantized model; we could tensorize …
As a side note, we can also provide a layout transformation pass to do the layout transformation (from NHWC->NCHW, or to NCHW4) in Relay.
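For reference, a minimal sketch of what invoking such a pass could look like, assuming a TVM build that ships `relay.transform.ConvertLayout` (added after this discussion); `mod` is assumed to be a Relay module imported from an NHWC model:

```python
import tvm
from tvm import relay

def convert_to_nchw(mod):
    # Ask the pass to rewrite conv2d to NCHW; layout_transform ops are inserted
    # around affected ops and folded together where possible.
    desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
    seq = tvm.transform.Sequential([
        relay.transform.RemoveUnusedFunctions(),
        relay.transform.ConvertLayout(desired_layouts),
    ])
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```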
I tried to compile …
The same issue exists with arm_cpu, e.g. …
This thread is concluded and we shall move layout transformation into passes in Relay.
Currently, frontend models have two different input layouts: NHWC and NCHW. TensorFlow and TFLite use NHWC layout, while frontends like CoreML use NCHW layout.
For converting a model with NHWC input layout, there is currently no unified way. Some frameworks convert NHWC into NCHW input layout, for example Intel OpenVINO and the Tensorflow-CoreML converter (https://github.com/tf-coreml/tf-coreml). This has some advantages, for example on GPU. And TVM supports NCHW very well, for example:
auto tuning currently only supports NCHW (https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L132)
conv2d_transpose only supports NCHW (https://github.com/dmlc/tvm/blob/master/nnvm/python/nnvm/top/nn.py#L287)
many other places...
This is the way our TVM TensorFlow Lite frontend takes (converting TFLite's NHWC into NCHW). However, it also has disadvantages. When we handle shape transformations (like Reshape, Squeeze, Concat and so on), we have to be very careful; see the illustration below. For example, the Tensorflow-CoreML converter has complex logic to handle reshape: https://github.com/tf-coreml/tf-coreml/blob/master/tfcoreml/_shape_sensitive_layers.py#L222.
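As a small illustration of why shape ops need care (plain NumPy, not TVM code; shapes are made up): a Reshape recorded against an NHWC tensor produces a different element order if it is replayed unchanged on the layout-converted NCHW tensor:

```python
import numpy as np

nhwc = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)      # N, H, W, C
flat_ref = nhwc.reshape(2, 48)                           # what the original model computes

nchw = nhwc.transpose(0, 3, 1, 2)                        # layout-converted tensor
flat_naive = nchw.reshape(2, 48)                         # reuse the original Reshape params as-is
flat_fixed = nchw.transpose(0, 2, 3, 1).reshape(2, 48)   # go back to NHWC first, then reshape

print(np.array_equal(flat_naive, flat_ref))  # False: element order differs
print(np.array_equal(flat_fixed, flat_ref))  # True
```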
Another way is to keep the same input layout as the original model, i.e. NHWC. This is the way our TVM TensorFlow frontend takes. However, for performance, we extend NCHW layout support when we want to run on GPU. But the way we do this is to insert transpose ops before / after convolution. This costs a noticeable fraction of the runtime when the convolution itself executes fast; we even found it occupies half of the total running time in one of our model tests.
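Concretely, the pattern looks roughly like the following Relay sketch (variable names and shapes are illustrative, not taken from the frontend code); the transposes around every convolution are what end up costing a noticeable fraction of runtime:

```python
from tvm import relay

# Illustrative shapes: NHWC input, OIHW weights.
data = relay.var("data", shape=(1, 224, 224, 3), dtype="float32")
weight = relay.var("weight", shape=(32, 3, 3, 3), dtype="float32")

x = relay.transpose(data, axes=(0, 3, 1, 2))   # NHWC -> NCHW before the conv
x = relay.nn.conv2d(x, weight, kernel_size=(3, 3), channels=32,
                    data_layout="NCHW", kernel_layout="OIHW")
x = relay.transpose(x, axes=(0, 2, 3, 1))      # NCHW -> NHWC after the conv
func = relay.Function([data, weight], x)
print(func)
```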
To avoid this issue, maybe we should do it in a graph pass and eliminate layout transposes as much as possible.
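One way to picture such a pass (framework-agnostic Python, purely illustrative; the op list and attribute format are made up for the sketch) is a peephole rule that cancels a layout transform immediately followed by its inverse:

```python
def cancel_inverse_transforms(ops):
    """ops is a list of (op_name, attrs) tuples; attrs for layout transforms
    is a (src_layout, dst_layout) pair. Purely illustrative."""
    result = []
    for op in ops:
        if (result
                and op[0] == "layout_transform"
                and result[-1][0] == "layout_transform"
                and result[-1][1] == (op[1][1], op[1][0])):
            # e.g. NCHW->NHWC immediately followed by NHWC->NCHW: drop both.
            result.pop()
        else:
            result.append(op)
    return result

graph = [
    ("layout_transform", ("NCHW", "NHWC")),
    ("layout_transform", ("NHWC", "NCHW")),
    ("conv2d", ("NCHW",)),
]
print(cancel_inverse_transforms(graph))  # [('conv2d', ('NCHW',))]
```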
This RFC is opened just to raise this concern and let us discuss how best to do it.