
[TFLite] Convert TFLite NCHW to NHWC #3141

Merged: 2 commits into apache:master on May 22, 2019

Conversation

FrozenGene
Member

Background RFC: #2519

As discussed in the RFC above, we agreed to change the TFLite frontend input layout from NCHW to NHWC. This PR does that.

Affected:
We cannot use AutoTVM to tune on ARM CPU, because the ARM CPU schedule currently implements only NCHW. We should add SpatialPack + NHWC for conv2d / depthwise convolution on ARM CPU.
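For clarity, the layout change itself is just an axis permutation. A minimal NumPy sketch (shapes are illustrative only) of converting NCHW to NHWC and back:

```python
import numpy as np

# NCHW tensor: batch, channels, height, width
nchw = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

# NCHW -> NHWC moves the channel axis to the end
nhwc = nchw.transpose(0, 2, 3, 1)
assert nhwc.shape == (2, 4, 5, 3)

# NHWC -> NCHW is the inverse permutation; the round trip is lossless
restored = nhwc.transpose(0, 3, 1, 2)
assert np.array_equal(restored, nchw)
```

The frontend change avoids paying this permutation at runtime by keeping the graph in the layout TFLite already uses.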

@srkreddy1238

cc: @songqun @gomida @ariwaranosai @tqchen @yzhliu

@FrozenGene
Member Author

ping @srkreddy1238

@srkreddy1238
Contributor

@FrozenGene overall looks good to me.

Trigger the CI and I will have another look.

@FrozenGene
Member Author

@srkreddy1238 CI is green now.

@FrozenGene
Member Author

@srkreddy1238 Could you have a look now? Thanks.

@srkreddy1238
Contributor

Thanks @FrozenGene. LGTM.

srkreddy1238 merged commit b63267b into apache:master on May 22, 2019
@apivovarov
Contributor

apivovarov commented Jun 7, 2019

I tried to compile a model for ARM Mali GPU. Compilation failed in topi/arm_cpu/conv2d.py: _decl_spatial_pack asserts "Only support NCHW".

target = tvm.target.mali('rk3399')
target_host = 'llvm -target=aarch64-linux-gnu'
Traceback (most recent call last):
  File "./compile.py", line 83, in <module>
    graph, lib, params = relay.build(func, target, target_host=target_host, params=params)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/build_module.py", line 284, in build
    params)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/build_module.py", line 112, in build
    self._build(func, target, target_host)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 209, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x5a08e1) [0x7f89de2fc8e1]
  [bt] (7) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x59f8df) [0x7f89de2fb8df]
  [bt] (6) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x5a492e) [0x7f89de30092e]
  [bt] (5) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x5a46d3) [0x7f89de3006d3]
  [bt] (4) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x59a024) [0x7f89de2f6024]
  [bt] (3) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x5a6897) [0x7f89de302897]
  [bt] (2) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x59a024) [0x7f89de2f6024]
  [bt] (1) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x5a64f6) [0x7f89de3024f6]
  [bt] (0) /usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x95d17b) [0x7f89de6b917b]
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 71, in cfun
    rv = local_pyfunc(*pyargs)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/relay/op/nn/_nn.py", line 122, in compute_conv2d
    dilation, layout, out_dtype=out_dtype)
  File "</usr/local/lib/python3.5/dist-packages/decorator-4.3.2-py3.5.egg/decorator.py:decorator-gen-19>", line 2, in conv2d
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/target.py", line 372, in dispatch_func
    return dispatch_dict[k](*args, **kwargs)
  File "</usr/local/lib/python3.5/dist-packages/decorator-4.3.2-py3.5.egg/decorator.py:decorator-gen-162>", line 2, in config_dispatcher
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/autotvm/task/dispatcher.py", line 215, in dispatch_func
    return dispatch_dict['direct'](cfg, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tvm-0.6.dev0-py3.5-linux-x86_64.egg/tvm/autotvm/task/topi_integration.py", line 344, in template_call
    node = f(cfg, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/mali/conv2d.py", line 72, in conv2d_mali
    num_tile=3)
  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/arm_cpu/conv2d.py", line 134, in _decl_spatial_pack
    assert layout == "NCHW", "Only support NCHW"
AssertionError: Only support NCHW

@apivovarov
Contributor

apivovarov commented Jun 7, 2019

The same issue exists on ARM CPU, e.g. tvm.target.arm_cpu('rk3399')

  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/arm_cpu/conv2d.py", line 76, in conv2d_arm_cpu
    num_tile=2)
  File "/usr/local/lib/python3.5/dist-packages/topi-0.6.dev0-py3.5.egg/topi/arm_cpu/conv2d.py", line 134, in _decl_spatial_pack
    assert layout == "NCHW", "Only support NCHW"
AssertionError: Only support NCHW

@FrozenGene
Member Author

@apivovarov As described in this thread, this PR affects auto-tuning. We need to add spatial_pack for NHWC.

@apivovarov
Contributor

apivovarov commented Jun 7, 2019

Why was this PR merged before adding spatial_pack for NHWC?
This PR affects not only auto-tuning; it affects core TVM functionality: compilation.
I cannot even compile a simple MobileNet TFLite model for arm_mali or arm_cpu now.
Is there any workaround that would let me compile a TFLite model for ARM?
Should we add a layout=NHWC/NCHW parameter similar to what from_tensorflow does?

@FrozenGene
Member Author

This PR doesn't break functionality in general; we can still compile for the generic / x86 CPU targets. If you compile a TensorFlow model (not a TFLite model) for ARM CPU, you will get the same result. Please see the background discussion for this PR (#2519), where I listed the advantages / disadvantages. I understand your concern, which is why I originally supported TFLite in NCHW layout; however, making the TFLite frontend NHWC was also discussed before, and we knew what it would affect. Supporting spatial pack NHWC is not difficult; you could modify it yourself for now.

@FrozenGene
Member Author

FrozenGene commented Jun 7, 2019

Why was this PR merged before adding spatial_pack for NHWC?
This PR affects not only auto-tuning; it affects core TVM functionality: compilation.
I cannot even compile a simple MobileNet TFLite model for arm_mali or arm_cpu now.
Is there any workaround that would let me compile a TFLite model for ARM?
Should we add a layout=NHWC/NCHW parameter similar to what from_tensorflow does?

@apivovarov
I think you could revert this PR if you want to auto-tune on ARM CPU, because I originally supported TFLite in NCHW and handled the layout logic internally.

I don't think we should add layout=NHWC/NCHW, because it would introduce transpose ops and hurt performance. We should support spatial pack NHWC instead.

@kevinthesun
Contributor

kevinthesun commented Jun 7, 2019

@FrozenGene Will the data-related transpose ops only appear at the beginning and the end of the network? If we keep only the necessary layout transformations, the performance shouldn't be affected. This is probably a more maintainable approach than adding another layout optimization in topi and AutoTVM.

@apivovarov
Contributor

I noticed that the tensorflow/test_forward.py unit tests run against two targets, cuda and llvm:
https://github.com/dmlc/tvm/blob/master/tests/python/frontend/tensorflow/test_forward.py#L1273

But the tflite/test_forward.py tests run only against llvm:
https://github.com/dmlc/tvm/blob/master/tests/python/frontend/tflite/test_forward.py#L137

@FrozenGene
Member Author

@FrozenGene Will the data-related transpose ops only appear at the beginning and the end of the network? If we keep only the necessary layout transformations, the performance shouldn't be affected. This is probably a more maintainable approach than adding another layout optimization in topi and AutoTVM.

@kevinthesun No. They will appear before and after every convolution op, because, like our TensorFlow implementation, we simply insert a transpose before and after each conv. I think we should add spatial pack NHWC for ARM CPU.
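The pattern being described, where an NHWC graph wraps every conv in a transpose pair so an NCHW-only kernel can run, can be sketched like this (conv2d_nchw here is a hypothetical identity placeholder, not TVM's operator):

```python
import numpy as np

def conv2d_nchw(x):
    """Hypothetical stand-in for an NCHW-only convolution kernel."""
    return x  # identity, so only the layout handling is visible

def conv_in_nhwc_graph(x_nhwc):
    # transpose inserted before the conv: NHWC -> NCHW
    x_nchw = x_nhwc.transpose(0, 3, 1, 2)
    y_nchw = conv2d_nchw(x_nchw)
    # transpose inserted after the conv: NCHW -> NHWC
    return y_nchw.transpose(0, 2, 3, 1)

x = np.random.rand(1, 8, 8, 3).astype(np.float32)
y = conv_in_nhwc_graph(x)
assert y.shape == (1, 8, 8, 3)
assert np.array_equal(y, x)  # identity conv: data unchanged, two transposes paid
```

With one such pair around every convolution, the transpose cost scales with network depth, which is the performance concern behind preferring a native NHWC schedule.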

@apivovarov The reason is that NCHW is better for the CUDA target.

@kevinthesun
Contributor

@FrozenGene Got it. In the long term we can add a graph-level layout transformation pass. For now we can replace conv2d NHWC with conv2d NCHW in conv2d_alter_layout. This should achieve better performance than the original implementation.

@FrozenGene
Member Author

FrozenGene commented Jun 8, 2019

@FrozenGene Got it. In the long term we can add a graph-level layout transformation pass. For now we can replace conv2d NHWC with conv2d NCHW in conv2d_alter_layout. This should achieve better performance than the original implementation.

However, this would affect auto-tuning. In fact, we already have spatial_pack NHWC and spatial_pack NCHW for bitserial_conv2d; please see: https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py#L212-L318 Their logic is very similar, and we could support ARM CPU conv2d / depthwise_conv2d in NHWC / NCHW the same way.
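As a rough illustration of what "spatial pack" means (tile sizes and axis order here are illustrative, not what the TVM templates actually choose), a NumPy sketch of packing an NCHW tensor into small tiles for vectorized inner loops, then unpacking it:

```python
import numpy as np

n, c, h, w = 1, 8, 4, 4   # illustrative NCHW shape
vh, vw, vc = 2, 2, 4      # spatial and channel tile sizes (must divide h, w, c)

data = np.arange(n * c * h * w, dtype=np.float32).reshape(n, c, h, w)

# Pack: NCHW -> (n, c//vc, h//vh, w//vw, vh, vw, vc), grouping each
# vh x vw x vc block into the innermost axes for the compute kernel
packed = (data.reshape(n, c // vc, vc, h // vh, vh, w // vw, vw)
              .transpose(0, 1, 3, 5, 4, 6, 2))
assert packed.shape == (1, 2, 2, 2, 2, 2, 4)

# Unpack: the inverse permutation restores the original layout exactly
unpacked = packed.transpose(0, 1, 6, 2, 4, 3, 5).reshape(n, c, h, w)
assert np.array_equal(unpacked, data)
```

The NHWC variant would start from an (n, h, w, c) tensor and permute into an analogous tiled layout, which is why the NCHW and NHWC spatial pack templates can share most of their logic.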

wweic pushed a commit to wweic/tvm that referenced this pull request Jun 26, 2019
* Convert TFLite NCHW to NHWC

* Minor comment fix
wweic pushed a commit to neo-ai/tvm that referenced this pull request Jun 27, 2019
* Convert TFLite NCHW to NHWC

* Minor comment fix
@chenbiaolong

@FrozenGene @apivovarov any update on NHWC support for ARM CPU? I also think we need more TFLite examples for ARM CPU, like the networks in arm_cpu_imagenet_bench.py.

@FrozenGene
Member Author

FrozenGene commented Jul 25, 2019

@chenbiaolong I don't have spare time for this currently. I hope someone can help with it; I could help review. I think it is not difficult. You can refer to https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py#L212-L318 for how to do it.

@FrozenGene
Member Author

CC @ajtulloch if you have any interest and time.

@tmoreau89
Contributor

@FrozenGene has anyone followed up on NHWC AutoTVM templates for x86/ARM? If not, I can give that a stab.

@FrozenGene
Member Author

Hi @tmoreau89, my colleague @jackwish is working on ARM and will open a PR soon. You are welcome to take x86. :-)

@zhenhuaw-me
Contributor

Hi @tmoreau89 and @FrozenGene, I've drafted the ARM NHWC schedule in PR #3859.

@tmoreau89
Contributor

Nice, thanks for sharing your draft @jackwish. I have given x86 support a try in this WIP PR: #3858

@zhenhuaw-me
Contributor

zhenhuaw-me commented Aug 30, 2019 via email
