Currently, TVM has different conv2d schedules for ARM and Intel. The discuss post linked below shows that the Intel conv2d NCHWc schedule, when run on ARM, gives better end-to-end latency than the ARM NCHW conv2d spatial-pack schedule for many TFLite networks.
However, this is just one opportunity, and there are more ideas worth pursuing. This issue lists those ideas so that anybody interested can pick them up. The list is a result of discussions in the post linked below.
Investigate NHWC vs NCHWc schedules - NCHWc requires data layout transforms at the graph boundaries. Check whether an NHWC schedule can match NCHWc conv2d performance, which would eliminate the data layout conversion overhead (see the sketch below).
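For reference, a minimal sketch of where that overhead comes from, assuming a recent TVM with the Relay Python API (the shapes and pass configuration here are illustrative, not from the original post). A conv2d expressed directly in NHWC has no layout transforms; running ConvertLayout to rewrite it to NCHW makes the inserted layout_transform ops visible in the printed IR.

```python
# Minimal sketch, assuming a recent TVM with the Relay API.
# Shapes and pass usage are illustrative only.
import tvm
from tvm import relay

# A single conv2d expressed directly in NHWC layout.
data = relay.var("data", shape=(1, 56, 56, 64), dtype="float32")
weight = relay.var("weight", shape=(3, 3, 64, 64), dtype="float32")
conv = relay.nn.conv2d(
    data,
    weight,
    channels=64,
    kernel_size=(3, 3),
    padding=(1, 1),
    data_layout="NHWC",
    kernel_layout="HWIO",
)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))

# ConvertLayout rewrites the conv2d to the desired layout and inserts
# layout_transform ops at the boundaries -- the overhead that a
# competitive NHWC schedule would avoid. (On x86, AlterOpLayout later
# converts NCHW conv2d to NCHWc during build.)
seq = tvm.transform.Sequential(
    [
        relay.transform.InferType(),
        relay.transform.ConvertLayout({"nn.conv2d": ["NCHW", "default"]}),
    ]
)
with tvm.transform.PassContext(opt_level=3):
    nchw_mod = seq(mod)

print(nchw_mod)  # inspect the inserted layout_transform ops
```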
That is a really good idea, that we can share schedules across backends!
A few months ago, I had thought that maybe we could modularize TOPI so that well-known schedules can be shared as much as possible. For example, we could have schedules for CPU and GPU that are shared between x86/ARM/... and CUDA/OpenCL respectively. Though I am not actively working on TVM nowadays, I still hope to contribute someday.
Relevant discuss post - https://discuss.tvm.ai/t/topi-using-x86-schedules-for-arm-conv2d/6365
@FrozenGene @masahi @tqchen