Optimize x86 conv3d_ndhwc using data packing approach. #4866

alexgl-github · 2020-02-11T21:41:46Z

Add tuneable conv3d_ndhwc schedule

alexgl-github · 2020-02-11T21:43:37Z

@anijain2305 Please take a look

kevinthesun · 2020-02-12T17:52:45Z

Thank you for this work! It would be great if you could provide benchmarking data comparing TVM conv3d performance against an existing solution (TensorFlow + MKL-DNN?) to see where the current implementation stands.

anijain2305

Minor comments.

Overall looks good to me. Currently, you have used the NCHW schedule of Conv2D. You should also try the Conv2D NCHWc schedule. That schedule gives the best performance for Conv2D and has potential here as well.

https://github.com/apache/incubator-tvm/blob/master/topi/python/topi/nn/conv2d.py#L421
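
For readers unfamiliar with the NCHWc idea mentioned above, here is a minimal NumPy sketch of what an analogous blocked layout could look like for a 5-D NDHWC tensor. It is only an illustration of the layout transformation, not the code in this PR or in TOPI; the function names `pack_ndhwc_to_ncdhwc` and `unpack_ncdhwc_to_ndhwc` and the `c_block` parameter are hypothetical.

```python
import numpy as np


def pack_ndhwc_to_ncdhwc(data, c_block):
    """Repack an NDHWC array into a blocked NCDHW[c_block]c layout (illustrative only)."""
    n, d, h, w, c = data.shape
    if c % c_block != 0:
        raise ValueError("channel count must be divisible by c_block")
    # Split C into (C // c_block, c_block), then move the outer channel
    # chunk in front of the spatial axes and keep the inner block last,
    # so each group of c_block channels stays contiguous in memory order.
    blocked = data.reshape(n, d, h, w, c // c_block, c_block)
    return blocked.transpose(0, 4, 1, 2, 3, 5)


def unpack_ncdhwc_to_ndhwc(packed):
    """Inverse of pack_ndhwc_to_ncdhwc."""
    n, c_outer, d, h, w, c_block = packed.shape
    return packed.transpose(0, 2, 3, 4, 1, 5).reshape(n, d, h, w, c_outer * c_block)


# Small example: batch 1, depth 2, 4x4 spatial, 8 channels, blocks of 4 channels.
x = np.arange(1 * 2 * 4 * 4 * 8, dtype="float32").reshape(1, 2, 4, 4, 8)
packed = pack_ndhwc_to_ncdhwc(x, c_block=4)
print(packed.shape)  # (1, 2, 2, 4, 4, 4)
```

In a tuned schedule the block size would typically be a tunable knob chosen to match the vector width; that choice is outside the scope of this sketch.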

topi/python/topi/nn/util.py

topi/python/topi/x86/conv3d.py

anijain2305

Good to go from my side! If you can, please add some TF comparison.

alexgl-github · 2020-02-13T01:25:16Z

> Thank you for this work! It would be great if you could provide benchmarking data comparing TVM conv3d performance against an existing solution (TensorFlow + MKL-DNN?) to see where the current implementation stands.

@kevinthesun
Below are benchmark results for several data/kernel combinations, run on a 2-core Intel Ivy Bridge CPU against TF 1.15 + MKL-DNN. The X value is the speedup of the TVM model relative to the same TF model (values below 1 are a slowdown).

TVM: 0.007 sec; X: 1.645; TF: 0.011 sec; input_shape=(1, 16, 256, 256, 1) ; kernel_shape=(1, 3, 3, 1, 8)
TVM: 0.019 sec; X: 1.218; TF: 0.023 sec; input_shape=(1, 16, 256, 256, 1) ; kernel_shape=(1, 7, 7, 1, 8)
TVM: 0.054 sec; X: 1.490; TF: 0.080 sec; input_shape=(1, 16, 256, 256, 8) ; kernel_shape=(1, 3, 3, 8, 16)
TVM: 0.262 sec; X: 0.869; TF: 0.228 sec; input_shape=(1, 16, 256, 256, 8) ; kernel_shape=(1, 7, 7, 8, 16)
TVM: 0.013 sec; X: 1.290; TF: 0.016 sec; input_shape=(1, 16, 256, 256, 1) ; kernel_shape=(3, 3, 3, 1, 8)
TVM: 0.114 sec; X: 1.148; TF: 0.131 sec; input_shape=(1, 16, 256, 256, 1) ; kernel_shape=(7, 7, 7, 1, 8)
TVM: 0.146 sec; X: 1.058; TF: 0.154 sec; input_shape=(1, 16, 256, 256, 8) ; kernel_shape=(3, 3, 3, 8, 16)
TVM: 2.432 sec; X: 0.591; TF: 1.436 sec; input_shape=(1, 16, 256, 256, 8) ; kernel_shape=(7, 7, 7, 8, 16)
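
As a reading aid, the X column can be cross-checked from the two timing columns. The sketch below assumes X is computed as TF time divided by TVM time, which is consistent with the numbers above but is not stated explicitly in the comment; the helper name `speedup` is hypothetical. Because the printed timings are rounded to three decimals, the recomputed ratios differ slightly from the reported X values.

```python
# (tvm_sec, tf_sec, reported_x) copied from the benchmark lines above.
rows = [
    (0.007, 0.011, 1.645),
    (0.019, 0.023, 1.218),
    (0.054, 0.080, 1.490),
    (0.262, 0.228, 0.869),
    (0.013, 0.016, 1.290),
    (0.114, 0.131, 1.148),
    (0.146, 0.154, 1.058),
    (2.432, 1.436, 0.591),
]


def speedup(tvm_sec, tf_sec):
    """Assumed definition: how many times faster TVM is than TF."""
    return tf_sec / tvm_sec


for tvm_sec, tf_sec, reported in rows:
    recomputed = speedup(tvm_sec, tf_sec)
    print(f"TVM {tvm_sec:.3f}s  TF {tf_sec:.3f}s  "
          f"reported X {reported:.3f}  recomputed {recomputed:.3f}")

# Configurations where TVM is slower than TF under this definition.
slower = [row for row in rows if speedup(row[0], row[1]) < 1.0]
print(len(slower), "of", len(rows), "configurations are slower in TVM")
```

Under this assumption, the two slower cases are the 7x7 and 7x7x7 kernels with 8 input channels and 16 output channels.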

kevinthesun

LGTM. As @anijain2305 suggested, we might also want to try a data layout similar to conv2d_NCHWc to see whether we can get further performance improvement.

kevinthesun · 2020-02-13T06:38:43Z

Thanks @alexgl-github @anijain2305

alexgl-github force-pushed the conv3d_packed branch from 13d4ecf to eca65c1 on February 11, 2020 23:10

tqchen added the status: need review label Feb 12, 2020

tqchen assigned kevinthesun Feb 12, 2020

anijain2305 reviewed Feb 12, 2020

topi/python/topi/nn/util.py (outdated, resolved)

topi/python/topi/x86/conv3d.py (outdated, resolved)

alexgl-github force-pushed the conv3d_packed branch from eca65c1 to 89c72d8 on February 12, 2020 23:37

anijain2305 reviewed Feb 13, 2020

anijain2305 approved these changes Feb 13, 2020

Optimize x86 conv3d_ndhwc using data packing approach.

d7f5d50

Add tuneable conv3d_ndhwc schedule

alexgl-github force-pushed the conv3d_packed branch from 89c72d8 to d7f5d50 on February 13, 2020 00:10

kevinthesun approved these changes Feb 13, 2020

kevinthesun merged commit 8d94587 into apache:master Feb 13, 2020

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020

Optimize x86 conv3d_ndhwc using data packing approach. (apache#4866)

a09154a

Add tuneable conv3d_ndhwc schedule

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020

Optimize x86 conv3d_ndhwc using data packing approach. (apache#4866)

d9deac7

Add tuneable conv3d_ndhwc schedule

zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020

Optimize x86 conv3d_ndhwc using data packing approach. (apache#4866)

c51689d

Add tuneable conv3d_ndhwc schedule

ZihengJiang mentioned this pull request Sep 17, 2020

TVM v0.7 Release Note Candidate #6486

Closed

alexgl-github deleted the conv3d_packed branch November 3, 2020 22:13
