See apache/tvm#14009 for more details.
Tested under commit afbfb7aa7e43732cb716f8e443df696110be6afc.
Note: given the stochastic nature of evolutionary search, performance may sometimes be worse with this PR enabled.
Workload: Conv2d NHWC
Shape | Mainline TVM | Mainline TVM with Async | Performance Boost |
---|---|---|---|
N=1_H=224_W=224_C=3_K=64_R=7_S=7_STR=2_PAD=3_DIL=1 | 13838.05219 | 14687.89452 | 6.14% |
N=1_H=56_W=56_C=64_K=64_R=1_S=1_STR=1_PAD=0_DIL=1 | 5398.305085 | 5613.892553 | 3.99% |
N=1_H=56_W=56_C=64_K=64_R=3_S=3_STR=1_PAD=1_DIL=1 | 11652.96825 | 13157.88249 | 12.91% |
N=1_H=56_W=56_C=64_K=256_R=1_S=1_STR=1_PAD=0_DIL=1 | 10638.8309 | 11674.68499 | 9.74% |
N=1_H=56_W=56_C=256_K=64_R=1_S=1_STR=1_PAD=0_DIL=1 | 8692.32829 | 9469.264089 | 8.94% |
N=1_H=56_W=56_C=256_K=128_R=1_S=1_STR=2_PAD=0_DIL=1 | 4685.767442 | 5698.19634 | 21.61% |
N=1_H=28_W=28_C=128_K=128_R=3_S=3_STR=1_PAD=1_DIL=1 | 9872.787087 | 10404.60405 | 5.39% |
N=1_H=28_W=28_C=128_K=512_R=1_S=1_STR=1_PAD=0_DIL=1 | 9974.281496 | 10073.31657 | 0.99% |
N=1_H=28_W=28_C=512_K=128_R=1_S=1_STR=1_PAD=0_DIL=1 | 7075.866932 | 8564.572712 | 21.04% |
N=1_H=28_W=28_C=512_K=256_R=1_S=1_STR=2_PAD=0_DIL=1 | 3648.330914 | 4021.923142 | 10.24% |
N=1_H=14_W=14_C=256_K=256_R=3_S=3_STR=1_PAD=1_DIL=1 | 8192.954618 | 9160.182054 | 11.81% |
N=1_H=14_W=14_C=256_K=1024_R=1_S=1_STR=1_PAD=0_DIL=1 | 8008.870153 | 9362.825279 | 16.91% |
N=1_H=14_W=14_C=1024_K=256_R=1_S=1_STR=1_PAD=0_DIL=1 | 5210.062241 | 6051.208379 | 16.14% |
N=1_H=14_W=14_C=1024_K=512_R=1_S=1_STR=2_PAD=0_DIL=1 | 2550.787202 | 3587.902938 | 40.66% |
N=1_H=7_W=7_C=512_K=512_R=3_S=3_STR=1_PAD=1_DIL=1 | 4350.626084 | 5432.788068 | 24.87% |
N=1_H=7_W=7_C=512_K=2048_R=1_S=1_STR=1_PAD=0_DIL=1 | 6672.068026 | 7663.725217 | 14.86% |
N=1_H=7_W=7_C=2048_K=512_R=1_S=1_STR=1_PAD=0_DIL=1 | 3142.564263 | 4297.988014 | 36.77% |
Workload: GEMM NN
Shape | Mainline TVM | Mainline TVM with Async | Performance Boost |
---|---|---|---|
M=512_N=256_K=640 | 8678.46 | 10607.37 | 22.23% |
M=512_N=384_K=256 | 8109.13 | 10290.72 | 26.90% |
M=512_N=512_K=512 | 11419.83 | 14000.86 | 22.60% |
M=512_N=3072_K=768 | 19709.39 | 18351.61 | -6.89% |
M=512_N=768_K=3072 | 12844.59 | 13730.88 | 6.90% |
M=896_N=896_K=896 | 16149.91 | 16131.39 | -0.11% |
M=1024_N=1024_K=1024 | 18842.11 | 19662.8 | 4.36% |
M=1152_N=1152_K=1152 | 15386.79 | 16736.1 | 8.77% |
M=1536_N=1536_K=1536 | 18522.67 | 18872.06 | 1.89% |
M=2048_N=2048_K=2048 | 19515.42 | 18874.85 | -3.28% |
M=3072_N=3072_K=3072 | 19233.9 | 19291.42 | 0.30% |
M=4096_N=4096_K=4096 | 17122.17 | 19259.01 | 12.48% |
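
For reference, the Performance Boost column appears to be the relative change of the "with Async" number over the mainline number, so a positive value means the async build scored higher. This is inferred from the figures in the tables, not stated in the PR. A minimal sketch of that calculation, where the helper name `performance_boost` is hypothetical and the two sample rows are copied from the GEMM table:

```python
def performance_boost(baseline: float, with_async: float) -> float:
    """Relative change in percent; positive means the async build scored higher.

    Assumed formula (inferred from the tables): (async - baseline) / baseline * 100.
    """
    return (with_async - baseline) / baseline * 100.0


# (shape, Mainline TVM, Mainline TVM with Async), copied from the GEMM NN table.
rows = [
    ("M=512_N=256_K=640", 8678.46, 10607.37),
    ("M=512_N=3072_K=768", 19709.39, 18351.61),
]

for shape, baseline, with_async in rows:
    print(f"{shape}: {performance_boost(baseline, with_async):+.2f}%")
```

The second row shows how a regression comes out as a negative boost, which is the case the note about evolutionary search refers to.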