Issue Info

There is a mismatch in overall performance when using the two VM APIs in TVM. api_experimental.py contains a minimal test of the two APIs, following the usage shown in the tutorials.
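The module under test is the small three-layer dense/bias_add network shown in the IR dumps below. The following is only an illustrative sketch of how such a module could be built (shapes and layer structure are taken from the dumps; the actual construction lives in api_experimental.py):

import tvm
from tvm import relay

# Illustrative reconstruction of the 1024 -> 512 -> 256 -> 128 chain seen in the dumps.
data = relay.var("data", shape=(1, 1024), dtype="float32")
out, in_dim = data, 1024
for i, units in enumerate([512, 256, 128]):
    weight = relay.var("fc%d_weight" % i, shape=(units, in_dim), dtype="float32")
    bias = relay.var("fc%d_bias" % i, shape=(units,), dtype="float32")
    out = relay.nn.bias_add(relay.nn.dense(out, weight, units=units), bias, axis=-1)
    in_dim = units
mod = tvm.IRModule.from_expr(relay.Function(relay.analysis.free_vars(out), out))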

API 2:

vm_executor = relay.create_executor("vm", mod, tvm.cpu(0), target)
vm_executor.evaluate()(input_data, **params)
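API 1 (the faster path, "VM style1" below) presumably compiles the module ahead of time with the parameters bound into the executable. A minimal sketch of what that path could look like, assuming relay.vm.compile is the API used (the exact code is in api_experimental.py):

import tvm
from tvm import relay
from tvm.runtime import vm as vm_rt

# Assumption: API 1 passes params at compile time, so they become VM constants.
exe = relay.vm.compile(mod, target=target, params=params)
vm = vm_rt.VirtualMachine(exe, tvm.cpu(0))
result = vm.run(input_data)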

With opt_level=3 the generated bytecode and the optimization passes applied are not identical between the two APIs, even though they should be. The issue is that params are not bound properly in API 2, which prevents optimizations related to EA and constant folding. This is visible in the dumps below: with API 1 the compiled main takes only data and loads the weights and biases as VM constants (load_const), whereas with API 2 main still takes every fc*_weight/fc*_bias tensor as a runtime argument, so constant folding cannot touch them.
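A possible workaround sketch, assuming bind_params_by_name is available in this TVM build: fold the weights into the module as constants before creating the executor, so the constant-related passes can see them (not verified against api_experimental.py):

from tvm import relay

# Hypothetical workaround: bind params into the module as constants up front.
mod["main"] = relay.build_module.bind_params_by_name(mod["main"], params)
vm_executor = relay.create_executor("vm", mod, tvm.cpu(0), target)
result = vm_executor.evaluate()(input_data)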

Debug Output (VM style1 / API 1 first, then VM style2 / API 2):

def @main(%data: Tensor[(1, 1024), float32], %fc0_weight: Tensor[(512, 1024), float32], %fc0_bias: Tensor[(512), float32], %fc1_weight: Tensor[(256, 512), float32], %fc1_bias: Tensor[(256), float32], %fc2_weight: Tensor[(128, 256), float32], %fc2_bias: Tensor[(128), float32]) -> Tensor[(1, 128), float32] {
  %0 = nn.dense(%data, %fc0_weight, units=512) /* ty=Tensor[(1, 512), float32] */;
  %1 = nn.bias_add(%0, %fc0_bias, axis=-1) /* ty=Tensor[(1, 512), float32] */;
  %2 = nn.dense(%1, %fc1_weight, units=256) /* ty=Tensor[(1, 256), float32] */;
  %3 = nn.bias_add(%2, %fc1_bias, axis=-1) /* ty=Tensor[(1, 256), float32] */;
  %4 = nn.dense(%3, %fc2_weight, units=128) /* ty=Tensor[(1, 128), float32] */;
  nn.bias_add(%4, %fc2_bias, axis=-1) /* ty=Tensor[(1, 128), float32] */
}

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
VM Function[0]: main(data)
# reg file size = 19
# instruction count = 22
opcode, fields # inst(text):
 0: 11 0 1   # load_const $1 Const[0]
 1: 16 1 64 2 32 1 1 2   # alloc_storage $2 $1 64 float32 1
 2: 11 1 3   # load_const $3 Const[1]
 3: 5 2 3 2 32 1 2 4 1 512   # alloc_tensor $4 $2 $3 [1, 512] float32
 4: 11 2 5   # load_const $5 Const[2]
 5: 11 3 6   # load_const $6 Const[3]
 6: 4 0 4 1 0 5 6 4   # invoke_packed PackedFunc[0] (in: $0, $5, $6, out: $4)
 7: 11 4 7   # load_const $7 Const[4]
 8: 16 7 64 2 32 1 1 8   # alloc_storage $8 $7 64 float32 1
 9: 11 5 9   # load_const $9 Const[5]
10: 5 8 9 2 32 1 2 10 1 256   # alloc_tensor $10 $8 $9 [1, 256] float32
11: 11 6 11   # load_const $11 Const[6]
12: 11 7 12   # load_const $12 Const[7]
13: 4 1 4 1 4 11 12 10   # invoke_packed PackedFunc[1] (in: $4, $11, $12, out: $10)
14: 11 8 13   # load_const $13 Const[8]
15: 16 13 64 2 32 1 1 14   # alloc_storage $14 $13 64 float32 1
16: 11 9 15   # load_const $15 Const[9]
17: 5 14 15 2 32 1 2 16 1 128   # alloc_tensor $16 $14 $15 [1, 128] float32
18: 11 10 17   # load_const $17 Const[10]
19: 11 11 18   # load_const $18 Const[11]
20: 4 2 4 1 10 17 18 16   # invoke_packed PackedFunc[2] (in: $10, $17, $18, out: $16)
21: 1 16   # ret $16


*** VM style1 runtime time elapsed 0.04385066032409668


def @main(%data: Tensor[(1, 1024), float32], %fc0_weight: Tensor[(512, 1024), float32], %fc0_bias: Tensor[(512), float32], %fc1_weight: Tensor[(256, 512), float32], %fc1_bias: Tensor[(256), float32], %fc2_weight: Tensor[(128, 256), float32], %fc2_bias: Tensor[(128), float32]) -> Tensor[(1, 128), float32] {
  %0 = nn.dense(%data, %fc0_weight, units=512) /* ty=Tensor[(1, 512), float32] */;
  %1 = nn.bias_add(%0, %fc0_bias, axis=-1) /* ty=Tensor[(1, 512), float32] */;
  %2 = nn.dense(%1, %fc1_weight, units=256) /* ty=Tensor[(1, 256), float32] */;
  %3 = nn.bias_add(%2, %fc1_bias, axis=-1) /* ty=Tensor[(1, 256), float32] */;
  %4 = nn.dense(%3, %fc2_weight, units=128) /* ty=Tensor[(1, 128), float32] */;
  nn.bias_add(%4, %fc2_bias, axis=-1) /* ty=Tensor[(1, 128), float32] */
}

VM Function[0]: main(data, fc0_weight, fc0_bias, fc1_weight, fc1_bias, fc2_weight, fc2_bias)
# reg file size = 19
# instruction count = 16
opcode, fields # inst(text):
 0: 11 0 7   # load_const $7 Const[0]
 1: 16 7 64 2 32 1 1 8   # alloc_storage $8 $7 64 float32 1
 2: 11 1 9   # load_const $9 Const[1]
 3: 5 8 9 2 32 1 2 10 1 512   # alloc_tensor $10 $8 $9 [1, 512] float32
 4: 4 0 4 1 0 1 2 10   # invoke_packed PackedFunc[0] (in: $0, $1, $2, out: $10)
 5: 11 2 11   # load_const $11 Const[2]
 6: 16 11 64 2 32 1 1 12   # alloc_storage $12 $11 64 float32 1
 7: 11 3 13   # load_const $13 Const[3]
 8: 5 12 13 2 32 1 2 14 1 256   # alloc_tensor $14 $12 $13 [1, 256] float32
 9: 4 1 4 1 10 3 4 14   # invoke_packed PackedFunc[1] (in: $10, $3, $4, out: $14)
10: 11 4 15   # load_const $15 Const[4]
11: 16 15 64 2 32 1 1 16   # alloc_storage $16 $15 64 float32 1
12: 11 5 17   # load_const $17 Const[5]
13: 5 16 17 2 32 1 2 18 1 128   # alloc_tensor $18 $16 $17 [1, 128] float32
14: 4 2 4 1 14 5 6 18   # invoke_packed PackedFunc[2] (in: $14, $5, $6, out: $18)
15: 1 18   # ret $18


*** VM style2 runtime time elapsed 0.07250618934631348


API 2 is 1.653479988908402 times slower than API 1
