Inference benchmark of deep learning models implemented in PaddlePaddle.
## Environment
- MI 5, Android 7.0, Snapdragon 820, 1.8 GHz
- android-ndk-r13b
- gcc version 4.9.x 20150123 (prerelease) (GCC)
- Android clang version 3.8.256229 (based on LLVM 3.8.256229)
## Benchmark for MobileNet inference (input image 3x224x224)
Currently, single-threaded inference on the MI 5 takes 122.607 ms and occupies 48 MB of system memory.
version | time (ms) | mem (MB) | size (KB) | optimization (speedup) |
---|---|---|---|---|
d2258a4 | 321.682 | - | - | base |
d2258a4 | 225.044 | - | - | merge bn(30%) |
b45d020 | 148.201 | - | - | depthwise convolution(34.1%) |
0146e8b | 127.032 | - | - | clang compile(14.3%) |
d59295f | 122.607 | 48 | 4306 -> 1431 | neon::relu(3.5%) |
- The convolution layers of the base version are implemented in the `im2col + gemm` way (see the im2col sketch after this list).
- The merge bn optimization folds the parameters of each batch normalization layer into the parameters of the preceding convolution layer (a sketch follows this list).
- The depthwise convolution entry is a depthwise convolution optimization based on ARM NEON intrinsics (a sketch follows this list).
- Compiling with clang produces faster code than compiling with gcc for this workload.
- The `mem (MB)` value was measured by running the Paddle inference program and using the `free` command to observe the change in system memory usage.
- In the `size (KB)` column, the former value is the size of the Paddle inference `.so`, and the latter is its size after zip compression.
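
As a minimal sketch of the `im2col + gemm` approach used by the base version: each kernel-sized patch of the input is unrolled into one column of a matrix, so the convolution reduces to a single GEMM between the filter matrix and the column matrix. The function below handles one channel plane with unit stride and no padding; the name `im2col` and its signature are illustrative, not the library's actual API.

```cpp
#include <vector>

// Expand each k x k patch of `in` (height x width) into one column of `col`,
// so that convolution becomes a single matrix multiplication:
//   output (out_channels x out_h*out_w) =
//       filter_matrix (out_channels x k*k) * col (k*k x out_h*out_w)
void im2col(const float* in, int height, int width, int k,
            std::vector<float>& col) {
  const int out_h = height - k + 1;
  const int out_w = width - k + 1;
  col.resize(static_cast<size_t>(k) * k * out_h * out_w);
  for (int kh = 0; kh < k; ++kh) {
    for (int kw = 0; kw < k; ++kw) {
      for (int oh = 0; oh < out_h; ++oh) {
        for (int ow = 0; ow < out_w; ++ow) {
          const int row = kh * k + kw;         // which kernel element
          const int column = oh * out_w + ow;  // which output pixel
          col[static_cast<size_t>(row) * out_h * out_w + column] =
              in[(oh + kh) * width + (ow + kw)];
        }
      }
    }
  }
}
```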
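
A sketch of the merge bn transformation: at model-loading time the batch-norm parameters (gamma, beta, mean, var) of each output channel are folded into the preceding convolution's weights and bias, so no batch-norm work remains at inference. The parameter layout and names below are assumptions for illustration.

```cpp
#include <cmath>
#include <vector>

// Fold BN into conv:  W' = W * gamma / sqrt(var + eps)
//                     b' = (b - mean) * gamma / sqrt(var + eps) + beta
void merge_bn(std::vector<float>& weights,        // out_ch * weights_per_channel
              std::vector<float>& bias,           // out_ch
              const std::vector<float>& gamma,
              const std::vector<float>& beta,
              const std::vector<float>& mean,
              const std::vector<float>& var,
              float eps = 1e-5f) {
  const size_t out_ch = bias.size();
  const size_t per_ch = weights.size() / out_ch;
  for (size_t c = 0; c < out_ch; ++c) {
    const float scale = gamma[c] / std::sqrt(var[c] + eps);
    for (size_t i = 0; i < per_ch; ++i) {
      weights[c * per_ch + i] *= scale;           // scale the channel's weights
    }
    bias[c] = (bias[c] - mean[c]) * scale + beta[c];  // fold shift into bias
  }
}
```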
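
A sketch of a 3x3 depthwise convolution inner loop written with ARM NEON intrinsics, computing four output pixels per iteration (stride 1, no padding, one channel plane). This illustrates the technique behind the depthwise convolution row, not the library's exact kernel.

```cpp
#include <arm_neon.h>

void depthwise3x3(const float* in, int width, int out_h, int out_w,
                  const float kernel[9], float* out) {
  for (int oh = 0; oh < out_h; ++oh) {
    int ow = 0;
    for (; ow + 4 <= out_w; ow += 4) {
      float32x4_t acc = vdupq_n_f32(0.0f);
      for (int kh = 0; kh < 3; ++kh) {
        const float* row = in + (oh + kh) * width + ow;
        // Load 4 neighbouring pixels and multiply-accumulate by each
        // kernel element of this row.
        acc = vmlaq_n_f32(acc, vld1q_f32(row + 0), kernel[kh * 3 + 0]);
        acc = vmlaq_n_f32(acc, vld1q_f32(row + 1), kernel[kh * 3 + 1]);
        acc = vmlaq_n_f32(acc, vld1q_f32(row + 2), kernel[kh * 3 + 2]);
      }
      vst1q_f32(out + oh * out_w + ow, acc);
    }
    for (; ow < out_w; ++ow) {  // scalar tail for leftover pixels
      float acc = 0.0f;
      for (int kh = 0; kh < 3; ++kh)
        for (int kw = 0; kw < 3; ++kw)
          acc += in[(oh + kh) * width + (ow + kw)] * kernel[kh * 3 + kw];
      out[oh * out_w + ow] = acc;
    }
  }
}
```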
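
Finally, a sketch of the kind of vectorized relu behind the `neon::relu` table entry: `vmaxq_f32` clamps four floats per instruction, with a scalar tail for the remainder. The function name is illustrative.

```cpp
#include <arm_neon.h>

void relu_neon(float* data, int n) {
  const float32x4_t zero = vdupq_n_f32(0.0f);
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    // max(x, 0) on four elements at once
    vst1q_f32(data + i, vmaxq_f32(vld1q_f32(data + i), zero));
  }
  for (; i < n; ++i) {
    if (data[i] < 0.0f) data[i] = 0.0f;
  }
}
```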