forked from PaddlePaddle/Paddle
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add anakin and mobile doc (PaddlePaddle#30)
* Update 3rdparty
* Remove submodule
* Add anakin doc
* Add anakin dev docs
* Add mobile doc
* Follow comments
* Remove mobile
* Polish mobile doc
Showing 19 changed files with 289 additions and 12 deletions.
@@ -0,0 +1,57 @@
# Anakin ARM Performance Benchmark

## Test environment and parameters:
+ Test models: Mobilenet v1, Mobilenet v2, mobilenet-ssd
+ Cross-compiled with the Android NDK, gcc 4.9, NEON enabled, ABI: armeabi-v7a with NEON, -mfloat-abi=softfp
+ Test platforms
   - Honor V9 (rooted): Kirin 960 processor, 4 big cores at 2.36GHz, 4 little cores at 1.8GHz
   - nubia Z17: Qualcomm Snapdragon 835 processor, 4 big cores at 2.36GHz, 4 little cores at 1.9GHz
   - 360 N5: Qualcomm Snapdragon 653 processor, 4 big cores at 1.8GHz, 4 little cores at 1.4GHz
+ Multithreading: OpenMP
+ Timing: 10 warm-up runs, then the mean of 10 timed runs
+ ncnn version: the GitHub master branch at commit ID 307a77f04be29875f40d337cfff6df747df09de6 (msg: convert LogisticRegressionOutput)
+ TFlite version: the GitHub master branch at commit ID 65c05bc2ac19f51f7027e66350bc71652662125c (msg: Removed unneeded file copy that was causing failure in Pi builds)

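The timing rule listed above (10 warm-up runs, then the mean of 10 timed runs) can be sketched in Python as follows. This is only an illustration of the measurement scheme, not the code used by benchmark_arm; the `benchmark` function and its `run_once` callback are hypothetical names introduced here.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], None], warmup: int = 10, repeats: int = 10) -> float:
    """Return the mean latency of `run_once` in milliseconds.

    `run_once` is a hypothetical callback standing in for one forward pass.
    """
    for _ in range(warmup):
        run_once()  # warm-up runs are executed but not timed

    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(samples_ms)
```

Discarding the warm-up runs keeps one-time costs such as cache misses and frequency ramp-up out of the reported mean.
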
This benchmark compares the performance of **`ncnn`**, **`TFlite`**, and **`Anakin`**.

## BenchMark model

> Note: before benchmarking, convert each test model to an Anakin model with the [External Converter](#10003).
> These models are tested on ARM with multiple threads and a batch size of 1.

- [Mobilenet v1](#11) *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
- [Mobilenet v2](#22) *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
- [mobilenet-ssd](#33) *the caffe model can be downloaded [here](https://github.com/chuanqi305/MobileNet-SSD)*

### <span id = '11'> mobilenetv1 </span>

|platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|107.7ms|61.1ms|38.2ms|152.8ms|85.2ms|51.9ms|152.6ms|nan|nan|
|Snapdragon 835|105.7ms|63.1ms|~~46.8ms~~|152.7ms|87.0ms|~~92.7ms~~|146.9ms|nan|nan|
|Snapdragon 653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan|

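To read the mobilenetv1 table as relative speedups, each baseline latency can be divided by the Anakin latency in the same column. The sketch below does this for the single-threaded columns, assuming the number in parentheses in each header is the thread count; `LATENCY_MS` and `speedup` are names introduced here for illustration, and the platform keys are the chip names of the three rows.

```python
# Single-thread latencies (ms) copied from the "(1)" columns of the mobilenetv1 table.
LATENCY_MS = {
    "Kirin 960": {"Anakin": 107.7, "ncnn": 152.8, "TFlite": 152.6},
    "Snapdragon 835": {"Anakin": 105.7, "ncnn": 152.7, "TFlite": 146.9},
    "Snapdragon 653": {"Anakin": 120.3, "ncnn": 202.5, "TFlite": 158.6},
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for platform, row in LATENCY_MS.items():
    for baseline in ("ncnn", "TFlite"):
        ratio = speedup(row[baseline], row["Anakin"])
        print(f"{platform}: Anakin vs {baseline} = {ratio:.2f}x")
```
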
### <span id = '22'> mobilenetv2 </span>

|platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|93.1ms|53.9ms|34.8ms|144.4ms|84.3ms|55.3ms|100.6ms|nan|nan|
|Snapdragon 835|93.0ms|55.6ms|41.1ms|139.1ms|88.4ms|58.1ms|95.2ms|nan|nan|
|Snapdragon 653|106.6ms|64.2ms|48.0ms|199.9ms|125.1ms|98.9ms|108.5ms|nan|nan|

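The multi-threaded columns also show how well each engine scales. A rough measure is parallel efficiency: the speedup over one thread divided by the number of threads. The sketch below applies it to the first row of the mobilenetv2 table (Kirin 960), again assuming the parenthesised numbers are thread counts; the names are illustrative only.

```python
# Latencies (ms) from the first row (Kirin 960) of the mobilenetv2 table,
# keyed by the number in parentheses (assumed to be the thread count).
MOBILENETV2_KIRIN960_MS = {
    "Anakin": {1: 93.1, 2: 53.9, 4: 34.8},
    "ncnn": {1: 144.4, 2: 84.3, 4: 55.3},
}


def parallel_efficiency(latency_by_threads: dict, threads: int) -> float:
    """Speedup over the single-thread run, divided by the thread count."""
    speedup = latency_by_threads[1] / latency_by_threads[threads]
    return speedup / threads


for engine, latencies in MOBILENETV2_KIRIN960_MS.items():
    for threads in (2, 4):
        eff = parallel_efficiency(latencies, threads)
        print(f"{engine} with {threads} threads: efficiency {eff:.2f}")
```

An efficiency of 1.0 would mean perfect linear scaling; values below 1.0 reflect synchronization overhead and shared memory bandwidth.
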
### <span id = '33'> mobilenet-ssd </span>

|platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|213.9ms|120.5ms|74.5ms|307.9ms|166.5ms|104.2ms|nan|nan|nan|
|Snapdragon 835|213.0ms|125.7ms|~~98.4ms~~|292.9ms|177.9ms|~~167.8ms~~|nan|nan|nan|
|Snapdragon 653|236.0ms|129.6ms|96.0ms|377.7ms|228.9ms|165.0ms|nan|nan|nan|

## How to run those Benchmark models?

1. First, convert the caffe model with the [External Converter](../docs/Manual/Converter_en.md).
2. Then upload the converted Anakin model and the compiled benchmark_arm binary to the test device with the 'adb push' command.
3. Next, in the directory on the test device that contains the Anakin model, run './benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1'.
4. Finally, the model's run time is printed to the terminal.
5. The number and meaning of the command's arguments can be seen by running './benchmark_arm'.

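Steps 2 and 3 above can be scripted from the host. The following is a minimal sketch, assuming `adb` is on the PATH and a single device is attached. The device directory `/data/local/tmp/anakin` and the `chmod +x` step are assumptions added here, not part of the original instructions; the benchmark arguments are copied verbatim from step 3 and their meaning is not restated.

```python
import shlex
import subprocess

# Assumed writable directory on the device; adjust for your setup.
DEVICE_DIR = "/data/local/tmp/anakin"


def build_commands(model="anakin_model.anakin.bin",
                   binary="benchmark_arm",
                   device_dir=DEVICE_DIR):
    """Return the adb invocations for pushing the files and running the benchmark."""
    run = (
        f"cd {shlex.quote(device_dir)} && chmod +x {binary} && "
        f"./{binary} ./ {model} 1 10 10 1"  # arguments as given in step 3
    )
    return [
        ["adb", "shell", "mkdir", "-p", device_dir],
        ["adb", "push", model, device_dir + "/"],
        ["adb", "push", binary, device_dir + "/"],
        ["adb", "shell", run],
    ]


def run_benchmark(**kwargs):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in build_commands(**kwargs):
        subprocess.run(cmd, check=True)
```

Calling `build_commands()` on its own prints nothing and touches no device, so the command list can be inspected before `run_benchmark()` is used.
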
@@ -0,0 +1 @@
../../../anakin/examples/example_introduction_cn.md
@@ -0,0 +1,175 @@
# Anakin GPU Benchmark

## Machine:

> CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz`
> GPU: `Tesla P4`
> cuDNN: `v7`

## Counterpart of anakin :

The counterpart of **`Anakin`** is the acknowledged high-performance inference engine **`NVIDIA TensorRT 3`**. For models that TensorRT 3 does not support, we use custom plugins.

## Benchmark Model

The following convolutional neural networks are tested with both `Anakin` and `TensorRT 3`.
You can use a pretrained caffe model or a model you trained yourself.

> Please note that you should convert the caffe model (or other models) into an anakin model with the help of the [`external converter ->`](../docs/Manual/Converter_en.md)

- [Vgg16](#1) *caffe model can be found [here->](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
- [Yolo](#2) *caffe model can be found [here->](https://github.com/hojel/caffe-yolo-model)*
- [Resnet50](#3) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
- [Resnet101](#4) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
- [Mobilenet v1](#5) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
- [Mobilenet v2](#6) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
- [RNN](#7) *not supported yet*

We tested them on a single GPU with a single thread.

### <span id = '1'>VGG16 </span>

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 8.8690 | 8.2815 |
| 2 | 15.5344 | 13.9116 |
| 4 | 26.6000 | 21.8747 |
| 8 | 49.8279 | 40.4076 |
| 32 | 188.6270 | 163.7660 |

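Per-batch latency can be converted to throughput (images per second) to compare batch sizes on one scale. The sketch below does this for the VGG16 latency table, assuming each latency is the time to process one whole batch; `VGG16_LATENCY_MS` and `throughput` are names introduced for illustration.

```python
# VGG16 latencies (ms) copied from the table: batch size -> (TensorRT, Anakin).
VGG16_LATENCY_MS = {
    1: (8.8690, 8.2815),
    2: (15.5344, 13.9116),
    4: (26.6000, 21.8747),
    8: (49.8279, 40.4076),
    32: (188.6270, 163.7660),
}


def throughput(batch_size: int, latency_ms: float) -> float:
    """Images per second, assuming latency_ms covers one whole batch."""
    return batch_size * 1000.0 / latency_ms


for batch, (tensorrt_ms, anakin_ms) in VGG16_LATENCY_MS.items():
    print(
        f"batch {batch:>2}: TensorRT {throughput(batch, tensorrt_ms):7.1f} img/s, "
        f"Anakin {throughput(batch, anakin_ms):7.1f} img/s"
    )
```
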
- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 963 | 997 |
| 2 | 965 | 1039 |
| 4 | 991 | 1115 |
| 8 | 1067 | 1269 |
| 32 | 1715 | 2193 |

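One way to summarize the VGG16 memory table is the average extra memory per additional image, taken between the smallest and largest batch sizes. This is a back-of-the-envelope calculation on the reported numbers, not a statement about how either engine allocates memory; the names below are illustrative.

```python
# VGG16 GPU memory (MB) copied from the table: batch size -> (TensorRT, Anakin).
VGG16_MEMORY_MB = {
    1: (963, 997),
    2: (965, 1039),
    4: (991, 1115),
    8: (1067, 1269),
    32: (1715, 2193),
}


def marginal_mb_per_image(memory_by_batch: dict, engine_index: int) -> float:
    """Average memory growth per extra image between the smallest and largest batch."""
    lo, hi = min(memory_by_batch), max(memory_by_batch)
    delta_mb = memory_by_batch[hi][engine_index] - memory_by_batch[lo][engine_index]
    return delta_mb / (hi - lo)


print(f"TensorRT: {marginal_mb_per_image(VGG16_MEMORY_MB, 0):.2f} MB per extra image")
print(f"Anakin:   {marginal_mb_per_image(VGG16_MEMORY_MB, 1):.2f} MB per extra image")
```
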
### <span id = '2'>Yolo </span>

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 16.4596 | 15.2124 |
| 2 | 26.6347 | 25.0442 |
| 4 | 43.3695 | 43.5017 |
| 8 | 80.9139 | 80.9880 |
| 32 | 293.8080 | 310.8810 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1569 | 1775 |
| 2 | 1649 | 1815 |
| 4 | 1709 | 1887 |
| 8 | 1731 | 2031 |
| 32 | 2253 | 2907 |

### <span id = '3'> Resnet50 </span>

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 4.2459 | 4.1061 |
| 2 | 6.2627 | 6.5159 |
| 4 | 10.1277 | 11.3327 |
| 8 | 17.8209 | 20.6680 |
| 32 | 65.8582 | 77.8858 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 531 | 503 |
| 2 | 543 | 517 |
| 4 | 583 | 541 |
| 8 | 611 | 589 |
| 32 | 809 | 879 |

### <span id = '4'> Resnet101 </span>

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 7.5562 | 7.0837 |
| 2 | 11.6023 | 11.4079 |
| 4 | 18.3650 | 20.0493 |
| 8 | 32.7632 | 36.0648 |
| 32 | 123.2550 | 135.4880 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 701 | 683 |
| 2 | 713 | 697 |
| 4 | 793 | 721 |
| 8 | 819 | 769 |
| 32 | 1043 | 1059 |

### <span id = '5'> MobileNet V1 </span>

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 45.5156 | 1.3947 |
| 2 | 46.5585 | 2.5483 |
| 4 | 48.4242 | 4.3404 |
| 8 | 52.7957 | 8.1513 |
| 32 | 83.2519 | 31.3178 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 329 | 283 |
| 2 | 345 | 289 |
| 4 | 371 | 299 |
| 8 | 393 | 319 |
| 32 | 531 | 433 |

### <span id = '6'> MobileNet V2</span>

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 65.6861 | 2.9842 |
| 2 | 66.6814 | 4.7472 |
| 4 | 69.7114 | 7.4163 |
| 8 | 76.1092 | 12.8779 |
| 32 | 124.9810 | 47.2142 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 341 | 293 |
| 2 | 353 | 301 |
| 4 | 385 | 319 |
| 8 | 421 | 351 |
| 32 | 637 | 551 |

## How to run those Benchmark models?

> 1. First, convert the caffe model with the [`external converter`](https://github.com/PaddlePaddle/Anakin/blob/b95f31e19993a192e7428b4fcf852b9fe9860e5f/docs/Manual/Converter_en.md).
> 2. Switch to the *source_root/benchmark/CNN* directory. Use 'mkdir ./models' to create ./models and put the anakin models into this directory.
> 3. Run 'sh run.sh'. It creates files in logs that save each model's log at different batch sizes. Finally, a summary of model latency is displayed on the screen.
> 4. To get more detailed per-op timing, modify CMakeLists.txt to set `ENABLE_OP_TIMER` to `YES`, then recompile and run. The detailed information appears in the model log file.

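Steps 2 and 3 above can be automated with a short script. The sketch below is an assumption-laden illustration: `stage_models` and `run_benchmarks` are hypothetical helper names, `source_root` is wherever the Anakin sources are checked out, and the only facts taken from the steps are the `benchmark/CNN` directory, the `models` subdirectory, and the `sh run.sh` command.

```python
import shutil
import subprocess
from pathlib import Path


def stage_models(source_root, model_files):
    """Create benchmark/CNN/models under source_root and copy the anakin models in."""
    bench_dir = Path(source_root) / "benchmark" / "CNN"
    models_dir = bench_dir / "models"
    models_dir.mkdir(parents=True, exist_ok=True)  # same effect as 'mkdir ./models'
    for model in model_files:
        shutil.copy(model, models_dir)
    return bench_dir


def run_benchmarks(bench_dir):
    """Run 'sh run.sh' from the benchmark directory, as in step 3."""
    subprocess.run(["sh", "run.sh"], cwd=bench_dir, check=True)
```

A typical call would be `run_benchmarks(stage_models("path/to/anakin", ["vgg16.anakin.bin"]))`, where both the path and the model file name are placeholders.
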
@@ -0,0 +1 @@
../../../anakin/docs/Manual/Tutorial_ch.md
File renamed without changes.
@@ -0,0 +1 @@
../../../anakin/docs/Manual/Converter_ch.md
@@ -0,0 +1 @@
../../../anakin/docs/Manual/addCustomOp.md
source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
1 change: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../../anakin/docs/Manual/addCustomDevice.md
@@ -0,0 +1 @@
../../../mobile/doc/images/
@@ -1,15 +1,45 @@
####################
Inference Deployment
####################

Server Side
###########
Server-Side Deployment - Native Engine
######################################

.. toctree::
   :maxdepth: 2

   build_and_install_lib_cn.rst
   native_inference_engine.rst

Server-Side Deployment - Anakin
###############################


User Documentation
~~~~~~~~~~~~~~~~~~

.. toctree::
   :maxdepth: 1

   install_anakin.md
   convert_paddle_to_anakin.md
   run_anakin_on_arm.md
   anakin_tutorial.md
   anakin_example.md
   anakin_gpu_benchmark.md
   anakin_arm_benchmark.md

Developer Documentation
~~~~~~~~~~~~~~~~~~~~~~~

.. toctree::
   :maxdepth: 1

   how_to_add_anakin_op.md
   how_to_support_new_device_in_anakin.md

Mobile Deployment
#################

.. toctree::
   :maxdepth: 2

Mobile
######
   mobile_build.md
   mobile_design.md
   mobile_dev.md
@@ -0,0 +1 @@
../../../anakin/docs/Manual/INSTALL_ch.md
@@ -0,0 +1 @@
../../../mobile/doc/build.md
@@ -0,0 +1 @@
../../../mobile/doc/design_doc.md
@@ -0,0 +1 @@
../../../mobile/doc/development_doc.md
@@ -0,0 +1 @@
../../../anakin/docs/Manual/run_on_arm_ch.md
Submodule anakin deleted from 4e7732