简体中文 | English

# Benchmark

We compare PaddleVideo with several popular frameworks and official releases in terms of training speed.

## Environment

### Hardware

- 8 NVIDIA Tesla V100 (16 GB) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

### Software

- Python 3.7
- PaddlePaddle 2.0
- CUDA 10.1
- cuDNN 7.6.3
- NCCL 2.1.15
- GCC 8.2.0
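
Before timing runs, the installed stack can be sanity-checked against the list above. The snippet below is a minimal sketch using standard PaddlePaddle introspection calls; it is not part of the benchmark scripts.

```python
import paddle

# Print the installed PaddlePaddle version and GPU availability so the
# benchmark environment can be compared against the list above.
print("PaddlePaddle:", paddle.__version__)                # expected 2.0
print("Compiled with CUDA:", paddle.is_compiled_with_cuda())
print("Current device:", paddle.get_device())             # e.g. "gpu:0"
```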

## Experiments and Statistics

The reported statistic is the average training time per iteration, including both data processing and model training time; training speed is measured in ips (instances per second). Note that the first 50 iterations are skipped, as they may include device warm-up time.
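
As an illustration of this measurement scheme, the sketch below shows how an ips figure could be computed while discarding the first 50 iterations. `data_loader`, `train_step`, and `batch_size` are hypothetical placeholders, not PaddleVideo APIs.

```python
import time

WARMUP_ITERS = 50  # early iterations may include device warm-up, so skip them

def measure_ips(data_loader, train_step, batch_size, warmup=WARMUP_ITERS):
    """Average instances per second over the post-warm-up iterations."""
    total_time = 0.0
    total_instances = 0
    tic = time.time()
    for i, batch in enumerate(data_loader):
        train_step(batch)                # forward / backward / update
        toc = time.time()
        if i >= warmup:                  # keep only post-warm-up iterations
            total_time += toc - tic      # includes data loading for this batch
            total_instances += batch_size
        tic = toc
    return total_instances / total_time
```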

Here we compare PaddleVideo with other video understanding toolkits under the same data and model settings.

To ensure a fair comparison, all experiments were conducted on the same hardware and with the same dataset. The dataset is generated by the data preparation step, and for each model setting the same data preprocessing methods are applied so that every framework receives identical feature input.
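
For illustration, the sketch below shows one possible shared preprocessing pipeline (resize the short side, center crop, normalize). The parameter values and function names here are assumptions made for the sketch, not the exact settings used in these experiments.

```python
import numpy as np

# Illustrative normalization constants (assumed, not the benchmark's exact values).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(frames, short_side=256, crop_size=224):
    """frames: list of HxWx3 uint8 arrays sampled from one video clip."""
    out = []
    for img in frames:
        h, w, _ = img.shape
        scale = short_side / min(h, w)
        new_h, new_w = int(round(h * scale)), int(round(w * scale))
        # Nearest-neighbor resize via index mapping (keeps the sketch dependency-free).
        ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
        xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
        img = img[ys][:, xs]
        # Center crop to the training resolution.
        top = (new_h - crop_size) // 2
        left = (new_w - crop_size) // 2
        img = img[top:top + crop_size, left:left + crop_size]
        out.append((img.astype(np.float32) / 255.0 - MEAN) / STD)
    return np.stack(out)  # shape: (T, crop_size, crop_size, 3)
```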

As shown in the table below, PaddleVideo achieves significant speedups over the other video understanding frameworks; in particular, the SlowFast model trains roughly 2x faster than its PySlowFast counterpart (99.5 vs. 43.2 ips).

## Results

### Recognizers

| Model | batch size x gpus | PaddleVideo (ips) | Reference (ips) | MMAction2 (ips) | PySlowFast (ips) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| TSM | 16x8 | 58.1 | 46.04 (temporal-shift-module) | To do | X |
| PPTSM | 16x8 | 57.6 | X | X | X |
| TSN | 16x8 | 841.1 | To do (tsn-pytorch) | To do | X |
| Slowfast | 16x8 | 99.5 | X | To do | 43.2 |
| Attention_LSTM | 128x8 | 112.6 | X | X | X |

### Localizers

| Model | PaddleVideo (ips) | MMAction2 (ips) | BMN (boundary matching network) (ips) |
| :--- | :---: | :---: | :---: |
| BMN | 43.84 | x | x |