Temporal Shift Module (TSM) is a popular and widely used video understanding model. By shifting part of the feature channels along the temporal dimension, it greatly improves the exploitation of temporal information without adding any parameters or computation, and its lightweight, efficient design makes it well suited for industrial deployment.
This code implements the single RGB stream of the TSM network, with ResNet-50 as the backbone.
For details, please refer to the ICCV 2019 paper TSM: Temporal Shift Module for Efficient Video Understanding.
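The core of TSM is the temporal shift operation itself. Below is a minimal NumPy sketch of the idea (not this repo's `ResNetTSM` implementation, which applies the shift inside the residual blocks); the 1/8-per-direction channel split follows the paper's default and is an assumption here:

```python
import numpy as np

def temporal_shift(x, num_seg, fold_div=8):
    """Shift part of the channels of frame-level features along the temporal axis.

    x: features of shape [N * num_seg, C, H, W]; consecutive num_seg rows belong
       to the same video clip. fold_div=8 means 1/8 of the channels are shifted
       towards the future and 1/8 towards the past; the rest stay in place, so
       no parameters or FLOPs are added.
    """
    nt, c, h, w = x.shape
    n = nt // num_seg
    fold = c // fold_div
    x = x.reshape(n, num_seg, c, h, w)
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # frame t sees channels from t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # frame t sees channels from t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unshifted
    return out.reshape(nt, c, h, w)
```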
For Kinetics-400 data download and preparation, please refer to k400 data preparation.
For UCF-101 data download and preparation, please refer to ucf101 data preparation.
- Please download ResNet50_pretrain.pdparams as the pretrained model:

  ```bash
  wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_pretrain.pdparams
  ```
- Open `PaddleVideo/configs/recognition/tsm/tsm_k400_frames.yaml`, and fill in the downloaded weight path under `pretrained:`

  ```yaml
  MODEL:
      framework: "Recognizer2D"
      backbone:
          name: "ResNetTSM"
          pretrained: your weight path
  ```
- Different data formats/datasets can be trained by specifying different configuration files. Taking the Kinetics-400 dataset + 8 cards + frames format as an example, the startup command is as follows (more training commands can be found in `PaddleVideo/run.sh`):

  ```bash
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py --validate -c configs/recognition/tsm/tsm_k400_frames.yaml
  ```
- To train the Kinetics-400 dataset in videos format, use:

  ```bash
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py --validate -c configs/recognition/tsm/tsm_k400_videos.yaml
  ```
- AMP (automatic mixed precision) can be used to speed up training (a minimal sketch of what mixed-precision training looks like in Paddle is given after this list). The launch script is as follows:

  ```bash
  export FLAGS_conv_workspace_size_limit=800 # MB
  export FLAGS_cudnn_exhaustive_search=1
  export FLAGS_cudnn_batchnorm_spatial_persistent=1
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py --amp --validate -c configs/recognition/tsm/tsm_k400_frames.yaml
  ```
- AMP works better with the `NHWC` data format; the launch script is as follows:

  ```bash
  export FLAGS_conv_workspace_size_limit=800 # MB
  export FLAGS_cudnn_exhaustive_search=1
  export FLAGS_cudnn_batchnorm_spatial_persistent=1
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py --amp --validate -c configs/recognition/tsm/tsm_k400_frames_nhwc.yaml
  ```
- For config file usage, please refer to config.
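As a rough illustration of what the `--amp` flag enables inside the training loop, here is a minimal Paddle mixed-precision sketch with a toy model and random data; it does not reproduce this repo's actual trainer in `main.py`:

```python
import paddle

# Toy model, optimizer and data, only to show the AMP pattern.
model = paddle.nn.Linear(16, 4)
optimizer = paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9,
                                      parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

x = paddle.randn([8, 16])
y = paddle.randn([8, 4])

with paddle.amp.auto_cast():        # run eligible ops in float16 on GPU
    loss = paddle.nn.functional.mse_loss(model(x), y)

scaled = scaler.scale(loss)         # scale the loss to avoid fp16 gradient underflow
scaled.backward()
scaler.minimize(optimizer, scaled)  # unscale gradients and apply the update
optimizer.clear_grad()
```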
- Load the TSM model we trained on Kinetics-400, TSM_k400.pdparams, or download it via the command line:

  ```bash
  wget https://videotag.bj.bcebos.com/PaddleVideo-release2.1/TSM/TSM_k400.pdparams
  ```
- Open `PaddleVideo/configs/recognition/tsm/tsm_ucf101_frames.yaml`, and fill in the downloaded weight path under `pretrained:`

  ```yaml
  MODEL:
      framework: "Recognizer2D"
      backbone:
          name: "ResNetTSM"
          pretrained: your weight path
  ```
- Different data formats/datasets can be trained by specifying different configuration files. Taking the UCF-101 dataset + 4 cards + frames format as an example, the startup command is as follows (more training commands can be found in `PaddleVideo/run.sh`):

  ```bash
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py --validate -c configs/recognition/tsm/tsm_ucf101_frames.yaml
  ```
- To train the UCF-101 dataset in videos format, use:

  ```bash
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py --validate -c configs/recognition/tsm/tsm_ucf101_videos.yaml
  ```
- AMP (automatic mixed precision) can be used to speed up training; the launch script is as follows:

  ```bash
  export FLAGS_conv_workspace_size_limit=800 # MB
  export FLAGS_cudnn_exhaustive_search=1
  export FLAGS_cudnn_batchnorm_spatial_persistent=1
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py --amp --validate -c configs/recognition/tsm/tsm_ucf101_frames.yaml
  ```
- AMP works better with the `NHWC` data format; the launch script is as follows:

  ```bash
  export FLAGS_conv_workspace_size_limit=800 # MB
  export FLAGS_cudnn_exhaustive_search=1
  export FLAGS_cudnn_batchnorm_spatial_persistent=1
  python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py --amp --validate -c configs/recognition/tsm/tsm_ucf101_frames_nhwc.yaml
  ```
Put the weight of the model to be tested into the `output/TSM/` directory; the test command is as follows:

```bash
python3 main.py --test -c configs/recognition/tsm/tsm.yaml -w output/TSM/TSM_best.pdparams
```
When the test configuration uses the following parameters, the evaluation accuracy on the Kinetics-400 validation set is as follows:

| backbone | Sampling method | Training strategy | num_seg | target_size | Top-1 (%) | checkpoints |
| :------: | :-------------: | :---------------: | :-----: | :---------: | :-------: | :---------: |
| ResNet50 | Uniform | NCHW | 8 | 224 | 71.06 | TSM_k400.pdparams |
When the test configuration uses the following parameters, the evaluation accuracy on the UCF-101 validation set is as follows:

| backbone | Sampling method | Training strategy | num_seg | target_size | Top-1 (%) | checkpoints |
| :------: | :-------------: | :---------------: | :-----: | :---------: | :-------: | :---------: |
| ResNet50 | Uniform | NCHW | 8 | 224 | 94.42 | TSM_ucf101_nchw.pdparams |
| ResNet50 | Uniform | NCHW+AMP | 8 | 224 | 94.40 | TSM_ucf101_amp_nchw.pdparams |
| ResNet50 | Uniform | NHWC+AMP | 8 | 224 | 94.55 | TSM_ucf101_amp_nhwc.pdparams |
To get the model architecture file `TSM.pdmodel` and the parameters file `TSM.pdiparams`, run:

```bash
python3.7 tools/export_model.py -c configs/recognition/tsm/tsm_k400_frames.yaml \
                                -p data/TSM_k400.pdparams \
                                -o inference/TSM
```
- For argument usage, please refer to Model Inference.
```bash
python3.7 tools/predict.py --input_file data/example.avi \
                           --config configs/recognition/tsm/tsm_k400_frames.yaml \
                           --model_file inference/TSM/TSM.pdmodel \
                           --params_file inference/TSM/TSM.pdiparams \
                           --use_gpu=True \
                           --use_tensorrt=False
```
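For reference, here is a rough sketch of loading the exported files with the Paddle Inference API, independent of `tools/predict.py`. The input shape `[1, 8, 3, 224, 224]` (batch, num_seg, channels, height, width) and the random input are assumptions for illustration only; the real script first applies the preprocessing defined in the config to the decoded video:

```python
import numpy as np
from paddle.inference import Config, create_predictor

config = Config("inference/TSM/TSM.pdmodel", "inference/TSM/TSM.pdiparams")
config.enable_use_gpu(8000, 0)                 # GPU memory pool size (MB), GPU id
predictor = create_predictor(config)

# Feed a dummy clip; the shape is an assumption, check the exported input spec.
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 8, 3, 224, 224).astype("float32"))

predictor.run()

scores = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print("predicted class id:", scores.argmax())
```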
- The model reads `mp4` data from the Kinetics-400 dataset. Each video is first divided into `num_seg` segments, and one frame is uniformly sampled from each segment to obtain `num_seg` sparsely sampled frames (a minimal sketch of this sampling step follows this list). The same random data augmentation is then applied to these `num_seg` frames, including multi-scale random cropping, random horizontal flipping and data normalization, and the frames are finally resized to `target_size`.
- The Momentum optimizer is used for training, with momentum=0.9.
- L2_Decay is used, with a weight decay coefficient of 1e-4.
- Global gradient clipping is used, with a clipping factor of 20.0.
- The total number of epochs is 50, and the learning rate is decayed by a factor of 0.1 when the epoch reaches 20 and 40.
- The learning rates of the FC layer's weight and bias are 5x and 10x the base learning rate respectively, and no L2_Decay is applied to the bias (a minimal optimizer sketch follows this list).
- Dropout_ratio=0.5.
- The FC layer's weight is initialized from the normal distribution Normal(mean=0, std=0.001), and its bias is initialized to a constant 0.
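As referenced in the data-processing item above, here is a minimal sketch of the uniform sparse sampling step only (random cropping, flipping and normalization are omitted; the actual pipeline is the one configured in the yaml files). Using a random offset within each segment at training time and the segment centre at test time follows common TSM practice and is an assumption here:

```python
import random

def uniform_sample_indices(num_frames, num_seg, training=True):
    """Split a video into num_seg equal segments and pick one frame index per
    segment: a random offset within the segment when training, the segment
    centre when testing."""
    seg_len = num_frames / num_seg
    indices = []
    for i in range(num_seg):
        start = int(seg_len * i)
        end = max(int(seg_len * (i + 1)) - 1, start)
        indices.append(random.randint(start, end) if training else (start + end) // 2)
    return indices

# A 240-frame video sampled with num_seg=8 at test time:
print(uniform_sample_indices(240, 8, training=False))  # [14, 44, 74, 104, 134, 164, 194, 224]
```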
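As referenced in the training-strategy items above, here is a minimal sketch (not this repo's exact code) of how these optimizer settings map to Paddle APIs; `base_lr = 0.01` and the FC input size of 2048 are placeholder assumptions, and the real values come from the config files:

```python
import paddle

num_classes = 400  # Kinetics-400

# FC layer: weight initialized from Normal(0, 0.001) with a 5x LR multiplier;
# bias initialized to 0 with a 10x LR multiplier and no L2_Decay.
fc = paddle.nn.Linear(
    2048, num_classes,
    weight_attr=paddle.ParamAttr(
        learning_rate=5.0,
        initializer=paddle.nn.initializer.Normal(mean=0.0, std=0.001)),
    bias_attr=paddle.ParamAttr(
        learning_rate=10.0,
        regularizer=paddle.regularizer.L2Decay(0.0),
        initializer=paddle.nn.initializer.Constant(0.0)))

base_lr = 0.01  # placeholder value
# Decay the LR by 0.1x at epochs 20 and 40 (50 epochs total), assuming
# lr.step() is called once per epoch.
lr = paddle.optimizer.lr.PiecewiseDecay(
    boundaries=[20, 40],
    values=[base_lr, base_lr * 0.1, base_lr * 0.01])

optimizer = paddle.optimizer.Momentum(
    learning_rate=lr,
    momentum=0.9,
    weight_decay=paddle.regularizer.L2Decay(1e-4),
    grad_clip=paddle.nn.ClipGradByGlobalNorm(clip_norm=20.0),
    parameters=fc.parameters())
```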
- Reference: TSM: Temporal Shift Module for Efficient Video Understanding, Ji Lin, Chuang Gan, Song Han