A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

HomePage | Colab | arXiv | YouTube

This repository contains the implementation of our paper, A Dynamic Multi-Scale Voxel Flow Network for Video Prediction, which was accepted to CVPR 2023 as a highlight ✨ (10% of accepted papers). The paper proposes DMVFN, a state-of-the-art model for video prediction.

Poster | Research journey | Chinese paper | Rebuttal (3WA->1WA2SA) | Demo

Usage

Installation

git clone https://github.com/megvii-research/CVPR2023-DMVFN.git
cd CVPR2023-DMVFN
pip3 install -r requirements.txt

Download the pretrained models from Google Drive (or Baidu Netdisk, password: 33ly) and move the pretrained parameters to CVPR2023-DMVFN/pretrained_models/*. The following commands fetch them with gdown:

pip install gdown
mkdir pretrained_models && cd pretrained_models
gdown --id 1jILbS8Gm4E5Xx4tDCPZh_7rId0eo8r9W
gdown --id 1WrV30prRiS4hWOQBnVPUxdaTlp9XxmVK
gdown --id 14_xQ3Yl3mO89hr28hbcQW3h63lLrcYY0
cd ..

Data Preparation

This section covers downloading every dataset, including both the training and testing sets. If you only need the test sets, skip to "Directly download test splits of different datasets".

The final data folder CVPR2023-DMVFN/data/ should be organized like this:

data
├── Cityscapes
│   ├── train (Cityscapes images in 512x1024)
│   │   ├── 000000
│   │   ├── ...
│   │   └── 002974
│   └── test (Cityscapes images in 512x1024)
│       ├── 000000
│       ├── ...
│       └── 000499
├── KITTI
│   ├── train (KITTI images in 256x832)
│   │   ├── 000000
│   │   ├── ...
│   │   └── 013499
│   └── test (KITTI images in 256x832)
│       ├── 000000
│       ├── ...
│       └── 001336
├── UCF101
│   ├── v_ApplyEyeMakeup_g08_c01
│   └── ...
└── vimeo_interp_test
    └── target
        ├── 00001
        └── ...
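
Before training or testing, it can help to confirm that the folders in the tree above exist. The following is a minimal sketch, not part of the repository: the helper name missing_dirs and the EXPECTED_DIRS list are hypothetical and only mirror the top-level folders shown above. It does not check the numbered sequence folders or the image sizes.

```python
from pathlib import Path

# Expected sub-folders, taken from the tree above (paths relative to data/).
EXPECTED_DIRS = [
    "Cityscapes/train",
    "Cityscapes/test",
    "KITTI/train",
    "KITTI/test",
    "UCF101",
    "vimeo_interp_test/target",
]


def missing_dirs(data_root):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the CVPR2023-DMVFN/ directory.
    missing = missing_dirs("./data")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Data folder layout looks complete.")
```

If you only prepared some of the datasets, the folders reported as missing for the others can be ignored.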

Cityscapes

Download the Cityscapes dataset leftImg8bit_sequence_trainvaltest.zip from here.
Unzip leftImg8bit_sequence_trainvaltest.zip.

unzip leftImg8bit_sequence_trainvaltest.zip

Run ./utils/prepare_city.py

python3 ./utils/prepare_city.py

KITTI

Download the KITTI dataset from Google Drive. You need to register and log in, then download all videos.
Our training split and testing split are consistent with YueWuHKUST/CVPR2020-FutureVideoSynthesis.

Videos for training split:
2011_09_26_drive_0001_sync  2011_09_26_drive_0002_sync  2011_09_26_drive_0005_sync
2011_09_26_drive_0009_sync  2011_09_26_drive_0011_sync  2011_09_26_drive_0013_sync
2011_09_26_drive_0014_sync  2011_09_26_drive_0017_sync  2011_09_26_drive_0018_sync
2011_09_26_drive_0048_sync  2011_09_26_drive_0051_sync  2011_09_26_drive_0056_sync
2011_09_26_drive_0057_sync  2011_09_26_drive_0059_sync  2011_09_26_drive_0091_sync
2011_09_26_drive_0095_sync  2011_09_26_drive_0104_sync  2011_09_26_drive_0106_sync
2011_09_26_drive_0113_sync  2011_09_26_drive_0117_sync  2011_09_28_drive_0001_sync
2011_09_28_drive_0002_sync  2011_09_29_drive_0026_sync  2011_09_29_drive_0071_sync

Videos for testing split:
2011_09_26_drive_0060_sync  2011_09_26_drive_0084_sync  2011_09_26_drive_0093_sync  2011_09_26_drive_0096_sync

Unzip all files, then reorganize the dataset as follows:

mkdir train_or
unzip 2011_09_26_drive_0001_sync
mv 2011_09_26_drive_0001_sync/image02/data/ train_or/2011_09_26_drive_0001_sync

Repeat these commands for every video in the split. We use image02 and image03 for training.
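
Repeating the mv step by hand for two dozen videos is error-prone, so it can be scripted. The following is a minimal sketch under stated assumptions: the function name collect_sequences is hypothetical, the videos are assumed to be already unzipped under one folder, and only the image02 move shown in the command above is reproduced. How image03 folders should be named in train_or is not specified here, so the sketch does not cover them.

```python
import shutil
from pathlib import Path


def collect_sequences(raw_root, out_root, videos, camera="image02"):
    """Move <raw_root>/<video>/<camera>/data to <out_root>/<video> for each video.

    Returns the list of videos that were moved. Videos whose source folder
    is missing, or whose destination already exists, are skipped.
    """
    raw_root, out_root = Path(raw_root), Path(out_root)
    out_root.mkdir(parents=True, exist_ok=True)
    moved = []
    for video in videos:
        src = raw_root / video / camera / "data"
        dst = out_root / video
        if not src.is_dir() or dst.exists():
            continue
        shutil.move(str(src), str(dst))
        moved.append(video)
    return moved


if __name__ == "__main__":
    # Two names from the training split; extend with the full list above.
    videos = ["2011_09_26_drive_0001_sync", "2011_09_26_drive_0002_sync"]
    print("Moved:", collect_sequences(".", "train_or", videos))
```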

Run ./utils/prepare_kitti.py.

python3 ./utils/prepare_kitti.py

Do the same for the testing split.

UCF101

We extract RGB frames from each video in the UCF101 dataset and save them as .jpg images.

Download the preprocessed data directly from feichtenhofer/twostreamfusion for convenience.

wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.003

cat ucf101_jpegs_256.zip.00* > ucf101_jpegs_256.zip
unzip ucf101_jpegs_256.zip
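
On systems without cat, the split archive can be joined in Python instead. The following is a minimal sketch: the function name join_parts is hypothetical, and it assumes the parts carry three-digit suffixes (.001, .002, .003) as in the wget commands above.

```python
import shutil
from pathlib import Path


def join_parts(directory, archive_name):
    """Concatenate <archive_name>.001, .002, ... into <archive_name>.

    Parts are joined in sorted order; returns the number of parts used.
    """
    directory = Path(directory)
    # Match only the numbered parts, never the output archive itself.
    parts = sorted(directory.glob(archive_name + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {archive_name}")
    with open(directory / archive_name, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)
```

For example, join_parts(".", "ucf101_jpegs_256.zip") writes ucf101_jpegs_256.zip in the current directory, which can then be unzipped as above.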

Vimeo90K

Download Vimeo90K dataset directly from here.
Unzip the dataset.

Run

😆Training

For Cityscapes Dataset:

python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=4321 ./scripts/train.py \
--train_dataset CityTrainDataset \
--val_datasets CityValDataset \
--batch_size 8 \
--num_gpu 8

For KITTI Dataset:

python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=4321 ./scripts/train.py \
--train_dataset KittiTrainDataset \
--val_datasets KittiValDataset \
--batch_size 8 \
--num_gpu 8

For DAVIS and Vimeo Dataset (the model is trained on UCF101 and validated on DAVIS and Vimeo):

python3 -m torch.distributed.launch --nproc_per_node=8 \
--master_port=4321 ./scripts/train.py \
--train_dataset UCF101TrainDataset \
--val_datasets DavisValDataset VimeoValDataset \
--batch_size 8 \
--num_gpu 8

🤔️Testing

Directly download test splits of different datasets

Download Cityscapes_test directly from Google Drive. (Baidu Netdisk password: wk7k)

Download KITTI_test directly from Google Drive. (Baidu Netdisk password: e7da)

Download DAVIS_test directly from Google Drive. (Baidu Netdisk password: mczk)

Download Vimeo_test directly from Google Drive. (Baidu Netdisk password: 0mjo)

Run the following command to generate test results for the DMVFN model. --val_datasets can be CityValDataset, KittiValDataset, DavisValDataset, or VimeoValDataset. --save_image is optional and can be omitted.

python3 ./scripts/test.py \
--val_datasets CityValDataset \
--load_path path_of_pretrained_weights \
--save_image
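
To evaluate several datasets in one go, the command above can be generated per dataset. The following is a minimal sketch, not part of the repository: build_test_command is a hypothetical helper, and path_of_pretrained_weights is the same placeholder used above. Which checkpoint belongs to which dataset is not stated here, so set it yourself. The sketch only prints the commands; pass each list to subprocess.run to execute it.

```python
import shlex

# Dataset names accepted by --val_datasets, as listed above.
VAL_DATASETS = [
    "CityValDataset",
    "KittiValDataset",
    "DavisValDataset",
    "VimeoValDataset",
]


def build_test_command(dataset, load_path, save_image=True):
    """Build the argument list for one ./scripts/test.py run."""
    cmd = [
        "python3", "./scripts/test.py",
        "--val_datasets", dataset,
        "--load_path", load_path,
    ]
    if save_image:
        cmd.append("--save_image")
    return cmd


if __name__ == "__main__":
    # Placeholder path: replace with the checkpoint for each dataset.
    load_path = "path_of_pretrained_weights"
    for dataset in VAL_DATASETS:
        print(shlex.join(build_test_command(dataset, load_path)))
```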

Image results

We provide the image results of DMVFN on various datasets (Cityscapes, KITTI, DAVIS and Vimeo) on Baidu Netdisk (password: k7eb).

We also provide the results of DMVFN (without routing) on Baidu Netdisk (password: 8zo9).

Test the image results

Run the following command to directly test the image results.

python3 ./scripts/test_ssim_lpips.py

😋Single test

We provide a simple script that predicts the frame at t+1 from the frames at t-1 and t. Run the following command:

python3 ./scripts/single_test.py \
--image_0_path ./images/sample_img_0.png \
--image_1_path ./images/sample_img_1.png \
--load_path path_of_pretrained_weights \
--output_dir pred.png
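
To predict the next frame at every step of a longer clip, each pair of consecutive frames (t-1, t) can be fed to the same script. The following is a minimal sketch under stated assumptions: frame_pairs and single_test_command are hypothetical helpers, --image_0_path is assumed to take the t-1 frame and --image_1_path the t frame, and the output names pred_0.png, pred_1.png are made up for illustration. The sketch only prints the commands.

```python
import shlex


def frame_pairs(frames):
    """Yield (t-1, t) input pairs from an ordered list of frame paths."""
    for prev, cur in zip(frames, frames[1:]):
        yield prev, cur


def single_test_command(image_0, image_1, load_path, output):
    """Build the argument list for one ./scripts/single_test.py run."""
    return [
        "python3", "./scripts/single_test.py",
        "--image_0_path", image_0,   # assumed: frame at t-1
        "--image_1_path", image_1,   # assumed: frame at t
        "--load_path", load_path,
        "--output_dir", output,
    ]


if __name__ == "__main__":
    frames = ["./images/sample_img_0.png", "./images/sample_img_1.png"]
    for i, (prev, cur) in enumerate(frame_pairs(frames)):
        cmd = single_test_command(prev, cur, "path_of_pretrained_weights", f"pred_{i}.png")
        print(shlex.join(cmd))
```

A clip of N frames yields N-1 pairs, so it produces N-1 predicted frames.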

Recommend

We sincerely recommend some related papers:

ECCV22 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

CVPR22 - Optimizing Video Prediction via Video Frame Interpolation

Citation

If you think this project is helpful, please feel free to leave a star or cite our paper:

@inproceedings{hu2023dmvfn,
  title={A Dynamic Multi-Scale Voxel Flow Network for Video Prediction},
  author={Hu, Xiaotao and Huang, Zhewei and Huang, Ailin and Xu, Jun and Zhou, Shuchang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
