TOFlow: Video Enhancement with Task-Oriented Flow

This repository is based on our IJCV publication TOFlow: Video Enhancement with Task-Oriented Flow (PDF). It contains pre-trained models and demo code. It also describes the Vimeo-90K dataset we collected and provides scripts to download it. If you use this code or dataset in your work, please cite:

@article{xue2019video,
  title={Video Enhancement with Task-Oriented Flow},
  author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal={International Journal of Computer Vision (IJCV)},
  volume={127},
  number={8},
  pages={1106--1125},
  year={2019},
  publisher={Springer}
}

Video Demo

If you cannot access YouTube, please download the 1080p video from here.

Prerequisites

Torch

Our implementation is based on Torch 7 (http://torch.ch).

CUDA [optional]

CUDA is recommended (https://developer.nvidia.com/cuda-toolkit) for fast inference. The demo code still runs without CUDA, but much more slowly.

Matlab [optional]

We use Matlab (https://www.mathworks.com/products/matlab.html) to generate the video denoising and super-resolution datasets and to run the quantitative evaluation. Matlab is not needed for the demo code.

FFmpeg [optional]

We use FFmpeg (http://ffmpeg.org) to generate the video deblocking dataset. It is not needed for the demo code.

Installation

Our current release has been tested on Ubuntu 14.04.

Clone the repository

git clone https://github.com/anchen1011/toflow.git

Install dependency

cd toflow/src/stnbhwd
luarocks make

This installs the 'stn' package for Lua, which provides the following components:

require 'stn'
nn.AffineGridGeneratorBHWD(height, width)
-- takes B x 2 x 3 affine transform matrices as input, 
-- outputs a height x width grid in normalized [-1,1] coordinates
-- output layout is B,H,W,2 where the first coordinate in the 4th dimension is y, and the second is x
nn.BilinearSamplerBHWD()
-- takes a table {inputImages, grids} as inputs
-- outputs the interpolated images according to the grids
-- inputImages is a batch of samples in BHWD layout
-- grids is a batch of grids (output of AffineGridGeneratorBHWD)
-- output is also BHWD
nn.AffineTransformMatrixGenerator(useRotation, useScale, useTranslation)
-- takes a B x nbParams tensor as inputs
-- nbParams depends on the constrained transformation
-- The parameters for the selected transformation(s) should be supplied in the
-- following order: rotationAngle, scaleFactor, translationX, translationY
-- If no transformation is specified, it generates a generic affine transformation (nbParams = 6)
-- outputs B x 2 x 3 affine transform matrices
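To make the grid and sampler conventions above concrete, here is a minimal pure-Python sketch of what AffineGridGeneratorBHWD followed by BilinearSamplerBHWD compute for a single sample (B = 1). It is not the stn package's code: the helper names affine_grid and bilinear_sample are hypothetical, and the sketch assumes that the base grid is ordered (y, x, 1) and that sample taps falling outside the image contribute zero.

```python
import math


def affine_grid(theta, height, width):
    """Map each output pixel to a (y, x) source location in normalized [-1, 1] coordinates.

    theta is a 2 x 3 affine matrix given as nested lists.
    Assumption: the base grid is (y, x, 1), so row 0 of theta yields y and
    row 1 yields x, matching the "first coordinate is y" layout above.
    """
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            base = (y, x, 1.0)
            gy = sum(theta[0][k] * base[k] for k in range(3))
            gx = sum(theta[1][k] * base[k] for k in range(3))
            row.append((gy, gx))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Bilinearly sample an H x W x D image (HWD layout) at the grid locations.

    Assumption: taps that fall outside the image contribute zero.
    """
    h, w, d = len(image), len(image[0]), len(image[0][0])
    out = []
    for grid_row in grid:
        out_row = []
        for gy, gx in grid_row:
            # Convert normalized [-1, 1] coordinates back to pixel coordinates.
            py = (gy + 1.0) * (h - 1) / 2.0
            px = (gx + 1.0) * (w - 1) / 2.0
            y0, x0 = math.floor(py), math.floor(px)
            wy, wx = py - y0, px - x0
            pixel = [0.0] * d
            # Accumulate the four neighbours, each weighted by its bilinear weight.
            for dy, ky in ((0, 1.0 - wy), (1, wy)):
                for dx, kx in ((0, 1.0 - wx), (1, wx)):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        for c in range(d):
                            pixel[c] += ky * kx * image[yy][xx][c]
            out_row.append(pixel)
        out.append(out_row)
    return out
```

With the identity matrix [[1, 0, 0], [0, 1, 0]] the sampled output reproduces the input, and negating the x row ([[1, 0, 0], [0, -1, 0]]) mirrors it left to right. In TOFlow the grids come from estimated flow rather than a single affine matrix, but the sampling step works the same way.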

Download pretrained models (104MB)

cd ../../
./download_models.sh

Run Demo Code

cd src
th demo.lua -mode interp -inpath ../data/example/low_frame_rate
th demo.lua -mode denoise -inpath ../data/example/noisy
th demo.lua -mode deblock -inpath ../data/example/block
th demo.lua -mode sr -inpath ../data/example/blur

There are a few options in demo.lua:

nocuda: Set this option when CUDA is not available.

gpuId: GPU device ID.

mode: There are four options:

'interp': temporal frame interpolation
'denoise': video denoising
'deblock': video deblocking
'sr': video super-resolution

inpath: The path to the input sequence.

outpath: The directory where results are stored (default: ../demo_output).
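If you want to script several demo runs, a small helper that assembles the command line from the options above can avoid typos. This is a hypothetical Python sketch, not part of the repository; it assumes that the boolean nocuda option is passed as a bare -nocuda flag, as is usual with torch.CmdLine.

```python
import shlex

# The four modes listed above, paired with the example input folders used by the demo commands.
DEMO_INPUTS = {
    "interp": "../data/example/low_frame_rate",
    "denoise": "../data/example/noisy",
    "deblock": "../data/example/block",
    "sr": "../data/example/blur",
}


def build_demo_command(mode, inpath=None, outpath=None, gpu_id=None, nocuda=False):
    """Return the argv list for one demo.lua run (hypothetical helper)."""
    if mode not in DEMO_INPUTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(DEMO_INPUTS)}")
    argv = ["th", "demo.lua", "-mode", mode, "-inpath", inpath or DEMO_INPUTS[mode]]
    if outpath is not None:
        argv += ["-outpath", outpath]
    if gpu_id is not None:
        argv += ["-gpuId", str(gpu_id)]
    if nocuda:
        # Assumption: boolean options are given as a bare flag (torch.CmdLine convention).
        argv.append("-nocuda")
    return argv


if __name__ == "__main__":
    # Print the four example commands from the section above.
    for mode in DEMO_INPUTS:
        print(shlex.join(build_demo_command(mode)))
```

To actually launch a run from the repository root, pass the list to subprocess.run(build_demo_command("sr"), cwd="src", check=True).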

Vimeo-90K Dataset

We also built a large-scale, high-quality video dataset, Vimeo-90K, designed for four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.

Vimeo-90K is built from 5,846 selected videos downloaded from vimeo.com, which cover a large variety of scenes and actions. This video set is a subset of the AoT dataset, and all video links are here.

We further split these videos into 89,800 clips and build two datasets from these clips:

Triplet dataset for temporal frame interpolation

The triplet dataset consists of 73,171 three-frame sequences with a fixed resolution of 448 x 256, extracted from 15K selected clips of Vimeo-90K. It is designed for temporal frame interpolation. Download links:

Test set only: zip (1.7GB).

Both training and test set: zip (33GB).

Septuplet dataset for video denoising, super-resolution, and deblocking

The septuplet dataset consists of 91,701 seven-frame sequences with a fixed resolution of 448 x 256, extracted from 39K selected clips of Vimeo-90K. It is designed for video denoising, deblocking, and super-resolution.

The test set for video denoising: zip (16GB).

The test set for video deblocking: zip (11GB).

The test set for video super-resolution: zip (6GB).

The original test set (clean frames, not downsampled or corrupted by noise): zip (15GB).

The original training + test set (all 91,701 sequences, not downsampled or corrupted by noise): zip (82GB).
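The two datasets differ in how a sequence is turned into model inputs and a reference frame. The Python sketch below is illustrative only and rests on assumptions not stated in this README: frames are listed in temporal order, frame interpolation predicts the middle frame of a triplet from its two neighbours, and the septuplet tasks restore the center (4th) frame from all seven degraded frames, comparing against the clean frame at the same position. The function name split_sequence is hypothetical.

```python
def split_sequence(frames, task):
    """Return (input_frames, target_index) for one Vimeo-90K sequence.

    Assumptions: frames are in temporal order; for "interp" the middle of three
    frames is predicted; for the septuplet tasks the center of seven frames is
    restored and compared with the clean frame at target_index.
    """
    if task == "interp":
        if len(frames) != 3:
            raise ValueError("triplet sequences have 3 frames")
        # The first and last frames are inputs; the middle frame is held out as ground truth.
        return [frames[0], frames[2]], 1
    if task in ("denoise", "deblock", "sr"):
        if len(frames) != 7:
            raise ValueError("septuplet sequences have 7 frames")
        # All seven degraded frames are inputs; the clean center frame is the reference.
        return list(frames), 3
    raise ValueError(f"unknown task {task!r}")
```

For example, with hypothetical file names, split_sequence(["im1.png", "im2.png", "im3.png"], "interp") yields the inputs ["im1.png", "im3.png"] and target index 1.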

Generate Testing Sequences

See src/generate_testing_sample for the functions to generate noisy/low-resolution sequences.

To generate noisy sequences, run the following in Matlab from src/generate_testing_sample:

add_noise_to_input(data_path, output_path);

The results are stored under output_path.
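This README does not spell out the noise model used by add_noise_to_input. As an illustration only, the Python sketch below adds white Gaussian noise, a common choice for denoising benchmarks. The function name, the sigma value you pass, and the 8-bit grayscale input are assumptions; the Matlab function may use a different model or noise level.

```python
import random


def add_gaussian_noise(frame, sigma, seed=0):
    """Add i.i.d. Gaussian noise to an 8-bit grayscale frame (nested lists of 0-255 values).

    Assumption: additive Gaussian noise with standard deviation sigma in 8-bit
    intensity units; results are rounded and clipped to [0, 255].
    """
    rng = random.Random(seed)  # fixed seed so the degraded sequence is reproducible
    noisy = []
    for row in frame:
        noisy.append([min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row])
    return noisy
```

Using a different seed per frame gives each frame of a sequence independent noise while keeping the whole test set reproducible.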

To generate blurred sequences (used for super-resolution), run the following in Matlab:

blur_input(data_path, output_path);

The results are stored under output_path.
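The exact degradation applied by blur_input is not described here. As a hedged illustration of the blur step alone, the Python sketch below applies a separable Gaussian blur with edge replication. The Gaussian kernel, the default radius of ceil(3 * sigma), and the function names are assumptions, and whether blur_input also downsamples the frames is not stated in this README.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (assumed default radius: ceil(3 * sigma))."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(frame, sigma):
    """Separable Gaussian blur of a 2-D grayscale frame, replicating edge pixels."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(frame), len(frame[0])
    # Horizontal pass, clamping column indices at the image borders.
    tmp = [[sum(kernel[k + r] * frame[i][min(w - 1, max(0, j + k))] for k in range(-r, r + 1))
            for j in range(w)] for i in range(h)]
    # Vertical pass over the horizontally blurred rows.
    return [[sum(kernel[k + r] * tmp[min(h - 1, max(0, i + k))][j] for k in range(-r, r + 1))
             for j in range(w)] for i in range(h)]
```

Because the 2-D Gaussian is separable, two 1-D passes give the same result as one 2-D convolution at a fraction of the cost.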

Blocky sequences are generated by compressing the frames with FFmpeg. Our test set was generated with the following configuration:

ffmpeg -pattern_type glob -i '*.png' -q 20 -vcodec jpeg2000 -format j2k name.mov

Run Quantitative Evaluation

Download all four Vimeo test sets (52GB)

./download_testset.sh

Run inference on Vimeo testsets

cd src
th demo_vimeo90k.lua -mode interp
th demo_vimeo90k.lua -mode denoise
th demo_vimeo90k.lua -mode deblock
th demo_vimeo90k.lua -mode sr

Evaluation

We use three metrics to evaluate the performance of our algorithm: PSNR, SSIM, and Abs. To run the evaluation, execute the following commands in Matlab:

cd src/evaluation
evaluate(output_dir, target_dir, task_name);

For example, to evaluate results generated in the previous step, run

cd src/evaluation
evaluate('../../output/interp', '../../data/vimeo_interp_test/target', 'interp')
evaluate('../../output/denoise', '../../data/vimeo_test_clean/sequences', 'denoise')
evaluate('../../output/deblock', '../../data/vimeo_test_clean/sequences', 'deblock')
evaluate('../../output/sr', '../../data/vimeo_test_clean/sequences', 'sr')
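For reference, PSNR for 8-bit images is 10 * log10(255^2 / MSE). The Python sketch below computes it together with a mean absolute error, which we assume is what the "Abs" metric measures. It is not a port of evaluate.m: the peak value of 255, the flat pixel lists, and the absence of color-space conversion or border cropping are all assumptions, and SSIM is omitted.

```python
import math


def psnr(output, target, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames (flat pixel lists)."""
    if len(output) != len(target):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(output, target)) / len(output)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def mean_abs_error(output, target):
    """Mean absolute pixel difference; assumed here to correspond to the "Abs" metric."""
    if len(output) != len(target):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(output, target)) / len(output)
```

An error of one intensity level at every pixel gives an MSE of 1, so the PSNR is 20 * log10(255), about 48.1 dB.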

The evaluation script assumes that our datasets are unzipped under data/ without being renamed, and that results are saved under [output_root]/[task_name] (e.g., output/sr, output/interp, output/denoise, output/deblock) with exactly the same subfolder structure as our datasets.
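Because evaluation relies on matching subfolders, it can help to check for missing result folders before starting Matlab. The Python sketch below is a hypothetical helper, not part of the repository: it treats any folder that directly contains .png files as a sequence folder and reports the ones present under the target directory but absent under the output directory.

```python
import os


def sequence_dirs(root):
    """Return the set of subfolders (relative to root) that directly contain .png files."""
    dirs = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.lower().endswith(".png") for name in filenames):
            dirs.add(os.path.relpath(dirpath, root))
    return dirs


def missing_result_dirs(output_dir, target_dir):
    """Sequence folders that exist under target_dir but have no counterpart under output_dir."""
    return sorted(sequence_dirs(target_dir) - sequence_dirs(output_dir))
```

For example, missing_result_dirs('output/sr', 'data/vimeo_test_clean/sequences') run from the repository root should return an empty list once inference has finished.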

References

Our warping code is based on qassemoquab/stnbhwd.
Our flow utilities and transformation utilities are based on anuragranj/spynet.
There is an unofficial PyTorch implementation by coldog2333/pytoflow.
