Practical Single-Image and Temporal Upscaling via Swin-Conv-UNet

Codes

Single-Image inference

```
python test_sisr.py --model_path pretrained_models/scunet_color_real_psnr.pth --input example/lr/ --output example/sr/ --depth 16
```

Temporal inference
```
python test_vsr.py --model_path pretrained_models/2x_eula_anifilm_vsr.pth --input example/lr/ --output example/sr/ --depth 16
```
Temporal models are currently not publicly available, and existing SCUNet models are not compatible with the temporal architecture. If a folder of images is provided as input, all images must have the same resolution.
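
Because a folder input requires matching resolutions, it can help to check the folder before running inference. The following is a minimal sketch and not part of this repository: `png_size` and `check_uniform_resolution` are hypothetical helpers, and the sketch assumes the frames are PNG files, reading the width and height directly from each file's IHDR chunk.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk, without decoding pixels."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type, then width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    return struct.unpack(">II", header[16:24])


def check_uniform_resolution(folder):
    """Return the shared (width, height) of all PNGs in folder, or raise if they differ.

    Hypothetical helper: only *.png files are considered (an assumption of this sketch).
    Returns None if the folder contains no PNG files.
    """
    sizes = {p.name: png_size(p) for p in sorted(Path(folder).glob("*.png"))}
    if len(set(sizes.values())) > 1:
        raise ValueError(f"mixed resolutions in {folder}: {sizes}")
    return next(iter(sizes.values()), None)
```

Calling `check_uniform_resolution("example/lr/")` before inference would surface a mismatched frame early, with the offending file names in the error message.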

Both architectures support image inputs with video output and vice versa. The input and output arguments can each be a path to a single image, a folder of images, or a video file. To write a video, the additional arguments --video and --res must be provided; they select the output video codec and the output resolution, respectively. Additional ffmpeg arguments such as --profile, --preset, --crf, and --pix_fmt can also be provided.

Additionally, the --presize argument resizes the input to the target resolution divided by the scale, which can produce better results when the output resolution would otherwise fall short of the target resolution or when the original aspect ratio does not match the target aspect ratio.
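
The presize arithmetic can be sketched as follows. This illustrates the description above and is not the script's actual code: `presize_resolution` is a hypothetical helper, the `width:height` format is taken from the --res examples in this README, and rounding down when the target is not divisible by the scale is an assumption.

```python
def presize_resolution(target_res, scale):
    """Return the (width, height) the input would be resized to before upscaling.

    target_res: a "width:height" string in the style of the --res argument.
    scale: the model's upscaling factor (for example 2 for a 2x model).
    """
    width, height = (int(v) for v in target_res.split(":"))
    # Assumption: integer division; the real script may round differently.
    return width // scale, height // scale


print(presize_resolution("1440:1080", 2))  # (720, 540)
```

With a 2x model and a 1440:1080 target, the input would be resized to 720x540, so the upscaled output lands exactly on the target resolution and aspect ratio.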

```
python test_vsr.py --model_path pretrained_models/tscu_2x.pth --input example/lr_video.mp4 --output example/sr_video.mp4 --video libx264 --res 1440:1080 --presize --depth 16
```
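
When processing many files, it may be convenient to assemble this command in a script. The sketch below is one possible approach under stated assumptions: `build_vsr_command` is a hypothetical helper, it uses only flags mentioned in this README, and it assumes the ffmpeg-related flags each take a single value.

```python
import shlex


def build_vsr_command(model_path, input_path, output_path,
                      codec=None, res=None, presize=False, depth=16,
                      ffmpeg_args=None):
    """Build an argument list for test_vsr.py (hypothetical helper, not part of the repo)."""
    cmd = ["python", "test_vsr.py",
           "--model_path", model_path,
           "--input", input_path,
           "--output", output_path,
           "--depth", str(depth)]
    if codec is not None:
        # Video output needs both the codec (--video) and the output resolution (--res).
        if res is None:
            raise ValueError("--res must be given together with --video")
        cmd += ["--video", codec, "--res", res]
    if presize:
        cmd.append("--presize")
    # Optional ffmpeg arguments, e.g. {"preset": "slow", "crf": 18}.
    for flag, value in (ffmpeg_args or {}).items():
        cmd += [f"--{flag}", str(value)]
    return cmd


cmd = build_vsr_command(
    "pretrained_models/tscu_2x.pth",
    "example/lr_video.mp4",
    "example/sr_video.mp4",
    codec="libx264", res="1440:1080", presize=True,
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.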

Original Paper

[Paper]

@article{zhang2022practical,
title={Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis},
author={Zhang, Kai and Li, Yawei and Liang, Jingyun and Cao, Jiezhang and Zhang, Yulun and Tang, Hao and Timofte, Radu and Van Gool, Luc},
journal={arXiv preprint},
year={2022}
}

Swin-Conv-UNet (SCUNet) denoising network

The architecture of the proposed Swin-Conv-UNet (SCUNet) denoising network. SCUNet uses the swin-conv (SC) block as the main building block of a UNet backbone. In each SC block, the input is first passed through a 1×1 convolution and then split evenly into two feature map groups. One group is fed into a swin transformer (SwinT) block and the other into a residual 3×3 convolutional (RConv) block. The outputs of the SwinT block and the RConv block are then concatenated and passed through a 1×1 convolution to produce the residual of the input. “SConv” and “TConv” denote a 2×2 strided convolution with stride 2 and a 2×2 transposed convolution with stride 2, respectively.
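
To see what the SConv and TConv layers do to the spatial size, the standard output-size formulas for convolution and transposed convolution can be applied. This is a small sketch that assumes zero padding, and the `levels=3` default in `round_trip` is an illustrative assumption rather than a value stated here.

```python
def sconv_out(n, kernel=2, stride=2):
    """Output length of a strided convolution with no padding."""
    return (n - kernel) // stride + 1


def tconv_out(n, kernel=2, stride=2):
    """Output length of a transposed convolution with no padding."""
    return (n - 1) * stride + kernel


def round_trip(n, levels=3):
    """Apply `levels` SConv steps followed by `levels` TConv steps to a side length n."""
    for _ in range(levels):
        n = sconv_out(n)
    for _ in range(levels):
        n = tconv_out(n)
    return n


print(round_trip(64))  # 64: each SConv halves the size, each TConv doubles it
print(round_trip(65))  # 64: an odd side length loses a pixel at the first SConv
```

Under these assumptions, a side length survives the encoder and decoder unchanged only when it is divisible by 2 raised to the number of downsampling levels, which is why UNet-style networks commonly pad their inputs to such a multiple.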
