This repository is the official PyTorch implementation of SeD: Semantic-Aware Discriminator for Image Super-Resolution (CVPR 2024).
- 2024-3-24: Updated the training code!
- 2024-4-11: Updated the test code and U+SeD.
Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner. However, this distribution learning is overly coarse-grained, which makes it susceptible to virtual textures and causes counter-intuitive generation results. To mitigate this, we propose the simple and effective Semantic-aware Discriminator (denoted as SeD), which encourages the SR network to learn fine-grained distributions by introducing the semantics of images as a condition. Concretely, we aim to excavate the semantics of images from a well-trained semantic extractor. Under different semantics, the discriminator is able to distinguish real and fake images individually and adaptively, which guides the SR network to learn more fine-grained semantic-aware textures. To obtain accurate and abundant semantics, we take full advantage of recently popular pretrained vision models (PVMs) trained on extensive datasets, and then incorporate their semantic features into the discriminator through a well-designed spatial cross-attention module. In this way, our proposed semantic-aware discriminator empowers the SR network to produce more photo-realistic and pleasing images. Extensive experiments on two typical tasks, i.e., SR and Real SR, have demonstrated the effectiveness of our proposed methods.
Python == 3.9
PyTorch == 1.9.0
conda create -n SeD python=3.9
conda activate SeD
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
Notes: To install PyTorch 1.9.0, you may also refer to the official site.
- Download DIV2K from this link: DIV2K
- Download Flickr2K from this link: Flickr2K
- Make a directory named DF2K. Move the GT images from DIV2K and Flickr2K into a train folder under DF2K, and move the x4 downsampled images into a train_x4 folder under DF2K. Your directory should now look like:
DF2K
|-train
|---0001.png
|---000001.png
|---...
|-train_x4
|---0001x4.png
|---000001x4.png
|---...
- Run the following (replace N with a thread count that suits your machine, e.g., 12):
python extract_subimages.py --input DF2K/train --output DF2K/train_sub --n_thread N
python extract_subimages.py --input DF2K/train_x4 --output DF2K/train_sub_x4 --n_thread N --crop_size 120 --step 60
- DF2K now contains the train_sub and train_sub_x4 folders. These two folders are used for training.
- Download Set5, Set14, Urban100 and Manga109 from this link: download all
- Unzip them into Evaluation. Your directory should now look like:
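Before extracting sub-images, it can help to confirm that every GT image has an x4 counterpart. The sketch below is not part of this repository; the helper name `find_unpaired` is hypothetical, and it assumes the `NAME.png` / `NAMEx4.png` naming shown in the directory listing above.

```python
from pathlib import Path


def find_unpaired(hr_dir, lr_dir, suffix="x4"):
    """Return (HR stems without an LR file, LR stems without an HR file).

    Assumes HR files are named NAME.png and LR files NAME<suffix>.png,
    as in the DF2K listing above.
    """
    hr_stems = {p.stem for p in Path(hr_dir).glob("*.png")}
    lr_stems = set()
    for p in Path(lr_dir).glob("*.png"):
        stem = p.stem
        # Strip the trailing scale suffix: "0001x4" -> "0001".
        lr_stems.add(stem[: -len(suffix)] if stem.endswith(suffix) else stem)
    return sorted(hr_stems - lr_stems), sorted(lr_stems - hr_stems)
```

Calling `find_unpaired("DF2K/train", "DF2K/train_x4")` should return two empty lists when the folders are consistent.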
Evaluation
|-Set5
|---GTmod12
|---LRbicx2
|---LRbicx3
|---LRbicx4
|-Set14
|-Urban100
|-...
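The folder name GTmod12 suggests that the ground-truth images are cropped so that height and width are multiples of 12, which keeps them evenly divisible by the x2, x3 and x4 scales. This reading is an assumption based on the common naming convention, not something stated in this README. Under that assumption, the sizes work out as in this sketch (the function names and the 512x512 input are illustrative):

```python
def mod_crop_size(height, width, modulo=12):
    """Largest size not exceeding (height, width) that is divisible by modulo."""
    return height - height % modulo, width - width % modulo


def lr_size(height, width, scale, modulo=12):
    """Size of the low-resolution image after mod-cropping and downscaling."""
    h, w = mod_crop_size(height, width, modulo)
    return h // scale, w // scale


# Hypothetical 512x512 ground-truth image.
print(mod_crop_size(512, 512))  # (504, 504)
for scale in (2, 3, 4):
    print(scale, lr_size(512, 512, scale))
# 2 (252, 252)
# 3 (168, 168)
# 4 (126, 126)
```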
- RRDB: RRDB.pth
- SwinIR: SwinIR.pth
Notes: Please put them into the pretrained folder.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --opt options/train_rrdb_P+SeD.yml --resume pretrained/RRDB.pth
Notes: You may also set the following additional command-line arguments:
--data_root /path/to/your/DF2K
--out_root /path/to/your/checkpoints
Notes: You may also try "train_rrdb_U+SeD.yml".
To get started, first modify the yml file: set "data_lr_root", "data_hr_root" and "use_hr", then set "ckpt_path" to your checkpoint location.
CUDA_VISIBLE_DEVICES=0 python test.py --opt options/test_rrdb_P+SeD.yml --output_path /path/to/your/output
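Editing the yml file by hand is enough. If you prefer to script it, the following is a minimal sketch, not part of this repository, that overwrites the four keys named above wherever they occur in an already loaded config. The helper `set_keys`, the nested layout of `cfg` and the example paths are hypothetical; in practice the dictionary would come from a YAML library (for example PyYAML's `yaml.safe_load`) and be written back with `yaml.safe_dump`.

```python
def set_keys(cfg, updates):
    """Overwrite every occurrence of the given keys in a nested config.

    Returns the set of keys that were actually found, so missing keys
    can be reported instead of being silently ignored.
    """
    found = set()
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            if key in updates:
                cfg[key] = updates[key]
                found.add(key)
            else:
                found |= set_keys(value, updates)
    elif isinstance(cfg, list):
        for item in cfg:
            found |= set_keys(item, updates)
    return found


# Hypothetical structure; the real yml layout may differ.
cfg = {
    "datasets": {"test": {"data_lr_root": "", "data_hr_root": "", "use_hr": False}},
    "ckpt_path": "",
}
updates = {
    "data_lr_root": "Evaluation/Set5/LRbicx4",
    "data_hr_root": "Evaluation/Set5/GTmod12",
    "use_hr": True,
    "ckpt_path": "/path/to/your/checkpoint.pth",
}
missing = set(updates) - set_keys(cfg, updates)
print("keys not found in config:", sorted(missing))
```

Reporting the keys that were not found guards against typos in key names, which would otherwise leave the config unchanged.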
We provide our P+SeD weights on Google Drive.
Quantitative Results
Visual Comparisons
Please cite us if this work is helpful to you.
@inproceedings{li2024sed,
title={SeD: Semantic-Aware Discriminator for Image Super-Resolution},
author={Li, Bingchen and Li, Xin and Zhu, Hanxin and Jin, Yeying and Feng, Ruoyu and Zhang, Zhizheng and Chen, Zhibo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
The code is partially based on the repos below.
Please follow their licenses. Thanks for their awesome work.