🔮 This is the official code and model release for MaGIC: Multi-modality Guided Image Completion. In submission.
🔧 Installing both PyTorch and TorchVision with CUDA support is strongly recommended. The code requires `python>=3.8`, as well as `pytorch>=1.10` and `torchvision>=0.11`.
A suitable conda environment named `magic` can be created and activated with:

```shell
conda create -n magic python=3.8.5
conda activate magic
pip install -r requirements.txt
```
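To confirm that the installed PyTorch actually sees your GPU, a quick sanity check:

```shell
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```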
Attention: If the compute capability of your GPU is greater than 7.0 (e.g., GeForce RTX 3090), you need to comment out `torch==1.10.1` and `torchvision==0.11.2` and uncomment `--extra-index-url https://download.pytorch.org/whl/cu113`, `torch==1.12.1+cu113`, and `torchvision==0.13.1+cu113` in `requirements.txt` before running `pip install -r requirements.txt`.
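After that edit, the relevant lines of `requirements.txt` would look roughly like this (reconstructed from the lines quoted above; the rest of the file is unchanged):

```text
# torch==1.10.1
# torchvision==0.11.2
--extra-index-url https://download.pytorch.org/whl/cu113
torch==1.12.1+cu113
torchvision==0.13.1+cu113
```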
Prepare the example images and model weights as follows:

| Files | Google Drive / HF | Baidu Pan |
|---|---|---|
| Backbone (SD-inpainting-2.1) | Huggingface | Baidu Pan |
| MaGIC checkpoints and example images | Google Drive | Baidu Pan |
- 🔑 First, download the model checkpoints and place them in the `checkpoints/` folder.
- ✏️ Then, check the inference config in `configs/example_config.yaml` and edit it according to your personal settings, such as input images and masks (see the config-loading sketch after the run command below).
- 🏃 Run the following command to get the completed image result:

```shell
python infer.py
```
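If you prefer inspecting and editing the config programmatically, here is a minimal sketch using `omegaconf`, the config library commonly used by LDM-based codebases. The key names `input_image` and `input_mask` are hypothetical placeholders; print the loaded config to see the fields the repo actually defines.

```python
# Minimal sketch: load, inspect, and override the inference config.
# The key names below are hypothetical; check the printed config for
# the real field names shipped with the repo.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/example_config.yaml")
print(OmegaConf.to_yaml(cfg))  # inspect the available fields

cfg.input_image = "examples/my_image.png"  # hypothetical key
cfg.input_mask = "examples/my_mask.png"    # hypothetical key
OmegaConf.save(cfg, "configs/my_config.yaml")
```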
We explain each folder in the root directory as follows:

- Folder `annotator`: contains guidance-modality-related processing.
- Folders `checkpoints` and `examples`: include pre-trained weights, COCO image captions, and example test images. Please download them directly.
- Folder `ldm`: contains the Stable Diffusion code; our CMB (consistent modality blending) is in `ldm/models/diffusion/ddim_infer.py`.
- Folder `modules`: consists of condition networks and inference utilities.
- Folder `scripts`: includes script files that may be useful.
🎭 We provide a Gradio GUI to generate masks for input images. Simply run `python scripts/draw_mask.py` and open the webpage via the local URL it prints. Click the Save button to save the image and its mask to the `scripts/pair/` folder.
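For reference, here is a minimal, self-contained sketch of how such a mask-drawing tool can be built. This is not the repo's `scripts/draw_mask.py`; it assumes the Gradio 3.x sketch-tool API, where the image component returns a dict holding the uploaded image and the user-drawn mask.

```python
# Minimal Gradio mask-drawing sketch (NOT the repo's scripts/draw_mask.py;
# assumes the Gradio 3.x Image sketch tool).
import os

import gradio as gr
from PIL import Image

SAVE_DIR = "scripts/pair"

def save_pair(sketch):
    if sketch is None:
        return "Upload an image and draw a mask first."
    os.makedirs(SAVE_DIR, exist_ok=True)
    # With tool="sketch", Gradio passes a dict of numpy arrays:
    # "image" is the upload, "mask" holds the drawn strokes.
    Image.fromarray(sketch["image"]).save(os.path.join(SAVE_DIR, "image.png"))
    Image.fromarray(sketch["mask"]).convert("L").save(os.path.join(SAVE_DIR, "mask.png"))
    return f"Saved image/mask pair to {SAVE_DIR}/"

with gr.Blocks() as demo:
    inp = gr.Image(source="upload", tool="sketch", label="Draw the mask")
    btn = gr.Button("Save")
    out = gr.Textbox(label="Status")
    btn.click(save_pair, inputs=inp, outputs=out)

demo.launch()  # prints a local URL to open in the browser
```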
📊 We provide the evaluation script used in the paper in `scripts/evaluate.py`. To get your scores, edit the variables `folder1` and `folder2`, and follow these tips:
- Enable `is_pick` to calculate PickScore. If the test images have no captions/prompt text, set `is_pick` to `False`.
- If the number of test images is too small, resulting in a zero P/U-IDS score, augment the test dataset to address the issue.
- To calculate the standard deviation, enable `cal_std`. If you do not provide multiple results, set `cal_std` to `False`.
Evaluation output examples:

```
> folder1: results/lama_coco/
fid: 48.6322, pids: 0.0000, uids: 0.0000, pickscore: 0.2906
fid_std: 0.0000, pid_std: 0.0000, uid_std: 0.0000, pick_std: 0.0000
> folder1: results/lama_places/
fid: 13.6287, pids: 0.1054, uids: 0.2284, pickscore: -1.0000
fid_std: 0.0000, pid_std: 0.0000, uid_std: 0.0000, pick_std: 0.0000
```
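If you want to sanity-check the FID numbers independently, here is a minimal sketch using the `clean-fid` package. This is an assumption for illustration: `scripts/evaluate.py` may compute FID differently, and the folder paths below are illustrative.

```python
# Independent FID check with clean-fid (pip install clean-fid).
# NOT scripts/evaluate.py; folder paths are illustrative.
from cleanfid import fid

folder1 = "results/lama_coco"  # generated/completed images
folder2 = "data/gt_images"     # hypothetical ground-truth folder

score = fid.compute_fid(folder1, folder2)
print(f"fid: {score:.4f}")
```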
📝 The workload is substantial, so progress on the items below may be gradual:
- Release inference code and structure-form tau-net checkpoints
- Release Gradio Demos
- Release local edit code (+SAM)
- Release context-form tau-net code
- Release image generation code and related checkpoints
- Release tau-net train code
If you find this repository useful, please consider giving it a star ⭐ and citing it:

```bibtex
@article{yu2023magic,
  title={MaGIC: Multi-modality Guided Image Completion},
  author={Yu, Yongsheng and Wang, Hao and Luo, Tiejian and Fan, Heng and Zhang, Libo},
  journal={arXiv preprint arXiv:2305.11818},
  year={2023}
}
```
We would like to thank the authors of LDM and T2I-Adapter for sharing their code.