
Code of the ICCV 2023 paper "General Image-to-Image Translation with One-Shot Image Guidance"


Visual Concept Translator

arXiv: https://arxiv.org/abs/2307.14352

Visual Concept Translator (VCT) aims to achieve image translation with one-shot image guidance. Given only one reference image, VCT can automatically learn its dominant concepts and integrate them into the input source image. The following examples show its performance.

VCT examples. For each image group, the upper-left image is the source image, the lower-left image is the reference image, and the right part shows the translated images. VCT can be applied to many general image-to-image translation and style transfer tasks.

Setup

To set up the environment, please run

conda create -n vct python=3.8
conda activate vct
pip install -r requirements.txt

We tested our method on Nvidia A30 and A100 GPUs, but it should work on any GPU with at least 24 GB of memory.

Usage

To run VCT on image-to-image translation tasks, please run

accelerate launch main.py \
    --concept_image_dir="./examples/concept_image" \
    --content_image_dir="./examples/content_image" \
    --pretrained_model_name_or_path="/put/your/downloaded/huggingface/model" \
    --output_image_path="./outputs" \
    --initializer_token="girl" \
    --max_train_steps=500 \
    --concept_embedding_num=3 \
    --cross_attention_injection_ratio=0.2 \
    --self_attention_injection_ratio=0.9 \
    --use_l1

Place your one-shot concept image in concept_image_dir and any number of content images in content_image_dir. The translated images will be saved to output_image_path.
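For reference, a minimal directory layout matching the example command above (the filenames here are only placeholders) could look like:

examples/
├── concept_image/
│   └── reference.jpg        # the single one-shot reference (concept) image
└── content_image/
    ├── source_01.jpg        # source (content) images to be translated
    └── source_02.jpg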

To avoid loading errors or repeated downloads, we recommend downloading a pre-trained Hugging Face model such as stable-diffusion-v1-5 to a local directory and passing the downloaded path as pretrained_model_name_or_path.
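One way to do this is to clone the model repository with git-lfs. This is only a sketch: the runwayml/stable-diffusion-v1-5 repository id and the target directory are examples, and the checkpoint you want may be hosted under a different id.

# Requires git-lfs; downloads the full checkpoint to ./models/stable-diffusion-v1-5
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 ./models/stable-diffusion-v1-5

Then set --pretrained_model_name_or_path="./models/stable-diffusion-v1-5" in the command above.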

The initializer_token is used to initialize the concept embeddings, and max_train_steps sets the number of training steps. The optimal number of steps differs from concept to concept, so you can adjust max_train_steps (typically between 100 and 1000) to get better results.

Inspired by prompt-to-prompt, VCT also applies self-attention and cross-attention injection. A larger self_attention_injection_ratio or cross_attention_injection_ratio preserves more of the source content and transfers fewer target concepts. If the current results are not what you want, adjust these two parameters to trade off content preservation against concept translation, as in the example below.
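For instance, to preserve more of the source image you could re-run the command with larger injection ratios; the two ratio values below are only illustrative, and all other arguments are unchanged from the command above.

# Illustrative re-run that favors content preservation over concept transfer
accelerate launch main.py \
    --concept_image_dir="./examples/concept_image" \
    --content_image_dir="./examples/content_image" \
    --pretrained_model_name_or_path="/put/your/downloaded/huggingface/model" \
    --output_image_path="./outputs" \
    --initializer_token="girl" \
    --max_train_steps=500 \
    --concept_embedding_num=3 \
    --cross_attention_injection_ratio=0.4 \
    --self_attention_injection_ratio=1.0 \
    --use_l1

Lowering the two ratios instead transfers the reference concept more strongly at the cost of source-content preservation.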

Citation

If this code is useful for your work, please cite our paper:

@article{cheng2023general,
  title={General Image-to-Image Translation with One-Shot Image Guidance},
  author={Cheng, B. and Liu, Z. and Peng, Y. and Lin, Y.},
  journal={arXiv preprint arXiv:2307.14352},
  year={2023}
}
