Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation

The master branch is for public viewing.

The development branch contains notebooks for early dataset exploration, debugging, unbatched inference, and plotting.

Modify <largefiles_dir> if you keep large files in a separate directory.

Calculation of the proposed metrics, COMPLETENESS and BALANCE: quantifying_skew.ipynb

Setup

1. Python Environment

git clone git@github.com:zdxdsw/skewed_relations_T2I.git &&
cd skewed_relations_T2I &&
python3 -m venv venv &&
source venv/bin/activate &&
pip install --upgrade pip &&
pip install -r requirements.txt

Troubleshooting: If you run into an ImportError or incompatibility issues, try installing these specific versions: pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118. This requires CUDA 11.8. If your machine has multiple CUDA versions installed, you may also need to run: export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib.
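To confirm which PyTorch build ended up in the virtual environment, a quick check like the one below may help. This is a minimal sketch, not part of the repository; the helper name describe_torch_install is hypothetical, and it only reports what it finds without changing anything.

```python
import importlib.util


def describe_torch_install():
    """Report whether torch is importable and which CUDA build it targets."""
    # find_spec returns None when the package is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_install())
```

If built_for_cuda is not 11.8 after following the troubleshooting step, the pinned wheels were probably not the ones installed.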

2. Accelerate config

$ accelerate config # This will automatically generate ~/.cache/huggingface/accelerate/default_config.yaml.

Example config:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Pixel Diffusion Experiments with Synthetic Images

1. Training configs

Configure your training hyperparameters in skewed_relations_T2I/scripts/diffuser_icons/config.py.

To reproduce results in our paper, copy configs from skewed_relations_T2I/scripts/diffuser_icons/configs/pixel_icons_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_icons/configs/pixel_icons_twoobjs_ft_config.py

2. Synthetic dataset

Due to the simplicity of synthetic data, we do not save a copy. Data is constructed on the fly in the dataloader. Please refer to dataset.py for how splits with different degrees of skew are created, and this summary chart for mapping split_method to metrics.

3. Training commands

cd skewed_relations_T2I/scripts/diffusion_icons
accelerate launch trainer.py

4. Testing commands

cd skewed_relations_T2I/scripts/diffusion_icons
accelerate launch tester.py --load_from_dir <handle> --load_from_epochs <load_from_epochs> --eval_batch_size <eval_batch_size>

<handle>: Every experiment will have a unique identifier, created from the timestamp at which it is launched. E.g. 0515_222602 (%m%d_%H%M%S)

<load_from_epochs>: String of epoch numbers separated by spaces. E.g. "99 199 299 399 499 599"

<eval_batch_size>: Per-GPU batch size.

By default, tester.py runs inference on both the training and testing sets. To skip the training set, pass --num_iter_train 0; to skip the testing set, pass --num_iter_test 0.
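The two argument formats described above can be illustrated in a few lines of Python. This is a sketch under assumptions: how trainer.py actually builds the handle and how tester.py parses its flags are not shown in this README, and the names make_handle and parse_epochs are hypothetical.

```python
from datetime import datetime

# Timestamp layout quoted in this README for experiment handles.
HANDLE_FORMAT = "%m%d_%H%M%S"


def make_handle(now=None):
    """Format a launch time as month-day_hour-minute-second."""
    now = now or datetime.now()
    return now.strftime(HANDLE_FORMAT)


def parse_epochs(arg):
    """Turn a space-separated string such as "99 199 299" into a list of ints."""
    return [int(token) for token in arg.split()]


print(make_handle(datetime(2024, 5, 15, 22, 26, 2)))  # 0515_222602
print(parse_epochs("99 199 299 399 499 599"))
```

Because the handle has no year component, experiments launched exactly one year apart would collide; keeping old runs in separate directories avoids that.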

5. Evaluation script

Fixed filters are created from GTH icons. Then generated images are evaluated via pixel-level pattern matching. Please refer to this notebook.
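For readers unfamiliar with the idea, pixel-level pattern matching with a fixed filter can be sketched as sliding an icon over the generated image and scoring the overlap at each offset. The code below is an illustrative toy on binary grids, not the notebook's implementation; the function names, the equality-based score, and the tiny example arrays are all assumptions.

```python
def match_score(image, template, top, left):
    """Fraction of template pixels equal to the image pixels at this offset."""
    h, w = len(template), len(template[0])
    hits = sum(
        image[top + i][left + j] == template[i][j]
        for i in range(h)
        for j in range(w)
    )
    return hits / (h * w)


def best_match(image, template):
    """Slide the template over the image; return (score, top, left) of the best fit."""
    img_h, img_w = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for top in range(img_h - h + 1):
        for left in range(img_w - w + 1):
            score = match_score(image, template, top, left)
            if score > best[0]:  # keep the first offset reaching the top score
                best = (score, top, left)
    return best


# Hypothetical 2x2 "icon" placed at row 1, column 2 of a 4x4 canvas.
icon = [[1, 0],
        [0, 1]]
canvas = [[0, 0, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 1],
          [0, 0, 0, 0]]

print(best_match(canvas, icon))  # (1.0, 1, 2)
```

The recovered offset indicates where the icon sits, which is the kind of information needed to check whether a generated image places objects in the relation the prompt asked for.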

6. Ablation experiments

To disable image positional embeddings, comment out the line patch_size = 2 in config.py or set patch_size = None. (This requires re-running both single-obj pretraining and two-objs finetuning.)

To switch language encoder from T5 to CLIP, modify config.py: lm = "t5" <--> lm = "clip_"

Pixel Diffusion Experiments with Natural Images

1. Download WhatsUp dataset

Images are released by the WhatsUp official repo. Download controlled_clevr.tar.gz from https://drive.google.com/drive/u/0/folders/164q6X9hrvP-QYpi3ioSnfMuyHpG5oRkZ.

cd <largefiles_dir>/skewed_relations_T2I &&
mkdir -p data/whatsup_vlms

Move the folder controlled_clevr to <largefiles_dir>/skewed_relations_T2I/data/whatsup_vlms/.

WhatsUp annotation files are preprocessed (filtered for selected relations and objects) and saved to skewed_relations_T2I/data/aggregated. Refer to whatsup_preprocess.ipynb for preprocessing code.

2. Training configs

Configure your training hyperparameters in skewed_relations_T2I/scripts/diffuser_real/config.py.

To reproduce results in our paper, copy configs from skewed_relations_T2I/scripts/diffuser_real/configs/pixel_natural_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_real/configs/pixel_natural_twoobjs_ft_config.py

3. Drawing subsamples

Instances are converted to the tuple representation $(f_1, r_1, f_2, r_2)$ and subsampled in the tuple representation space. Please refer to dataset.py for how subsamples with different degrees of skew are drawn, and this summary chart for mapping subsample_method to metrics.
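As a rough illustration of what subsampling in tuple space means, the sketch below groups instances by their $(f_1, r_1, f_2, r_2)$ tuple, draws tuples rather than individual instances, and keeps every instance belonging to a drawn tuple. It shows only a uniform draw; the skew-controlled schemes are defined in dataset.py and are not reproduced here. The function name, the dictionary keys, and the placeholder values are assumptions.

```python
import random
from collections import defaultdict


def subsample_by_tuple(instances, k, seed=0):
    """Pick k distinct tuples uniformly at random; keep all instances mapping to them."""
    by_tuple = defaultdict(list)
    for inst in instances:
        key = (inst["f1"], inst["r1"], inst["f2"], inst["r2"])
        by_tuple[key].append(inst)

    rng = random.Random(seed)  # seeded so the draw is reproducible
    chosen = rng.sample(sorted(by_tuple), k)
    return [inst for key in chosen for inst in by_tuple[key]]


# Placeholder instances: three distinct tuples, one of which has two images.
instances = [
    {"f1": "f_a", "r1": "r_x", "f2": "f_b", "r2": "r_y", "image": "img0"},
    {"f1": "f_a", "r1": "r_x", "f2": "f_b", "r2": "r_y", "image": "img1"},
    {"f1": "f_b", "r1": "r_x", "f2": "f_a", "r2": "r_y", "image": "img2"},
    {"f1": "f_c", "r1": "r_x", "f2": "f_a", "r2": "r_y", "image": "img3"},
]

subset = subsample_by_tuple(instances, k=2)
print(len(subset))
```

Sampling tuples instead of instances keeps the draw independent of how many images happen to share a tuple.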

4. Training commands

cd skewed_relations_T2I/scripts/diffusion_real
accelerate launch trainer.py

5. Testing commands

cd skewed_relations_T2I/scripts/diffusion_real
accelerate launch tester.py --load_from_dir <handle> --load_from_epochs <load_from_epochs> --eval_batch_size <eval_batch_size>

<handle>: Every experiment will have a unique identifier, created from the timestamp at which it is launched. E.g. 0515_222602 (%m%d_%H%M%S)

<load_from_epochs>: String of epoch numbers separated by spaces. E.g. "99 199 299 399 499 599"

<eval_batch_size>: Per-GPU batch size.

By default, tester.py runs inference on both the training and testing sets. To skip the training set, pass --num_iter_train 0; to skip the testing set, pass --num_iter_test 0.

6. AutoEval with ViT

cd <largefiles_dir>/skewed_relations_T2I &&
mkdir autoeval

Download the finetuned ViT checkpoint from here (328MB) and move it to <largefiles_dir>/skewed_relations_T2I/autoeval.

For your reference, we provide code for finetuning ViT.

7. Evaluation commands

cd skewed_relations_T2I/scripts/diffusion_real
python eval.py --ckpt_handle <handle> --epochs_for_eval <epochs_for_eval> --output_folder <output_folder> # single_gpu job

<handle>: Every experiment will have a unique identifier, created from the timestamp at which it is launched. E.g. 0515_222602 (%m%d_%H%M%S)

<epochs_for_eval>: String of epoch numbers separated by spaces. E.g. "1999 3999 5999"

<output_folder>: E.g. "output" or "output_withvae"

Latent Diffusion Experiments

1. Download pre-trained VAE checkpoints from Hugging Face

cd <largefiles_dir>/skewed_relations_T2I &&
mkdir -p from_pretrained/vae/sd2 &&
cd from_pretrained/vae/sd2 &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/config.json &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.bin &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.fp16.bin &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.fp16.safetensors &&
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/vae/diffusion_pytorch_model.safetensors
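Since the download is a chain of five wget calls, it can be worth confirming that every file landed in the expected directory before training. The following is a small sketch using only the standard library; the helper name missing_vae_files is hypothetical, and the file list and directory layout are taken from the commands above.

```python
from pathlib import Path

# File names fetched by the wget commands in this section.
EXPECTED_FILES = [
    "config.json",
    "diffusion_pytorch_model.bin",
    "diffusion_pytorch_model.fp16.bin",
    "diffusion_pytorch_model.fp16.safetensors",
    "diffusion_pytorch_model.safetensors",
]


def missing_vae_files(largefiles_dir):
    """Return the expected VAE files that are not present under largefiles_dir."""
    vae_dir = (
        Path(largefiles_dir)
        / "skewed_relations_T2I"
        / "from_pretrained"
        / "vae"
        / "sd2"
    )
    return [name for name in EXPECTED_FILES if not (vae_dir / name).is_file()]


missing = missing_vae_files(".")  # replace "." with your <largefiles_dir>
print("missing:", missing or "none")
```

An empty result means all five files are in place; it does not verify that the downloads are complete or uncorrupted.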

2. Training configs

To reproduce results in our paper, copy configs from:

Experiments on synthetic images: skewed_relations_T2I/scripts/diffuser_icons/configs/vae_icons_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_icons/configs/vae_icons_twoobjs_ft_config.py
Experiments on natural images: skewed_relations_T2I/scripts/diffuser_real/configs/vae_natural_singleobj_pt_config.py and skewed_relations_T2I/scripts/diffuser_real/configs/vae_natural_twoobjs_ft_config.py

3. Training/Testing/Evaluation commands

Same as in the pixel diffusion sections above.

Credits

@huggingface Diffusers

@amitakamath whatsup_vlms

Cite Us 🙏

@article{chang2024skews,
  title={Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation},
  author={Chang, Yingshan and Zhang, Yasi and Fang, Zhiyuan and Wu, Yingnian and Bisk, Yonatan and Gao, Feng},
  journal={arXiv preprint arXiv:2403.16394},
  year={2024}
}
