
Implementation for PopAlign

Environment Setup

pip install -r requirements.txt

Additional Dependencies:

- hpsv2: https://github.com/tgxs002/HPSv2
- ImageReward: https://github.com/THUDM/ImageReward
- deepface: https://github.com/serengil/deepface
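All three are distributed on PyPI; the package names below are taken from each project's README (install from source if a name differs):

pip install hpsv2 image-reward deepface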

Experiments

First, run

python 1_generate_divers_prompts.py

to generate diverse prompts based on the basic prompts in data/training_prompts.csv.
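As a rough illustration, this step expands each identity-neutral base prompt into identity-specific variants over demographic attributes. A minimal sketch, assuming hypothetical gender/ethnicity lists and a "prompt" CSV column (the real attribute set and phrasing live in 1_generate_divers_prompts.py):

# Minimal sketch of prompt diversification; the actual attribute set
# and phrasing are defined in 1_generate_divers_prompts.py.
import csv

ATTRIBUTES = [f"{g} {e}" for g in ("male", "female")
              for e in ("Asian", "Black", "White")]  # hypothetical lists

def diversify(base_prompt: str) -> list[str]:
    """Expand an identity-neutral prompt into identity-specific variants."""
    return [f"{base_prompt}, {a} person" for a in ATTRIBUTES]

with open("data/training_prompts.csv") as f:
    for row in csv.DictReader(f):
        print(diversify(row["prompt"]))  # the column name "prompt" is assumed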

Then, run

bash 2_generate_images.sh

to generate images from both the diverse and the basic prompts. This script also runs a classifier on the generated images.
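The classifier is built on DeepFace (listed above). As a sketch, labeling one generated image with perceived demographic attributes uses DeepFace's documented analyze API:

# Sketch of demographic labeling with DeepFace.analyze.
from deepface import DeepFace

def classify(image_path: str) -> dict:
    """Return the dominant perceived gender/race labels for one image."""
    faces = DeepFace.analyze(
        img_path=image_path,
        actions=["gender", "race"],
        enforce_detection=False,  # don't raise if no face is found
    )
    face = faces[0]  # one dict per detected face
    return {"gender": face["dominant_gender"], "race": face["dominant_race"]}

print(classify("outputs/sample_0.png"))  # hypothetical output path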

Then, run

python 3_generate_preferences.py

to generate the preference data.

Then, run

bash 4_train.sh

to train the model with the PopAlign objective.
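PopAlign's loss belongs to the Diffusion-DPO family of preference objectives. The sketch below shows that general form, not the exact population-level objective from the paper; beta is a hypothetical value, and the U-Net call is simplified (SDXL additionally takes pooled text embeddings and size conditioning):

# Sketch of a DPO-style preference loss on diffusion noise-prediction
# errors; the exact PopAlign objective is described in the paper.
import torch
import torch.nn.functional as F

def preference_loss(unet, ref_unet, lat_w, lat_l, cond, scheduler, beta=5000.0):
    b = lat_w.size(0)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (b,),
                      device=lat_w.device)
    noise = torch.randn_like(lat_w)
    # Use the same noise and timestep for the preferred and dispreferred latents.
    noisy_w = scheduler.add_noise(lat_w, noise, t)
    noisy_l = scheduler.add_noise(lat_l, noise, t)

    def err(net, noisy):
        pred = net(noisy, t, encoder_hidden_states=cond).sample
        return F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))

    with torch.no_grad():  # frozen reference model
        ref_w, ref_l = err(ref_unet, noisy_w), err(ref_unet, noisy_l)
    e_w, e_l = err(unet, noisy_w), err(unet, noisy_l)

    # Reward the model for lowering its error on the winner, relative to
    # the reference, more than it does on the loser.
    logits = -beta * ((e_w - ref_w) - (e_l - ref_l))
    return -F.logsigmoid(logits).mean()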

Finally, run

bash eval.sh

which will evaluate the model on identity-specific and identity-neutral prompts. It will also run the classifier to compute the discrepancy metric, as well as a series of scoring models to compute the image-quality metrics.
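As a rough illustration, one simple discrepancy measure is the gap between the most- and least-frequent demographic group among the classifier's predictions; the exact definition used by eval.sh follows the paper:

# Sketch of a simple discrepancy measure over classifier predictions;
# the definition actually used is in the eval scripts / paper.
from collections import Counter

def discrepancy(labels: list[str]) -> float:
    """Gap between the most and least frequent group frequencies."""
    freqs = [c / len(labels) for c in Counter(labels).values()]
    return max(freqs) - min(freqs)

print(discrepancy(["male"] * 7 + ["female"] * 3))  # 0.4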

Model Card

Implementation Details

This codebase trains and evaluates SDXL using the PopAlign objective.

The assets used in this work (datasets, preference models) are publicly available and are used according to their respective licenses. This code is released privately for the purposes of our submission, and will eventually be made public under the Apache 2.0 License (LICENSE).

Training Details

Training Data

The training data consists of images generated by SDXL.

Training Procedure

Preprocessing

Training samples are generated with identity-neutral and identity-specific prompts, then paired to create preference data. See the paper for more details.
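A minimal sketch of one plausible pairing scheme, treating the identity-specific generation as the preferred sample; this winner/loser assignment is an assumption for illustration, and the actual construction is in 3_generate_preferences.py and the paper:

# Sketch of building preference pairs from the two prompt pools; the
# real pairing rule is implemented in 3_generate_preferences.py.
import itertools

def make_pairs(specific_imgs: list[str], neutral_imgs: list[str]) -> list[dict]:
    # Treating identity-specific images as winners is illustrative only.
    return [{"winner": w, "loser": l}
            for w, l in itertools.product(specific_imgs, neutral_imgs)]

pairs = make_pairs(["specific_0.png"], ["neutral_0.png"])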

Training Hyperparameters

We train with the following hyperparameters:

- Learning Rate: 5e-7
- Batch size: 8
- Steps: 750
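Tying these numbers together, a minimal optimizer setup might look as follows; AdamW and the per-device split are assumptions, not pinned down by this README:

# Sketch of an optimizer setup matching the hyperparameters above.
import torch

unet = torch.nn.Linear(8, 8)  # stand-in for the SDXL U-Net
optimizer = torch.optim.AdamW(unet.parameters(), lr=5e-7)
max_train_steps = 750
per_device_batch = 8 // 4  # global batch of 8 across 4 GPUs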

Evaluation

We evaluate using HPS v2, PickScore, CLIP, and LAION Aesthetics, as well as a DeepFace classifier for fairness.
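As a sketch, the HPS v2 and ImageReward dependencies listed above expose one-call scoring APIs (the prompt and image path below are hypothetical):

# Sketch of image-quality scoring with HPS v2 and ImageReward,
# following each project's documented Python API.
import hpsv2
import ImageReward as RM

prompt = "a photo of a doctor"   # hypothetical prompt
image = "outputs/sample_0.png"   # hypothetical path

print(hpsv2.score(image, prompt, hps_version="v2.1"))
print(RM.load("ImageReward-v1.0").score(prompt, image))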

Testing Data, Factors & Metrics

Testing Data

We curate our own test data, which is included in this repo under data/.

Technical Specifications

Model Architecture and Objective

We use the SDXL architecture (U-Net, VAE, CLIP text encoder) and only fine-tune the U-Net with our objective.
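A minimal diffusers sketch of that setup, loading the SDXL components and freezing everything except the U-Net (the checkpoint name is the standard SDXL base release, assumed here):

# Sketch of a U-Net-only fine-tuning setup for SDXL with diffusers.
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTextModelWithProjection

base = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed checkpoint
vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
text_enc = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")
text_enc_2 = CLIPTextModelWithProjection.from_pretrained(base, subfolder="text_encoder_2")
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")

for frozen in (vae, text_enc, text_enc_2):
    frozen.requires_grad_(False)  # keep VAE and text encoders fixed
unet.train()  # only the U-Net receives gradients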

Compute

We train with 4 NVIDIA A5000 GPUs for less than one day per experiment.
