This is model fitting and inference code for CLIP aesthetic regressions trained on Simulacra Aesthetic Captions. These remarkably simple models emulate human aesthetic judgment and can be used for tasks such as dataset filtering, removing obviously poor-quality images from a corpus before training. Example grids in the repository, one sorted by John David Pressman and one sorted by the model, give some idea of its capabilities.
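As a toy illustration of the dataset-filtering use case: once per-image scores are available (a sketch of computing them with this repository appears at the end of this document), filtering reduces to a threshold test. The file names, scores, and cutoff below are made up:

# Hypothetical per-image scores on Simulacra Aesthetic Captions' 1-10 rating scale;
# in practice these would come from the aesthetic model.
scores = {"cat.png": 7.2, "blurry.jpg": 2.1, "sunset.png": 8.9}
threshold = 4.0  # illustrative cutoff; tune for your corpus
kept = [path for path, score in scores.items() if score >= threshold]
print(kept)  # ['cat.png', 'sunset.png']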
Git clone this repository:
git clone https://github.com/crowsonkb/simulacra-aesthetic-models.git
Install PyTorch if you don't already have it:
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
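To verify the install, you can check the reported version and CUDA availability from a Python prompt (a general sanity check, not part of the repository's own instructions):

import torch
print(torch.__version__)          # should report 1.10.1+cu113 for the command above
print(torch.cuda.is_available())  # True if the CUDA build can see a GPU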
Then pip install our other dependencies:
pip3 install tqdm pillow torchvision scikit-learn numpy
If you don't already have it installed, you'll need to install CLIP:
git clone https://github.com/openai/CLIP.git
cd CLIP
pip3 install .
cd ..
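You can confirm CLIP is importable by listing its available model names and loading one; the aesthetic regressions are fit on CLIP image embeddings, so this load is the same call downstream code performs. The ViT-B/16 name here is just an example; check which CLIP variant your chosen checkpoint was trained against:

import clip
print(clip.available_models())             # e.g. ['RN50', ..., 'ViT-B/16', 'ViT-L/14']
model, preprocess = clip.load("ViT-B/16")  # downloads the weights on first use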
The models are largely meant to be used as a library, i.e. you'll need to write specific code for your use case. But to get you started we've provided a sample script, rank_images.py, which finds all the .jpg or .png images in a directory tree and ranks the top N (default 50) with the aesthetic model:
python3 rank_images.py demo_images/
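As a rough sketch of library use, the general pattern is to encode an image with CLIP, normalize the embedding, and pass it to the aesthetic head. The module name, class name, constructor signature, checkpoint filename, and image path below are assumptions for illustration; check the repository for the actual names and for which CLIP variant each checkpoint expects:

import torch
import torch.nn.functional as F
from PIL import Image
import clip

# Assumed import; substitute the real module/class from this repository.
from simulacra_fit_linear_model import AestheticMeanPredictionLinearModel

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/16", device=device)

# 512 is ViT-B/16's embedding width; the checkpoint filename is illustrative.
aesthetic_model = AestheticMeanPredictionLinearModel(512)
aesthetic_model.load_state_dict(
    torch.load("sac_public_2022_06_29_vit_b_16_linear.pth", map_location="cpu"))
aesthetic_model.to(device)

image = preprocess(Image.open("demo_images/example.png").convert("RGB"))  # any image path
with torch.no_grad():
    embed = clip_model.encode_image(image.unsqueeze(0).to(device))
    embed = F.normalize(embed.float(), dim=-1)  # assumed: the head expects unit-norm embeddings
    score = aesthetic_model(embed)
print(float(score))  # predicted mean rating on SAC's 1-10 scale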