A refactoring of Taming Transformers for TPU VMs.
Taming Transformers for High-Resolution Image Synthesis
Patrick Esser*,
Robin Rombach*,
Björn Ommer
* equal contribution
tl;dr We combine the efficiency of convolutional approaches with the expressivity of transformers by introducing a convolutional VQGAN, which learns a codebook of context-rich visual parts whose composition is modeled with an autoregressive transformer.
arXiv | BibTeX | Project Page
pip install -r requirements.txt
Place any image dataset in an ImageNet-style directory structure (with at least one class subfolder) so that it can be loaded with PyTorch's ImageFolder.
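As a minimal sketch of the expected layout (the folder and class names below are hypothetical, not part of this repo): torchvision's ImageFolder treats each immediate subfolder of the dataset root as one class, so even a single-class dataset needs at least one subfolder.

```python
import os
import tempfile

# Build a hypothetical minimal ImageNet-style layout:
# one subfolder per class under the dataset root, exactly
# as torchvision.datasets.ImageFolder expects to find it.
root = tempfile.mkdtemp()
for cls in ["class_a", "class_b"]:
    os.makedirs(os.path.join(root, cls), exist_ok=True)

def class_folders(root):
    """Return the sorted class subfolders, the way ImageFolder discovers classes."""
    return sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d))
    )

print(class_folders(root))
```

Pointing --train_dir or --val_dir at such a root is then enough for the ImageFolder-based loading to work.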
You can easily test main.py with randomly generated fake data.
python main.py --use_tpus --fake_data
For actual training, provide specific directories for train_dir, val_dir, and log_dir:
python main.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]
@misc{esser2020taming,
  title={Taming Transformers for High-Resolution Image Synthesis},
  author={Patrick Esser and Robin Rombach and Björn Ommer},
  year={2020},
  eprint={2012.09841},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}