Abstract: Diffusion Probabilistic Models (DPMs) have achieved remarkable advances in image synthesis. A DPM is defined on a forward process in which a small amount of noise is progressively added to an image until the signal is destroyed. In the reverse process, a neural network is trained to denoise white noise back into a real data sample by maximizing the variational lower bound (VLB) on the log-likelihood. By extending the diffusion length to infinity, both the forward and reverse processes can be generalized to stochastic differential equations (SDEs), and each process then amounts to integrating the corresponding SDE along the time dimension. Current DPM samplers turn out to be first-order SDE solvers. In this project, we expand the SDE to include higher-order terms using the Itô-Taylor expansion, and examine the performance of a second-order SDE solver implemented using forward-mode automatic differentiation in PyTorch.
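To make the role of forward-mode differentiation concrete, here is a minimal sketch on a toy problem. It is not the sampler in this repository: it drops the noise term, uses a hypothetical scalar drift `dx/dt = -t * x`, and replaces PyTorch's forward-mode API (e.g. `torch.func.jvp`) with a hand-rolled dual-number class so that it runs with the standard library only. All names (`Dual`, `drift`, `integrate`) are illustrative assumptions. The idea it shows is that the second-order Taylor term needs the total derivative of the drift along the trajectory, which one forward-mode pass provides when the tangent is seeded with `(f, 1)`.

```python
import math


def _lift(x):
    # Promote plain numbers to dual numbers with zero tangent.
    return x if isinstance(x, Dual) else Dual(x)


class Dual:
    """Number val + tan * eps with eps**2 == 0; tan carries a directional derivative."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule on the tangent part.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.tan)


def drift(x, t):
    # Toy drift (assumption): dx/dt = -t * x, exact solution x0 * exp(-t**2 / 2).
    return -(t * x)


def drift_and_total_derivative(x, t):
    f = drift(x, t)
    # Seed the tangent with (dx/dt, dt/dt) = (f, 1): a single forward pass
    # returns d/dt f(x(t), t) = df/dx * f + df/dt.
    out = drift(Dual(x, f), Dual(t, 1.0))
    return f, out.tan


def integrate(x0, t_end, n_steps, second_order):
    h = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        f, df = drift_and_total_derivative(x, t)
        x = x + h * f                  # first-order (Euler) term
        if second_order:
            x = x + 0.5 * h * h * df   # second-order Taylor term
        t += h
    return x


exact = math.exp(-0.5 * 2.0 ** 2)
err_euler = abs(integrate(1.0, 2.0, 20, second_order=False) - exact)
err_taylor = abs(integrate(1.0, 2.0, 20, second_order=True) - exact)
print(f"Euler error:        {err_euler:.2e}")
print(f"2nd-order error:    {err_taylor:.2e}")
```

With the same number of steps, the second-order update should land noticeably closer to the exact solution than the Euler update, at the cost of one extra forward-mode pass per step.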
- Linux and Windows are supported, but the program was developed solely on Linux, so we recommend running it on Linux to avoid unexpected problems.
- One high-end NVIDIA GPU (VRAM > 12 GB) is enough to reproduce the results. We did all testing and development on an RTX 4070 Ti.
- 64-bit Python 3.8 and PyTorch 2.0.1 (or later). See https://pytorch.org for PyTorch install instructions.
- Python libraries: See environment.yml for exact library dependencies. You can use the following commands with Anaconda3/Miniconda3 to create and activate your Python environment:
conda env create -f environment.yml -n 2nd_DMSDE
conda activate 2nd_DMSDE
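After activating the environment, a quick sanity check can catch the most common setup problems before any checkpoint is downloaded. The helper below is a hypothetical convenience script, not part of this repository; it only checks the Python version, the interpreter bitness, and whether a `torch` package can be found, and it does not verify the PyTorch version or the GPU.

```python
import importlib.util
import struct
import sys


def check_environment(min_python=(3, 8)):
    """Return basic environment checks as booleans (hypothetical helper)."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "is_64bit": struct.calcsize("P") * 8 == 64,
        # find_spec locates the package without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```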
To reproduce the main results from our paper, first download and recompile a checkpoint (the model URLs are listed in the next section):
python recompileNN.py --pkl_dir=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-imagenet-64x64-cond-adm.pkl
After the model is recompiled, simply run:
python example.py
This is a minimal standalone script that loads the best pre-trained model for each dataset and generates a random 8x8 grid of images using the optimal sampler settings. Expected results:
| Dataset | Runtime | Reference image |
|---|---|---|
| CIFAR-10 | ~6 sec | cifar10-32x32.png |
| FFHQ | ~28 sec | ffhq-64x64.png |
| AFHQv2 | ~28 sec | afhqv2-64x64.png |
| ImageNet | ~5 min | imagenet-64x64.png |
We developed our second-order sampler on top of the pre-trained models from:
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
https://arxiv.org/abs/2206.00364
- https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl
- https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-imagenet-64x64-cond-adm.pkl
- https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-ffhq-64x64-uncond-vp.pkl
Feel free to generate more samples with your own hyper-parameters. The sampler settings can be controlled through command-line options; see python generate.py --help for more information. For best results, we recommend using the following settings for each dataset:
# For CIFAR-10 at 32x32, use deterministic sampling with 84 steps
python generate.py --outdir=Sampler1.2.0/cifar10_N40_rho3_2nd --network=ckpts/edm-cifar10-32x32-cond-vp.pkl --batch=500 --seeds=0-99 --steps=84 --randn_like=ddb --rho=3 --subdirs
# For FFHQ at 64x64, use deterministic sampling with 84 steps
python generate.py --outdir=Sampler1.2.0/ffhq_N84_rho3_2nd --network=ckpts/edm-ffhq-64x64-uncond-vp.pkl --batch=250 --seeds=0-99 --steps=84 --randn_like=ddb --rho=3 --subdirs
# For ImageNet at 64x64, use stochastic sampling with 84 steps
python generate.py --outdir=imgSamples --network=ckpts/edm-imagenet-64x64-cond-adm.pkl --batch=100 --seeds=0-9 --steps=84 --randn_like=ddb --rho=3 --subdirs
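For intuition about what `--steps` and `--rho` control, the sketch below computes the noise-level discretization from the EDM paper (Karras et al.) that the pre-trained models come from. It assumes that generate.py follows that convention and that the noise range is the EDM default (`sigma_min=0.002`, `sigma_max=80`); neither is confirmed here, and the function name is hypothetical.

```python
def edm_sigma_steps(num_steps=84, rho=3.0, sigma_min=0.002, sigma_max=80.0):
    """Noise levels from sigma_max down to sigma_min (EDM time-step rule).

    sigma_min and sigma_max are assumed EDM defaults, not values read
    from this repository.
    """
    inv_rho = 1.0 / rho
    hi = sigma_max ** inv_rho
    lo = sigma_min ** inv_rho
    return [
        (hi + i / (num_steps - 1) * (lo - hi)) ** rho
        for i in range(num_steps)
    ]


sigmas = edm_sigma_steps()
print(f"first: {sigmas[0]:.3f}, last: {sigmas[-1]:.5f}, count: {len(sigmas)}")
```

A larger `rho` concentrates more of the steps at low noise levels, while a smaller `rho` spreads them more evenly over the noise range.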
To compute the Fréchet inception distance (FID) for a given model and sampler, first generate 50,000 random images and then compare them against the dataset reference statistics using fid.py:
# Generate 50000 images and save them as imagenet/*/*.png
python generate.py --outdir=imagenet --network=ckpts/edm-imagenet-64x64-cond-adm.pkl --batch=100 --seeds=0-49999 --steps=84 --randn_like=ddb --rho=3 --subdirs
# Calculate FID
python fid.py calc --images=imagenet --ref=https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/imagenet-64x64.npz --num=1000
Note that the numerical value of FID varies across different random seeds and is highly sensitive to the number of images. By default, fid.py will always use 50,000 generated images; providing fewer images will result in an error, whereas providing more will use a random subset. To reduce the effect of random variation, we recommend repeating the calculation multiple times with different seeds, e.g., --seeds=0-49999, --seeds=50000-99999, and --seeds=100000-149999. In our paper, we calculated each FID three times and reported the minimum.
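The sensitivity to the number of images can be illustrated with a toy sketch. It does not use fid.py or Inception features: it draws two sample sets from the same 16-dimensional standard Gaussian and computes the Fréchet distance between per-dimension Gaussian fits (a diagonal-covariance simplification, under which the trace term reduces to squared differences of standard deviations). All names and sizes are illustrative assumptions. Although the true distance is zero, the estimate is biased upward, and the bias shrinks as the sample count grows.

```python
import math
import random


def mean_std(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


def frechet_distance_diag(samples_a, samples_b):
    """Fréchet distance between per-dimension Gaussian fits (diagonal covariance)."""
    total = 0.0
    # zip(*samples) iterates over dimensions instead of over samples.
    for dim_a, dim_b in zip(zip(*samples_a), zip(*samples_b)):
        mu_a, sd_a = mean_std(dim_a)
        mu_b, sd_b = mean_std(dim_b)
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total


def draw(rng, n, dim=16):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
distances = {}
for n in (50, 5000):
    distances[n] = frechet_distance_diag(draw(rng, n), draw(rng, n))
    print(f"n={n:5d}  estimated distance: {distances[n]:.4f}")
```

The same effect is why FID values computed from different numbers of images are not directly comparable.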
python train.py --outdir=/root/autodl-tmp/imgnet --data=/root/autodl-tmp/ImageNet --cond=1 --duration=0.5 --batch=1000 --batch-gpu=50 --lr=1e-1 --ema=0.01 --augment=0 --ls=1 --tick=10 --snap=5 --pretrain=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-imagenet-64x64-cond-adm.pkl --sigma-precond=Dns --sigma-arch=sigmoid --dm-length=10 -s
python dataset_tool.py --source=/root/autodl-tmp/ImageNet --dest=/root/autodl-tmp/imagenet-64x64.zip --resolution=64x64 --transform=center-crop