
v0.2.0

@mkshing released this on 12 Apr 13:39
9199552

What's changed

Released v0.2.0

Improved the following based on feedback from the SVDiff author @phymhan (#3)!

  • Spectral shifts are now also trained for 1-D weights such as LayerNorm (file size: 935 kB, up from 923 kB).
  • Use a separate learning rate for 1-D weights via --learning_rate_1d.
  • Optionally, also train the spectral shifts of the text encoder with --train_text_encoder (file size: 1.17 MB).
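
For context, here is a minimal sketch of what a spectral shift is, following the formulation in the SVDiff paper as we understand it (the release notes themselves do not spell it out): a frozen 2-D weight W is decomposed as W = U diag(σ) Vᵀ, and only a shift δ, one value per singular value, is trained, giving W_δ = U diag(ReLU(σ + δ)) Vᵀ. The function name `spectral_shifted_weight` and the 320×768 shape are illustrative assumptions, not svdiff-pytorch code, and the handling of 1-D weights such as LayerNorm is not covered here.

```python
import numpy as np


def spectral_shifted_weight(weight: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Rebuild a 2-D weight as U @ diag(relu(sigma + delta)) @ Vh.

    `weight` is the frozen pretrained matrix; `delta` is the trainable
    spectral shift, with one entry per singular value.
    """
    u, sigma, vh = np.linalg.svd(weight, full_matrices=False)
    shifted = np.maximum(sigma + delta, 0.0)  # ReLU keeps singular values non-negative
    # Scaling the columns of u is equivalent to u @ np.diag(shifted)
    return (u * shifted) @ vh


rng = np.random.default_rng(0)
w = rng.standard_normal((320, 768))  # illustrative weight matrix
delta = np.zeros(min(w.shape))       # one shift per singular value; zero means "unchanged"
print(w.size, delta.size)            # 245760 320
```

Only `delta` needs to be saved, which is why a spectral-shift checkpoint is so much smaller than the weights it modifies.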

With these changes, you can get better results with fewer training steps than with the first release, v0.1.1!

sample training command

accelerate launch svdiff-pytorch-2/train_svdiff.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir=$INSTANCE_DATA_DIR \
  --class_data_dir=$CLASS_DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="photo of sks woman" \
  --class_prompt="photo of a woman" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-3 \
  --learning_rate_1d=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --checkpointing_steps=200 \
  --max_train_steps=1000 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --seed=42 \
  --gradient_checkpointing
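
The command above sets two learning rates: --learning_rate=1e-3 and --learning_rate_1d=1e-6 for 1-D weights. One common way to implement such a split is with per-group learning rates, using the list-of-dicts layout (`"params"`, `"lr"`) that PyTorch optimizers such as `torch.optim.AdamW` accept. The sketch below assumes that approach; `build_param_groups`, the `(shift, base_weight)` layout, and the parameter names are hypothetical, not svdiff-pytorch's code, and NumPy arrays stand in for the tensors a real optimizer would need.

```python
import numpy as np


def build_param_groups(shifts, learning_rate, learning_rate_1d):
    """Split spectral-shift parameters into two optimizer param groups.

    `shifts` maps a name to a (shift, base_weight) pair. The group is chosen
    from the base weight's dimensionality: the shift for a 2-D weight is itself
    a 1-D vector, so the shift's own shape cannot tell the two cases apart.
    """
    matrix_shifts, vector_shifts = [], []
    for shift, base_weight in shifts.values():
        if base_weight.ndim == 1:  # e.g. a LayerNorm weight
            vector_shifts.append(shift)
        else:
            matrix_shifts.append(shift)
    # Same layout that torch.optim optimizers accept as param groups
    return [
        {"params": matrix_shifts, "lr": learning_rate},
        {"params": vector_shifts, "lr": learning_rate_1d},
    ]


# Hypothetical parameters: one attention projection and one LayerNorm
shifts = {
    "attn.to_q": (np.zeros(320), np.ones((320, 320))),
    "norm1": (np.zeros(320), np.ones(320)),
}
groups = build_param_groups(shifts, learning_rate=1e-3, learning_rate_1d=1e-6)
print([(len(g["params"]), g["lr"]) for g in groups])  # [(1, 0.001), (1, 1e-06)]
```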

[Image: output for "portrait of sks woman wearing kimono", where sks refers to Gal Gadot]

Added Single Image Editing

sample script
training

accelerate launch train_svdiff.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"  \
  --instance_data_dir="pink-chair-dir" \
  --output_dir="output-dir" \
  --instance_prompt="photo of a pink chair with black legs" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-3 \
  --learning_rate_1d=1e-6 \
  --train_text_encoder \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=500 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --seed=42 \
  --gradient_checkpointing 

inference

import sys
import torch
from PIL import Image
from diffusers import DDIMScheduler
# Make the cloned svdiff-pytorch repository importable (Colab path; adjust as needed)
sys.path.append("/content/svdiff-pytorch-2")
from svdiff_pytorch import load_unet_for_svdiff, load_text_encoder_for_svdiff, StableDiffusionPipelineWithDDIMInversion

pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
spectral_shifts_ckpt_dir = "/content/SIE/checkpoint-500"  # adjust to where your training run saved its checkpoint
image = "pink-chair.jpeg"
source_prompt = "photo of a pink chair with black legs"
target_prompt = "photo of a blue chair with black legs"

# Load the UNet and text encoder with the trained spectral shifts applied
unet = load_unet_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="unet")
text_encoder = load_text_encoder_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="text_encoder")
# Load the pipeline that supports DDIM inversion
pipe = StableDiffusionPipelineWithDDIMInversion.from_pretrained(
    pretrained_model_name_or_path,
    unet=unet,
    text_encoder=text_encoder,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# This example skips DDIM inversion, so sampling starts from random latents
inv_latents = None
# (optional) DDIM inversion of the input image
# image = Image.open(image).convert("RGB").resize((512, 512))
# SVDiff uses guidance_scale=1.0 for DDIM inversion
# inv_latents = pipe.invert(source_prompt, image=image, guidance_scale=1.0).latents
image = pipe(target_prompt, latents=inv_latents).images[0]

[Image: editing result, from "photo of a pink chair with black legs" to "photo of a blue chair with black legs"]

* The input image was taken from https://unsplash.com/photos/1JJJIHh7-Mk
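
The optional lines in the inference script invert the input image into latents with DDIM at guidance_scale=1.0. The sketch below illustrates the arithmetic involved, assuming the standard deterministic (eta = 0) DDIM update and the usual classifier-free guidance formula; it is not the implementation of `pipe.invert`, the helper names are hypothetical, and random arrays stand in for the UNet's noise predictions.

```python
import numpy as np


def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def ddim_step(x, eps, alpha_bar_from, alpha_bar_to):
    """Deterministic (eta = 0) DDIM update between two noise levels.

    Denoising moves to a larger alpha_bar; inversion moves to a smaller one
    with the same formula and the noise levels swapped.
    """
    x0_pred = (x - np.sqrt(1.0 - alpha_bar_from) * eps) / np.sqrt(alpha_bar_from)
    return np.sqrt(alpha_bar_to) * x0_pred + np.sqrt(1.0 - alpha_bar_to) * eps


rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 64, 64))      # latent for a 512x512 image
eps_cond = rng.standard_normal(x_t.shape)   # stand-in for the conditional prediction
eps_uncond = rng.standard_normal(x_t.shape) # stand-in for the unconditional prediction

# With guidance_scale=1 the unconditional branch cancels out
eps = guided_noise(eps_uncond, eps_cond, guidance_scale=1.0)
x_noisier = ddim_step(x_t, eps, alpha_bar_from=0.9, alpha_bar_to=0.8)     # inversion step
x_back = ddim_step(x_noisier, eps, alpha_bar_from=0.8, alpha_bar_to=0.9)  # denoising step
print(np.allclose(eps, eps_cond), np.allclose(x_back, x_t))  # True True
```

The round trip is exact here only because the same noise prediction is reused in both directions; with a real UNet the prediction changes between steps, so inversion is an approximation, and larger guidance scales tend to amplify that mismatch.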