Stabilize DPM++, especially for SDXL and SDE-DPM++ #5541

Merged (6 commits, Oct 30, 2023)

Conversation

LuChengTHU
Contributor

What does this PR do?

Fixes #5433

When using DPM++ for SDXL (especially the SDE variant, i.e., DPM++ 2M SDE) with fewer than 50 steps, we usually get visual artifacts, and the results are even worse than those from Euler's method. This PR fixes the issue and ensures DPM++ can generate better and more detailed images than Euler's method.

Why does DPM++ fail for small numbers of inference steps with SDXL?

In short, it is because of numerical instability near $t=0$ (i.e., small noise levels / nearly clean images). When we use second-order solvers for $t$ near $0$, the solver becomes numerically unstable and produces undesirable artifacts, as in #5433.

To address such an issue, this PR proposes two methods:

  1. Add a new config, use_lu_lambdas, for setting the step sizes during sampling. This setting uses uniform intervals in log-SNR (i.e., $\lambda(t)$), the spacing used in the original DPM-Solver. This step-size schedule provides stable and sometimes better samples, with no artifacts.

    • Reason 1: DPM-Solver and DPM-Solver++ are derived by introducing a change of variables to $\lambda(t)$ and then applying Taylor expansions in $\lambda$. Thus, the discretization error of second-order DPM-Solver and DPM-Solver++ is proportional to $\mathcal{O}(h_{max}^{2})$, where $h_{max} = \max_{i} |\lambda(t_{i+1}) - \lambda(t_{i})|$ is the maximum $\lambda$ interval between two time steps. Therefore, a natural choice is to make each $\Delta \lambda$ equal, i.e., to split $\lambda(t)$ uniformly.

    • Reason 2: The "Karras sigmas" step sizes are closely related to uniform $\lambda$. Note that the "Karras sigmas" are equivalent to $\exp(\lambda(t))$ (up to the sign convention for $\lambda$), so the "log sigmas" in Karras' setting are just $\lambda(t)$. Moreover, Karras uses a power-law spacing of sigmas with hyperparameter $\rho=7$, and one can show that as $\rho$ goes to infinity this spacing becomes equivalent to uniform $\lambda$ (this follows from the limit definition of the exponential function). Since $\rho=7$ is already quite large, the samples with Karras sigmas and with my uniform lambdas are similar, and both reduce the discretization error. (A short sketch of how uniform-$\lambda$ step sizes can be computed is given after this section.)

  2. Add a new config, euler_at_final, for trading off numerical stability against sample detail.

When setting euler_at_final = True, we use Euler's method for the final step. For example, with 5-step DPM++, the solver orders are [1, 2, 2, 2, 1]: the first step uses Euler's method for initialization, the intermediate steps use DPM++ to reduce the discretization error, and the final step uses Euler's method to improve numerical stability.

This setting improves the numerical stability for $t$ near $0$ and removes all the artifacts in #5433. For example, enabling it for DPM++ 2M and DPM++ 2M SDE makes the artifacts disappear.

However, as Euler's method is a first-order method, the samples can sometimes be slightly blurrier than those with use_karras_sigmas = True or use_lu_lambdas = True. Nevertheless, this setting is useful when we want to keep the uniform (linear) time-step spacing used by the default DPM++ 2M and DPM++ 2M SDE while still improving the sample quality.
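To make the uniform-$\lambda$ idea concrete, here is a minimal sketch (not the PR's actual implementation; the helper names are made up for illustration) of how time steps with uniformly spaced $\lambda$ can be derived from a standard scaled-linear beta schedule:

import torch

# Minimal sketch, not the diffusers implementation: derive inference timesteps whose
# log-SNR values lambda(t) = log(alpha_t / sigma_t) are uniformly spaced.
# The helper names below are illustrative only.

def compute_lambdas(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    # scaled-linear beta schedule, as used by SD / SDXL
    betas = torch.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps) ** 2
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    alpha_t = alphas_cumprod.sqrt()
    sigma_t = (1.0 - alphas_cumprod).sqrt()
    return torch.log(alpha_t / sigma_t)  # lambda(t), decreasing in t

def lu_lambda_timesteps(num_inference_steps=25):
    lambdas = compute_lambdas()
    # uniformly spaced lambdas between the endpoints of the training schedule
    target = torch.linspace(lambdas[-1].item(), lambdas[0].item(), num_inference_steps + 1)
    # map each target lambda to the nearest training timestep
    # (the actual scheduler interpolates instead of taking the nearest neighbour)
    return (target[:, None] - lambdas[None, :]).abs().argmin(dim=1)

print(lu_lambda_timesteps())  # runs from ~999 (high noise) down to 0 (clean)

In practice one simply sets use_lu_lambdas=True on DPMSolverMultistepScheduler, as in the testing script below; the sketch only illustrates what "uniform lambdas" means.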

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @williamberman and @patrickvonplaten

@LuChengTHU
Contributor Author

Here is a testing script:

import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionPipeline
from diffusers import DPMSolverMultistepScheduler, EulerDiscreteScheduler
import os

common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
schedulers = {
    "Euler_K": (EulerDiscreteScheduler, {"use_karras_sigmas": True}),

    "DPMPP_2M": (DPMSolverMultistepScheduler, {}),
    "DPMPP_2M_K": (DPMSolverMultistepScheduler, {"use_karras_sigmas": True}),
    "DPMPP_2M_Lu": (DPMSolverMultistepScheduler, {"use_lu_lambdas": True}),
    "DPMPP_2M_Stable": (DPMSolverMultistepScheduler, {"euler_at_final": True}),

    "DPMPP_2M_SDE": (DPMSolverMultistepScheduler, {"algorithm_type": "sde-dpmsolver++"}),
    "DPMPP_2M_SDE_K": (DPMSolverMultistepScheduler, {"use_karras_sigmas": True, "algorithm_type": "sde-dpmsolver++"}),
    "DPMPP_2M_SDE_Lu": (DPMSolverMultistepScheduler, {"use_lu_lambdas": True, "algorithm_type": "sde-dpmsolver++"}),
    "DPMPP_2M_SDE_Stable": (DPMSolverMultistepScheduler, {"algorithm_type": "sde-dpmsolver++", "euler_at_final": True}),
}


## Test SD-XL

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    add_watermarker=False)
pipe = pipe.to('cuda')
save_dir = './samples_sdxl'


## Test SD v2.1

# model_id = "stabilityai/stable-diffusion-2-1"
# pipe = StableDiffusionPipeline.from_pretrained(
#     model_id,
#     torch_dtype=torch.float16,
#     use_safetensors=True,
#     variant="fp16",
#     add_watermarker=False
# )
# pipe = pipe.to('cuda')
# save_dir = './samples_sd-2-1'


if not os.path.exists(save_dir):
    os.mkdir(save_dir)

steps = 25

params = {
    "prompt": ['a cat'],
    "num_inference_steps": steps,
    "guidance_scale": 7,
}
for scheduler_name in [
    "DPMPP_2M",
    "DPMPP_2M_Stable",
    "DPMPP_2M_K",
    "DPMPP_2M_Lu",
    "DPMPP_2M_SDE",
    "DPMPP_2M_SDE_Stable",
    "DPMPP_2M_SDE_K",
    "DPMPP_2M_SDE_Lu",
]:
    for seed in [12345, 1234, 123, 12, 1]:
        generator = torch.Generator(device='cuda').manual_seed(seed)

        scheduler = schedulers[scheduler_name][0].from_pretrained(
            model_id,
            subfolder="scheduler",
            **schedulers[scheduler_name][1],
        )
        pipe.scheduler = scheduler

        sdxl_img = pipe(**params, generator=generator).images[0]
        sdxl_img.save(os.path.join(save_dir, f"seed_{seed}_steps_{steps}_{scheduler_name}.png"))

@LuChengTHU
Contributor Author

LuChengTHU commented Oct 26, 2023

  1. Comparing Karras' sigmas and Lu's lambdas for the ODE solver (DPM++ 2M).

We find that for ODE solvers these two step-size schedules give similar results (with slightly different details).

This is because Karras' schedule uses $\rho=7$ while Lu's schedule is equivalent to $\rho=\infty$. Since ODE solvers have no randomness, the differences are small, and both settings generate quite good samples. (A numeric check of the $\rho \to \infty$ claim follows below the image.)

[image: sample grid comparing DPM++ 2M with Karras sigmas vs. Lu lambdas]
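As a rough numeric check of the $\rho \to \infty$ argument (Reason 2 in the PR description), the following hedged sketch compares Karras' $\rho$-spaced sigmas with geometric spacing, which is uniform in log-sigma and hence in $\lambda$; the function names and sigma range are illustrative only, not diffusers API:

import math
import torch

# Illustrative sketch: Karras' sigma schedule approaches uniform log-sigma
# (i.e., uniform lambda) spacing as rho grows.

def karras_sigmas(sigma_min, sigma_max, n, rho):
    ramp = torch.linspace(0, 1, n)
    min_inv_rho, max_inv_rho = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

def uniform_log_sigmas(sigma_min, sigma_max, n):
    # uniform spacing in log-sigma, which corresponds to uniform lambda
    return torch.exp(torch.linspace(math.log(sigma_max), math.log(sigma_min), n))

sigma_min, sigma_max, n = 0.03, 14.6, 25  # illustrative values only
for rho in (7.0, 50.0, 1000.0):
    gap = (karras_sigmas(sigma_min, sigma_max, n, rho).log()
           - uniform_log_sigmas(sigma_min, sigma_max, n).log()).abs().max()
    print(f"rho={rho:>6}: max |log-sigma difference| = {gap:.4f}")

The gap shrinks as $\rho$ grows, which is why the $\rho=7$ Karras spacing and the uniform-$\lambda$ spacing behave similarly for the ODE solver.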

@LuChengTHU
Contributor Author

  2. Comparing Karras' sigmas and Lu's lambdas for the SDE solver (DPM++ 2M SDE).

We find that for SDE solvers these two step-size schedules are quite different. Although both settings generate quite good samples, in my opinion Lu's lambdas are slightly better than Karras' sigmas for DPM++ 2M SDE.

[image: sample grid comparing DPM++ 2M SDE with Karras sigmas vs. Lu lambdas]

@LuChengTHU
Contributor Author

  3. Comparing euler_at_final for the ODE solver (DPM++ 2M) with the default linear spacing of step sizes.

We find that for ODE solvers, when euler_at_final = False, DPM++ 2M produces small artifacts (visible when enlarging the picture, e.g., in the cats' hair).

But with euler_at_final = True, the result has no artifacts while largely preserving the original linear-spacing image.

[image: sample grid comparing DPM++ 2M (linear spacing) with euler_at_final = False vs. True]

@LuChengTHU
Contributor Author

  4. Comparing euler_at_final for the SDE solver (DPM++ 2M SDE) with the default linear spacing of step sizes.

We find that for SDE solvers, when euler_at_final = False, DPM++ 2M SDE produces very obvious artifacts, which is consistent with #5433.

But with euler_at_final = True, the result has no artifacts while largely preserving the original linear-spacing image.

[image: sample grid comparing DPM++ 2M SDE (linear spacing) with euler_at_final = False vs. True]

@LuChengTHU
Contributor Author

LuChengTHU commented Oct 26, 2023

  5. Conclusion.
  • For ODE solvers, we can try either use_karras_sigmas or use_lu_lambdas for better sample quality.
  • For ODE solvers, if we want to improve the sample quality of the default linear spacing (default DPM++ 2M), we can set euler_at_final = True.
  • For SDE solvers, we can try either use_karras_sigmas or use_lu_lambdas for better sample quality; I personally prefer use_lu_lambdas because its sample quality is better than both Karras' spacing and linear spacing when using SDXL.
  • For SDE solvers, if we want to improve the sample quality of the default linear spacing (default DPM++ 2M SDE), we can set euler_at_final = True.
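For convenience, here is a short, hedged usage sketch of the recommendations above (it just mirrors the scheduler configs from the testing script earlier in this thread):

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# ODE variant (DPM++ 2M) with Lu's uniform-lambda spacing
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_lu_lambdas=True
)

# SDE variant (DPM++ 2M SDE) with Lu's spacing:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, algorithm_type="sde-dpmsolver++", use_lu_lambdas=True
# )

# Default linear spacing, stabilized with a first-order final step:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, euler_at_final=True
# )

image = pipe("a cat", num_inference_steps=25).images[0]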

@sayakpaul
Member

You should turn it into a mini technical report!

Probably the quality PR award goes to you!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@AmericanPresidentJimmyCarter
Contributor

Amazing work!

@DN6 DN6 requested a review from yiyixuxu October 27, 2023 13:38
Collaborator

@yiyixuxu left a comment


this is great! thank you so much @LuChengTHU

I left one comment about the euler_at_final config. Additionally:

  1. I noticed one test is failing - happy to help fix if you don't have time
  2. you mentioned that setting euler_at_final involves a trade-off in detail richness and may result in slightly blurry images - is this based on theory, or did you notice such an effect in your experiments? It is unnoticeable in the example you provided
  3. Just curious, do you know why the artifacts appear in SDXL but not in SD?

@@ -154,7 +162,9 @@ def __init__(
algorithm_type: str = "dpmsolver++",
solver_type: str = "midpoint",
lower_order_final: bool = True,
euler_at_final: bool = False,
Collaborator


  1. should we default euler_at_final to True? You mentioned there is a trade-off in image details, but I think artifacts are much more undesirable
  2. I think we should deprecate lower_order_final now that we have euler_at_final; we can strongly recommend setting euler_at_final to True when using fewer than 15 steps

Contributor

@patrickvonplaten Oct 30, 2023


should we default euler_at_final to be True? you mentioned there is a tradeoff in image details but I think artifacts are much more undesirable

I would leave euler_at_final as False since the current default setting works great for SDv15. SDXL seems to be more of a special case here

I think we should deprecate lower_order_final now we have euler_at_final- we can strongly recommend to set euler_at_final to be True when using less than 15 steps

I don't think we should deprecate it. For backwards-compatibility reasons we'll already need to keep both, and I think we should be careful not to break a functioning, well-working scheduler setting for SDv15.

Essentially, what is done here is to give the user the possibility of having lower_order_final=True even for models where we use more than 15 inference steps. But this should not come at the expense of breaking existing workflows, so I don't think we can default euler_at_final to True or remove lower_order_final.

Instead of adding euler_at_final, we could add a parameter called enable_lower_order_below: int = 15 that users could set to 1000 for SDXL. But I'm not sure this is actually easier to understand or cleaner, so it's OK to leave as is for me!
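Purely to illustrate that hypothetical alternative (enable_lower_order_below is not an actual diffusers config option), the order selection might look roughly like this:

# Hypothetical sketch only; this parameter does not exist in diffusers.
def pick_solver_order(step_index, num_inference_steps, solver_order=2,
                      enable_lower_order_below=15):
    # Force a first-order (Euler) update on the final step whenever the total
    # number of inference steps is below the threshold; setting the threshold
    # to 1000 would enable the stabilized final step for SDXL at any step count.
    final_step = step_index == num_inference_steps - 1
    if final_step and num_inference_steps < enable_lower_order_below:
        return 1
    # otherwise ramp up to the configured order as history becomes available
    return min(solver_order, step_index + 1)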

@yiyixuxu merged commit ac7b171 into huggingface:main Oct 30, 2023
11 checks passed
@yiyixuxu
Collaborator

cc @stevhliu here

we need to update the docs so that the world knows about Lu lambdas :)

@spezialspezial
Contributor

I'm getting a RuntimeError: a Tensor with 2 elements cannot be converted to Scalar for this slightly exotic setup. It doesn't seem to happen with other schedulers.

Traceback (most recent call last):

  File "dpmms_stable_inpaint_issue.py", line 36, in <module>
    dog_img = pipe_inpaint(prompt="a dog", num_inference_steps=30, image=cat_img, mask_image=mask_img, strength=0.35).images[0]

  File "torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)

  File "pipeline_stable_diffusion_inpaint.py", line 1066, in __call__
    init_latents_proper = self.scheduler.add_noise(

  File "scheduling_dpmsolver_multistep_stable.py", line 894, in add_noise
    step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]

  File "scheduling_dpmsolver_multistep_stable.py", line 894, in <listcomp>
    step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline
# change next line to your needs
from .scheduling_dpmsolver_multistep_stable import DPMSolverMultistepSchedulerStable


pipe = StableDiffusionPipeline.from_pretrained(
	"runwayml/stable-diffusion-v1-5",
	torch_dtype=torch.float16,
	variant="fp16",
	use_safetensors=True,
	local_files_only=True,
	load_safety_checker=False,
	requires_safety_checker=False
)

config = {
	'beta_start': 0.00085,
	'beta_end': 0.012,
	'beta_schedule': 'scaled_linear',
	'steps_offset': 1,
	'skip_prk_steps': True,
	"algorithm_type": "dpmsolver++",
	"use_lu_lambdas": True
}

pipe.scheduler = DPMSolverMultistepSchedulerStable.from_config(config)
print(pipe.scheduler.config)
pipe.to(torch.device("cuda"))
cat_img = pipe(prompt="a cat", num_inference_steps=30).images[0]
cat_img.save("cat.png")


pipe_inpaint = StableDiffusionInpaintPipeline(**pipe.components)
mask_img = Image.new("L", cat_img.size, "white")
dog_img = pipe_inpaint(prompt="a dog", num_inference_steps=30, image=cat_img, mask_image=mask_img, strength=0.35).images[0]
dog_img.save("dog.png")

@spezialspezial
Contributor

(schedule_timesteps == t).nonzero().item() also fails in some other scenarios, like img2img with small strength, due to duplicated timestep values

pipe.scheduler.timesteps -> [999, 963, 925, 884, 841, 795, 746, 694, 639, 579, 517, 452, 387, 323, 262, 206, 158, 117, 84, 59, 41, 27, 18, 12, 7, 5, 3, 1, 1, 0] len=30 unique=29
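A tiny, hedged reproduction of just that failure mode, independent of the scheduler:

import torch

# With duplicated timestep values, the equality match yields two indices,
# and .item() (which requires exactly one element) raises the reported error.
schedule_timesteps = torch.tensor([7, 5, 3, 1, 1, 0])
t = torch.tensor(1)
matches = (schedule_timesteps == t).nonzero()
print(matches.shape)  # torch.Size([2, 1])
matches.item()        # RuntimeError: a Tensor with 2 elements cannot be converted to Scalar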

@songtianhui

@LuChengTHU Thanks for your awesome work! I have a small question: does this support enabling use_karras_sigmas=True and euler_at_final=True simultaneously, which I think would stabilize DPM++ 2M Karras?

kashif pushed a commit to kashif/diffusers that referenced this pull request Nov 11, 2023
* stabilize dpmpp for sdxl by using euler at the final step

* add lu's uniform logsnr time steps

* add test

* fix check_copies

* fix tests

---------

Co-authored-by: Patrick von Platen <[email protected]>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
Development

Successfully merging this pull request may close these issues.

Visual artifacts when using DPM++ schedulers and SDXL without the refiner model