Stabilize DPM++, especially for SDXL and SDE-DPM++ #5541

Merged (6 commits, Oct 30, 2023)

Conversation

LuChengTHU
Contributor

What does this PR do?

Fixes #5433

When using DPM++ for SDXL (especially the SDE variant, i.e., DPM++ 2M SDE) with fewer than 50 steps, we usually get visual artifacts, and the results are even worse than those from Euler's method. This PR fixes the issue and ensures DPM++ can generate better and more detailed images than Euler's method.

Why does DPM++ fail for small numbers of inference steps with SDXL?

In short, it is because of numerical instability near $t=0$ (i.e., small noise levels / nearly clean images). When we use second-order solvers for $t$ near $0$, the solver becomes numerically unstable and produces undesirable artifacts, as in #5433.

To address such an issue, this PR proposes two methods:

  1. Add a new config, use_lu_lambdas, for setting the step sizes during sampling. This setting uses uniform intervals in log-SNR (i.e., $\lambda(t)$), the spacing used in the original DPM-Solver. This step-size schedule provides stable and sometimes better samples, with no artifacts.

    • Reason 1: DPM-Solver and DPM-Solver++ are derived by introducing a change of variables to $\lambda(t)$ and then applying Taylor expansions in $\lambda$. Thus, the discretization error of second-order DPM-Solver and DPM-Solver++ is proportional to $\mathcal{O}(h_{max}^{2})$, where $h_{max} = \max_{i} |\lambda(t_{i+1}) - \lambda(t_{i})|$ is the maximum $\lambda$ interval between two time steps. Therefore, a natural choice is to make each $\Delta \lambda$ equal, i.e., to split $\lambda(t)$ uniformly.

    • Reason 2: The "Karras sigmas" step sizes are closely related to uniform $\lambda$. Note that the "Karras sigmas" are equivalent to $\exp(\lambda(t))$ (up to the sign convention for $\lambda$), so the "log sigmas" in Karras' setting are just $\lambda(t)$. Moreover, Karras uses a power-law spacing of sigmas with hyperparameter $\rho=7$, and one can show that as $\rho$ goes to infinity this spacing becomes equivalent to uniform $\lambda$ (this follows from the limit definition of the exponential function). Since $\rho=7$ is already quite large, the samples with Karras sigmas and with my uniform lambdas are similar, and both reduce the discretization error. (A short sketch of how uniform-$\lambda$ step sizes can be computed is given after this section.)

  2. Add a new config, euler_at_final, for trading off numerical stability against sample detail.

When setting euler_at_final = True, we use Euler's method for the final step. For example, with 5-step DPM++, the solver orders are [1, 2, 2, 2, 1]: the first step uses Euler's method for initialization, the intermediate steps use DPM++ to reduce the discretization error, and the final step uses Euler's method to improve numerical stability.

This setting improves the numerical stability for $t$ near $0$ and removes all the artifacts in #5433. For example, enabling it for DPM++ 2M and DPM++ 2M SDE makes the artifacts disappear.

However, as Euler's method is a first-order method, the samples can sometimes be slightly blurrier than those with use_karras_sigmas = True or use_lu_lambdas = True. Nevertheless, this setting is useful when we want to keep the uniform (linear) time-step spacing used by the default DPM++ 2M and DPM++ 2M SDE while still improving the sample quality.
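To make the uniform-$\lambda$ idea concrete, here is a minimal sketch (not the PR's actual implementation; the helper names are made up for illustration) of how time steps with uniformly spaced $\lambda$ can be derived from a standard scaled-linear beta schedule:

import torch

# Minimal sketch, not the diffusers implementation: derive inference timesteps whose
# log-SNR values lambda(t) = log(alpha_t / sigma_t) are uniformly spaced.
# The helper names below are illustrative only.

def compute_lambdas(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    # scaled-linear beta schedule, as used by SD / SDXL
    betas = torch.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps) ** 2
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    alpha_t = alphas_cumprod.sqrt()
    sigma_t = (1.0 - alphas_cumprod).sqrt()
    return torch.log(alpha_t / sigma_t)  # lambda(t), decreasing in t

def lu_lambda_timesteps(num_inference_steps=25):
    lambdas = compute_lambdas()
    # uniformly spaced lambdas between the endpoints of the training schedule
    target = torch.linspace(lambdas[-1].item(), lambdas[0].item(), num_inference_steps + 1)
    # map each target lambda to the nearest training timestep
    # (the actual scheduler interpolates instead of taking the nearest neighbour)
    return (target[:, None] - lambdas[None, :]).abs().argmin(dim=1)

print(lu_lambda_timesteps())  # runs from ~999 (high noise) down to 0 (clean)

In practice one simply sets use_lu_lambdas=True on DPMSolverMultistepScheduler, as in the testing script below; the sketch only illustrates what "uniform lambdas" means.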

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @williamberman and @patrickvonplaten

@LuChengTHU
Contributor Author

Here is a testing script:

import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionPipeline
from diffusers import DPMSolverMultistepScheduler, EulerDiscreteScheduler
import os

common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
schedulers = {
    "Euler_K": (EulerDiscreteScheduler, {"use_karras_sigmas": True}),

    "DPMPP_2M": (DPMSolverMultistepScheduler, {}),
    "DPMPP_2M_K": (DPMSolverMultistepScheduler, {"use_karras_sigmas": True}),
    "DPMPP_2M_Lu": (DPMSolverMultistepScheduler, {"use_lu_lambdas": True}),
    "DPMPP_2M_Stable": (DPMSolverMultistepScheduler, {"euler_at_final": True}),

    "DPMPP_2M_SDE": (DPMSolverMultistepScheduler, {"algorithm_type": "sde-dpmsolver++"}),
    "DPMPP_2M_SDE_K": (DPMSolverMultistepScheduler, {"use_karras_sigmas": True, "algorithm_type": "sde-dpmsolver++"}),
    "DPMPP_2M_SDE_Lu": (DPMSolverMultistepScheduler, {"use_lu_lambdas": True, "algorithm_type": "sde-dpmsolver++"}),
    "DPMPP_2M_SDE_Stable": (DPMSolverMultistepScheduler, {"algorithm_type": "sde-dpmsolver++", "euler_at_final": True}),
}


## Test SD-XL

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    add_watermarker=False)
pipe = pipe.to('cuda')
save_dir = './samples_sdxl'


## Test SD v2.1

# model_id = "stabilityai/stable-diffusion-2-1"
# pipe = StableDiffusionPipeline.from_pretrained(
#     model_id,
#     torch_dtype=torch.float16,
#     use_safetensors=True,
#     variant="fp16",
#     add_watermarker=False
# )
# pipe = pipe.to('cuda')
# save_dir = './samples_sd-2-1'


if not os.path.exists(save_dir):
    os.mkdir(save_dir)

steps = 25

params = {
    "prompt": ['a cat'],
    "num_inference_steps": steps,
    "guidance_scale": 7,
}
for scheduler_name in [
    "DPMPP_2M",
    "DPMPP_2M_Stable",
    "DPMPP_2M_K",
    "DPMPP_2M_Lu",
    "DPMPP_2M_SDE",
    "DPMPP_2M_SDE_Stable",
    "DPMPP_2M_SDE_K",
    "DPMPP_2M_SDE_Lu",
]:
    for seed in [12345, 1234, 123, 12, 1]:
        generator = torch.Generator(device='cuda').manual_seed(seed)

        scheduler = schedulers[scheduler_name][0].from_pretrained(
            model_id,
            subfolder="scheduler",
            **schedulers[scheduler_name][1],
        )
        pipe.scheduler = scheduler

        sdxl_img = pipe(**params, generator=generator).images[0]
        sdxl_img.save(os.path.join(save_dir, f"seed_{seed}_steps_{steps}_{scheduler_name}.png"))

@LuChengTHU
Contributor Author

LuChengTHU commented Oct 26, 2023

  1. Comparing Karras' sigmas and Lu's lambdas for the ODE solver (DPM++ 2M).

We find that for ODE solvers these two step-size schedules give similar results (with slightly different details).

This is because Karras' schedule uses $\rho=7$ while Lu's schedule is equivalent to $\rho=\infty$. Since ODE solvers have no randomness, the differences are small, and both settings generate quite good samples. (A numeric check of the $\rho \to \infty$ claim follows below the image.)

[image: sample grid comparing DPM++ 2M with Karras sigmas vs. Lu lambdas]
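As a rough numeric check of the $\rho \to \infty$ argument (Reason 2 in the PR description), the following hedged sketch compares Karras' $\rho$-spaced sigmas with geometric spacing, which is uniform in log-sigma and hence in $\lambda$; the function names and sigma range are illustrative only, not diffusers API:

import math
import torch

# Illustrative sketch: Karras' sigma schedule approaches uniform log-sigma
# (i.e., uniform lambda) spacing as rho grows.

def karras_sigmas(sigma_min, sigma_max, n, rho):
    ramp = torch.linspace(0, 1, n)
    min_inv_rho, max_inv_rho = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

def uniform_log_sigmas(sigma_min, sigma_max, n):
    # uniform spacing in log-sigma, which corresponds to uniform lambda
    return torch.exp(torch.linspace(math.log(sigma_max), math.log(sigma_min), n))

sigma_min, sigma_max, n = 0.03, 14.6, 25  # illustrative values only
for rho in (7.0, 50.0, 1000.0):
    gap = (karras_sigmas(sigma_min, sigma_max, n, rho).log()
           - uniform_log_sigmas(sigma_min, sigma_max, n).log()).abs().max()
    print(f"rho={rho:>6}: max |log-sigma difference| = {gap:.4f}")

The gap shrinks as $\rho$ grows, which is why the $\rho=7$ Karras spacing and the uniform-$\lambda$ spacing behave similarly for the ODE solver.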

@LuChengTHU
Contributor Author

  2. Comparing Karras' sigmas and Lu's lambdas for the SDE solver (DPM++ 2M SDE).

We find that for SDE solvers these two step-size schedules are quite different. Although both settings generate quite good samples, in my opinion Lu's lambdas are slightly better than Karras' sigmas for DPM++ 2M SDE.

[image: sample grid comparing DPM++ 2M SDE with Karras sigmas vs. Lu lambdas]

@LuChengTHU
Contributor Author

  3. Comparing euler_at_final for the ODE solver (DPM++ 2M) with the default linear spacing of step sizes.

We find that for ODE solvers, when euler_at_final = False, DPM++ 2M produces small artifacts (visible when enlarging the picture, e.g., in the cats' hair).

But with euler_at_final = True, the result has no artifacts while largely preserving the original linear-spacing image.

[image: sample grid comparing DPM++ 2M (linear spacing) with euler_at_final = False vs. True]

@LuChengTHU
Contributor Author

  4. Comparing euler_at_final for the SDE solver (DPM++ 2M SDE) with the default linear spacing of step sizes.

We find that for SDE solvers, when euler_at_final = False, DPM++ 2M SDE produces very obvious artifacts, which is consistent with #5433.

But with euler_at_final = True, the result has no artifacts while largely preserving the original linear-spacing image.

[image: sample grid comparing DPM++ 2M SDE (linear spacing) with euler_at_final = False vs. True]

@LuChengTHU
Contributor Author

LuChengTHU commented Oct 26, 2023

  5. Conclusion.
  • For ODE solvers, we can try either use_karras_sigmas or use_lu_lambdas for better sample quality.
  • For ODE solvers, if we want to improve the sample quality of the default linear spacing (default DPM++ 2M), we can set euler_at_final = True.
  • For SDE solvers, we can try either use_karras_sigmas or use_lu_lambdas for better sample quality; I personally prefer use_lu_lambdas because its sample quality is better than both Karras' spacing and linear spacing when using SDXL.
  • For SDE solvers, if we want to improve the sample quality of the default linear spacing (default DPM++ 2M SDE), we can set euler_at_final = True.
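For convenience, here is a short, hedged usage sketch of the recommendations above (it just mirrors the scheduler configs from the testing script earlier in this thread):

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# ODE variant (DPM++ 2M) with Lu's uniform-lambda spacing
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_lu_lambdas=True
)

# SDE variant (DPM++ 2M SDE) with Lu's spacing:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, algorithm_type="sde-dpmsolver++", use_lu_lambdas=True
# )

# Default linear spacing, stabilized with a first-order final step:
# pipe.scheduler = DPMSolverMultistepScheduler.from_config(
#     pipe.scheduler.config, euler_at_final=True
# )

image = pipe("a cat", num_inference_steps=25).images[0]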

@sayakpaul
Member

You should turn it into a mini technical report!

Probably the quality PR award goes to you!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@AmericanPresidentJimmyCarter
Contributor

Amazing work!

@DN6 DN6 requested a review from yiyixuxu October 27, 2023 13:38
Collaborator

@yiyixuxu left a comment


this is great! thank you so much @LuChengTHU

I left one comment about the euler_at_final config. Additionally:

  1. I noticed one test is failing - happy to help fix if you don't have time
  2. you mentioned that setting euler_at_final involves a trade-off in detail richness and may result in slightly blurry images - is this based on theory, or did you notice such an effect in your experiments? It is unnoticeable in the example you provided
  3. Just curious, do you know why the artifacts appear in SDXL but not in SD?

@@ -154,7 +162,9 @@ def __init__(
algorithm_type: str = "dpmsolver++",
solver_type: str = "midpoint",
lower_order_final: bool = True,
euler_at_final: bool = False,
Collaborator


  1. should we default euler_at_final to True? You mentioned there is a trade-off in image details, but I think artifacts are much more undesirable
  2. I think we should deprecate lower_order_final now that we have euler_at_final; we can strongly recommend setting euler_at_final to True when using fewer than 15 steps

Contributor

@patrickvonplaten Oct 30, 2023


should we default euler_at_final to be True? you mentioned there is a tradeoff in image details but I think artifacts are much more undesirable

I would leave euler_at_final as False since the current default setting works great for SDv15. SDXL seems to be more of a special case here

I think we should deprecate lower_order_final now we have euler_at_final- we can strongly recommend to set euler_at_final to be True when using less than 15 steps

I don't think we should deprecate it. For backwards-compatibility reasons we'll already need to keep both, and I think we should be careful not to break a functioning, well-working scheduler setting for SDv15.

Essentially, what is done here is to give the user the possibility of having lower_order_final=True even for models where we use more than 15 inference steps. But this should not come at the expense of breaking existing workflows, so I don't think we can default euler_at_final to True or remove lower_order_final.

Instead of adding euler_at_final, we could add a parameter called enable_lower_order_below: int = 15 that users could set to 1000 for SDXL. But I'm not sure this is actually easier to understand or cleaner, so it's OK to leave as is for me!
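Purely to illustrate that hypothetical alternative (enable_lower_order_below is not an actual diffusers config option), the order selection might look roughly like this:

# Hypothetical sketch only; this parameter does not exist in diffusers.
def pick_solver_order(step_index, num_inference_steps, solver_order=2,
                      enable_lower_order_below=15):
    # Force a first-order (Euler) update on the final step whenever the total
    # number of inference steps is below the threshold; setting the threshold
    # to 1000 would enable the stabilized final step for SDXL at any step count.
    final_step = step_index == num_inference_steps - 1
    if final_step and num_inference_steps < enable_lower_order_below:
        return 1
    # otherwise ramp up to the configured order as history becomes available
    return min(solver_order, step_index + 1)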

@yiyixuxu merged commit ac7b171 into huggingface:main Oct 30, 2023
11 checks passed
@yiyixuxu
Collaborator

cc @stevhliu here

we need to update the docs so that the world knows about Lu lambdas :)

@spezialspezial
Contributor

I'm getting a RuntimeError: a Tensor with 2 elements cannot be converted to Scalar for this slightly exotic setup. It doesn't seem to happen with other schedulers.

Traceback (most recent call last):

  File "dpmms_stable_inpaint_issue.py", line 36, in <module>
    dog_img = pipe_inpaint(prompt="a dog", num_inference_steps=30, image=cat_img, mask_image=mask_img, strength=0.35).images[0]

  File "torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)

  File "pipeline_stable_diffusion_inpaint.py", line 1066, in __call__
    init_latents_proper = self.scheduler.add_noise(

  File "scheduling_dpmsolver_multistep_stable.py", line 894, in add_noise
    step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]

  File "scheduling_dpmsolver_multistep_stable.py", line 894, in <listcomp>
    step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline
# change next line to your needs
from .scheduling_dpmsolver_multistep_stable import DPMSolverMultistepSchedulerStable


pipe = StableDiffusionPipeline.from_pretrained(
	"runwayml/stable-diffusion-v1-5",
	torch_dtype=torch.float16,
	variant="fp16",
	use_safetensors=True,
	local_files_only=True,
	load_safety_checker=False,
	requires_safety_checker=False
)

config = {
	'beta_start': 0.00085,
	'beta_end': 0.012,
	'beta_schedule': 'scaled_linear',
	'steps_offset': 1,
	'skip_prk_steps': True,
	"algorithm_type": "dpmsolver++",
	"use_lu_lambdas": True
}

pipe.scheduler = DPMSolverMultistepSchedulerStable.from_config(config)
print(pipe.scheduler.config)
pipe.to(torch.device("cuda"))
cat_img = pipe(prompt="a cat", num_inference_steps=30).images[0]
cat_img.save("cat.png")


pipe_inpaint = StableDiffusionInpaintPipeline(**pipe.components)
mask_img = Image.new("L", cat_img.size, "white")
dog_img = pipe_inpaint(prompt="a dog", num_inference_steps=30, image=cat_img, mask_image=mask_img, strength=0.35).images[0]
dog_img.save("dog.png")

@spezialspezial
Contributor

(schedule_timesteps == t).nonzero().item() also fails in some other scenarios, like img2img with small strength, due to duplicated timestep values

pipe.scheduler.timesteps -> [999, 963, 925, 884, 841, 795, 746, 694, 639, 579, 517, 452, 387, 323, 262, 206, 158, 117, 84, 59, 41, 27, 18, 12, 7, 5, 3, 1, 1, 0] len=30 unique=29
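A tiny, hedged reproduction of just that failure mode, independent of the scheduler:

import torch

# With duplicated timestep values, the equality match yields two indices,
# and .item() (which requires exactly one element) raises the reported error.
schedule_timesteps = torch.tensor([7, 5, 3, 1, 1, 0])
t = torch.tensor(1)
matches = (schedule_timesteps == t).nonzero()
print(matches.shape)  # torch.Size([2, 1])
matches.item()        # RuntimeError: a Tensor with 2 elements cannot be converted to Scalar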

@songtianhui

@LuChengTHU Thanks for your awesome work! I have a small question: does this support enabling use_karras_sigmas=True and euler_at_final=True simultaneously, which I think would stabilize DPM++ 2M Karras?

kashif pushed a commit to kashif/diffusers that referenced this pull request Nov 11, 2023
* stabilize dpmpp for sdxl by using euler at the final step

* add lu's uniform logsnr time steps

* add test

* fix check_copies

* fix tests

---------

Co-authored-by: Patrick von Platen <[email protected]>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
Development

Successfully merging this pull request may close these issues.

Visual artifacts when using DPM++ schedulers and SDXL without the refiner model