fix a bug in 2nd order schedulers when using in ensemble of experts config #5511

Merged
merged 10 commits into main from 2nd-order-scheduler-denoising-start on Oct 25, 2023

Conversation

yiyixuxu
Collaborator

@yiyixuxu yiyixuxu commented Oct 24, 2023

This PR fixes the bug reported in #5447 and allows users to use 2nd order schedulers with the ensemble of experts approach.

I'm using a toy example to quickly illustrate the error we are hitting:

  • e.g. we have 4 inference steps: 3, 2, 1, 0
  • in set_timesteps, we repeat each index for the 2nd order update (steps in [] are 2nd order updates): 3, [2], 2, [1], 1, [0], 0
  • now we split these timesteps between base and refiner; if the cutoff point is 1, we will have
    • base: 3, [2], 2
    • refiner: [1], 1, [0], 0

-> You can see that the timesteps for the refiner are now wrong; they should be 2, [1], 1, [0], 0. (A minimal sketch of this is below.)
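
Here is that off-by-one in plain Python (a toy sketch only, not the actual pipeline code):

# Duplicated Heun-style timesteps for 4 real steps; bracketed entries
# in the list above are the 2nd-order duplicates.
timesteps = [3, 2, 2, 1, 1, 0, 0]

cutoff = 2  # the refiner handles timesteps strictly below this value
n = sum(t < cutoff for t in timesteps)  # 4 entries: 1, 1, 0, 0

# Naive split: the refiner starts on a 2nd-order duplicate -> wrong
print(timesteps[-n:])        # [1, 1, 0, 0]

# Fix: take one extra entry so the refiner starts with a 1st-order step
print(timesteps[-(n + 1):])  # [2, 1, 1, 0, 0]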

This script now works:

import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
from diffusers import HeunDiscreteScheduler

sdxl_model = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
sdxl_model.enable_model_cpu_offload()

refiner_model = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-refiner-1.0',
    text_encoder_2=sdxl_model.text_encoder_2,
    vae=sdxl_model.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner_model.enable_model_cpu_offload()

sdxl_model.scheduler = HeunDiscreteScheduler.from_config(sdxl_model.scheduler.config, use_karras_sigmas=True)
refiner_model.scheduler = HeunDiscreteScheduler.from_config(refiner_model.scheduler.config, use_karras_sigmas=True)

generator = torch.Generator(device='cuda').manual_seed(12345)

hnf = 0.5
params = {
    'prompt': ['evening sunset scenery blue sky nature, glass bottle with a galaxy in it'],
    'negative_prompt': ['text, watermark'],
    "num_inference_steps": 50,
    "height": 1024,
    "width": 1024,
    "guidance_scale": 7,
}

sdxl_res = sdxl_model(**params, denoising_end=hnf, generator=generator, output_type='latent')
sdxl_latents = sdxl_res.images

refiner_res = refiner_model(
    output_type="pil",
    image=sdxl_latents,
    prompt=params['prompt'],
    num_inference_steps=params['num_inference_steps'],
    negative_prompt=params['negative_prompt'],
    generator=generator,
    denoising_start=hnf,
)
refiner_imgs = refiner_res.images

refiner_imgs[0].save("out.png")

[output image: yiyi_test_4_out]

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 24, 2023

The documentation is not available anymore as the PR was closed or merged.

@nhnt11

nhnt11 commented Oct 24, 2023

Hey @yiyixuxu, seems like you adapted the original fix I shared in the issue description which does indeed work for some cases. But in our testing we found it didn't work in all cases.

Here is the final code we ended up shipping in our fork of diffusers: playgroundai@1bab033

This line didn't make sense to us: filter(lambda ts: ts < discrete_timestep_cutoff, timesteps)
Because here we are comparing an actual timestep value to an index. If you look at the way the discrete_timestep_cutoff value is computed, it appears to give the index of the cutoff timestep within a list of num_train_timesteps total timesteps.

Overall I am not fully satisfied with the latest code in our fork because I don't fully understand second order schedulers, but empirically, the idea is that when splitting the timesteps array, the left split must be of even length and the right split must be of odd length.

I.e. if we have timesteps 1 [2] 3 [4] 5 we want to split like this: 1 [2] and 3 [4] 5.

This seems to work for various combinations of num_inference_steps and denoising_start/denoising_end.

@yiyixuxu
Collaborator Author

yiyixuxu commented Oct 25, 2023

@nhnt11

I don't think it is the same as your fix - it seems like the get_timesteps function you provided is meant for the base pipeline. However, only the refiner pipeline has an issue with its timesteps after the split, so we only need to fix the refiner pipeline here.

I would really appreciate it if you could try out the code in this PR and share any examples where it fails :)

To answer some of your questions:

This line didn't make sense to us: filter(lambda ts: ts < discrete_timestep_cutoff, timesteps)
Because here we are comparing an actual timestep value to an index. If you look at the way the discrete_timestep_cutoff value is computed, it appears to give the index of the cutoff timestep within a list of num_train_timesteps total timesteps.

Yes, discrete_timestep_cutoff is an index, but since we are looking at training steps (not inference steps) here, and training steps are always monotonic (e.g. 1000, 999, 998, ..., 0), the two comparisons are actually the same :)
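
A quick numeric check of that equivalence (plain NumPy; the usual 1000 training timesteps assumed, not pipeline code):

import numpy as np

num_train_timesteps = 1000
denoising_start = 0.8
cutoff = int(round(num_train_timesteps - denoising_start * num_train_timesteps))  # 200

train_steps = np.arange(num_train_timesteps)[::-1]  # 999, 998, ..., 0
by_value = train_steps[train_steps < cutoff]        # filter on the timestep value
by_index = train_steps[-cutoff:]                    # slice on the index
assert (by_value == by_index).all()                 # identical tails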

I.e. if we have timesteps 1 [2] 3 [4] 5 we want to split like this: 1 [2] and 3 [4] 5.

Heun is implemented based on Algorithm 2 in the k-diffusion paper (https://arxiv.org/abs/2206.00364) - feel free to take a look if you're interested. Basically, for a 2nd order scheduler, each actual "step" requires running the model forward pass twice to approximate the solution. In order to fit this into our API and stay consistent with our design philosophy, we decided to duplicate the timesteps. So if you intend to run 5 actual steps with the Heun scheduler by passing num_inference_steps = 5, you will find the timesteps it generates are actually 4, [3], 3, [2], 2, [1], 1, [0], 0, where each index is duplicated for the second order update. You can therefore split your actual "steps" however you want, i.e. you can totally split them as 1, 2 and 3, 4, 5 if you like; but when we handle the timestep sequence in the Heun scheduler, you always have to make sure each second order update is preceded by a first order update with a later timestep.
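
For reference, a quick way to see the duplication (the beta values here are just the common SD config, assumed for illustration):

from diffusers import HeunDiscreteScheduler

sched = HeunDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
)
sched.set_timesteps(5)
# Every timestep except the first is repeated for the 2nd order update,
# so 5 requested steps produce 2 * 5 - 1 = 9 entries.
print(len(sched.timesteps))  # 9
print(sched.timesteps)       # t0, t1, t1, t2, t2, t3, t3, t4, t4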

@patrickvonplaten
Contributor

(quoting @nhnt11's comment above)

@nhnt11 I think your code changes are very similar to, if not the same as, what has been done here.

I've tested this PR and it fully fixes the problem (nice job @yiyixuxu).

What's happening here is the following:

    1. Timesteps are always in reverse order, meaning they start with the higher numbers.
    2. For a 2nd order scheduler, all timesteps except the first (highest) one are duplicated, e.g. they always look as follows:

[999, 888, 888, 777, 777, ..., 111, 111, 1, 1]

This means that when we filter out all timesteps that are lower than a certain number so that we can start there, we always end up with an even number of timesteps:

[777, 777, ..., 111, 111, 1, 1]

or

[666, 666, ..., 111, 111, 1, 1]

It is not possible to end up with something like [666, 555, 555, ...., 111, 111, 1, 1] because all timesteps are duplicated.
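
A quick sanity check of this claim on a toy version of the list above (plain Python, not pipeline code):

# Only the first (highest) timestep is unique; every other entry is
# duplicated, so any cutoff below the first entry filters whole pairs.
ts = [999, 888, 888, 777, 777, 111, 111, 1, 1]
for cutoff in (900, 500, 100):
    n = sum(t < cutoff for t in ts)
    assert n % 2 == 0  # always an even count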

Now, Heun schedulers are designed to always finish a full step in the middle of duplicated timesteps, e.g.:
1st step, 1st Heun derivative is 999
1st step, 2nd Heun derivative is 888
<Step 1 finished>
2nd step, 1st Heun derivative is 888
2nd step, 2nd Heun derivative is 777
<Step 2 finished>
...

=> Therefore we always need to add one here to ensure that the Heun step is finished.
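
In code, a hedged paraphrase of the refiner-side fix (a sketch following the PR's variable names, not the exact pipeline source):

def refiner_timesteps(scheduler, denoising_start):
    # Index of the cutoff timestep within the training schedule.
    num_train = scheduler.config.num_train_timesteps
    discrete_timestep_cutoff = int(round(num_train - denoising_start * num_train))
    num_inference_steps = (scheduler.timesteps < discrete_timestep_cutoff).sum().item()
    if scheduler.order == 2:
        # With duplicated timesteps the filtered slice would start on a
        # 2nd-order entry; +1 pulls in the 1st-order entry that opens
        # that Heun step.
        num_inference_steps = num_inference_steps + 1
    return scheduler.timesteps[-num_inference_steps:]

# usage sketch: sched.set_timesteps(50); ts = refiner_timesteps(sched, 0.5)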

@patrickvonplaten patrickvonplaten merged commit 0fc2571 into main Oct 25, 2023
13 checks passed
@nhnt11

nhnt11 commented Oct 25, 2023

Thanks for the explanations! I am on the move so I definitely need to read everything again, but I just want to call out that when I logged the timesteps for other second order schedulers (like DPM2), the duplication you mentioned was not there - only Heun had it. Maybe that's also a bug? I am not sure.

@nhnt11

nhnt11 commented Oct 25, 2023

Some code:

from diffusers import KDPM2DiscreteScheduler

common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
dpm2sched = KDPM2DiscreteScheduler(**common_config)
dpm2sched.set_timesteps(25)
print(dpm2sched.timesteps)

Result:

tensor([999.0000, 978.4973, 957.3750, 936.8752, 915.7500, 895.2502, 874.1250,
        853.6226, 832.5000, 811.9916, 790.8750, 770.3562, 749.2500, 728.7167,
        707.6250, 687.0720, 666.0000, 645.4221, 624.3750, 603.7672, 582.7500,
        562.1061, 541.1250, 520.4391, 499.5000, 478.7657, 457.8750, 437.0850,
        416.2500, 395.3960, 374.6250, 353.6959, 333.0000, 311.9814, 291.3750,
        270.2460, 249.7500, 228.4767, 208.1250, 186.6497, 166.5000, 144.7109,
        124.8750, 102.5155,  83.2500,  59.4987,  41.6250,   5.8573,   0.0000])

@nhnt11

nhnt11 commented Oct 25, 2023

I'm guessing this is because DPM2 has an interpolation step, so while the timesteps are not exactly duplicated, it's effectively the same principle.

BTW thanks again for engaging as I learn this stuff, I want to make sure we are wiring things correctly in our production setup :)

And big thanks for this PR!

# mean that we cut the timesteps in the middle of the denoising step
# (between 1st and 2nd derivative) which leads to incorrect results. By adding 1
# we ensure that the denoising process always ends after the 2nd derivative step of the scheduler
num_inference_steps = num_inference_steps + 1

I don't think num_inference_steps will always be even. Consider the case where the scheduler has been set up by calling set_timesteps(25) and denoising_start = 0.6 is then supplied. The following code prints an even number:

from diffusers import KDPM2DiscreteScheduler

# Minimal stand-in for the SCHEDULERS mapping in our setup:
SCHEDULERS = {
    "DPM2": KDPM2DiscreteScheduler(
        beta_start=0.00085, beta_end=0.012, beta_schedule='scaled_linear'
    )
}

sched = SCHEDULERS["DPM2"]
sched.set_timesteps(25)
denoising_start = 0.6
discrete_timestep_cutoff = int(
    round(
        sched.config.num_train_timesteps
        - (denoising_start * sched.config.num_train_timesteps)
    )
)
num_inference_steps = (sched.timesteps < discrete_timestep_cutoff).sum().item()
print(num_inference_steps)

This prints 20. For denoising_start=0.5, it prints 25. So num_inference_steps may be either odd or even, depending on the num_inference_steps and denoising_start passed into the pipeline call.

I think we should add 1 to it only if it's even, and not if it's odd. No?

@yiyixuxu @patrickvonplaten
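
A sketch of that parity-aware variant (hypothetical; the actual follow-up fix may differ):

# Bump the count only when it is even, i.e. when the slice would
# otherwise start on a 2nd-order entry; leave odd counts untouched.
if scheduler.order == 2 and num_inference_steps % 2 == 0:
    num_inference_steps = num_inference_steps + 1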

Contributor

Ah, DPM2 can indeed be different if it interpolates (Heun can't).

Nice catch! I'll open a PR to fix it


This is the result I get with hnf = 0.5 and steps = 25 with DPM2, using the code from this PR:
[attached image]



Thanks @patrickvonplaten, all test cases I threw at it now produce good images.

I am just curious why we don't need any similar logic in the base split (where denoising_end is supplied to the base SDXL pipeline, without img2img), since I would expect we would want to ensure that the final timestep there is a second order timestep, and as far as I can tell it still has the odd/even issue when splitting. Maybe I am wrong!

In any case, latest main now produces good images for all my test cases, as I said. Thank you! 🙏

@kashif kashif deleted the 2nd-order-scheduler-denoising-start branch November 1, 2023 17:23
kashif pushed a commit to kashif/diffusers that referenced this pull request Nov 11, 2023
fix a bug in 2nd order schedulers when using in ensemble of experts config (huggingface#5511)

* fix

* fix copies

* remove heun from tests

* add back heun and fix the tests to include 2nd order

* fix the other test too

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* make style

* add more comments

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <[email protected]>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023 (same commit message as above)
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024 (same commit message as above)