fix a bug in 2nd order schedulers when using in ensemble of experts config #5511

Merged
merged 10 commits into main from 2nd-order-scheduler-denoising-start on Oct 25, 2023

Conversation

yiyixuxu
Collaborator

@yiyixuxu yiyixuxu commented Oct 24, 2023

This PR fixes the bug reported in #5447 and allows users to use 2nd order schedulers with the ensemble of experts approach.

I'm using a toy example to quickly illustrate the error we are hitting:

  • e.g. we have 4 inference steps: 3, 2, 1, 0
  • in set_timesteps, we repeat each index for the 2nd order update (steps in [] are 2nd order updates): 3, [2], 2, [1], 1, [0], 0
  • now we split these timesteps between base and refiner; if the cutoff point is 1, we will have
    • base: 3, [2], 2
    • refiner: [1], 1, [0], 0

-> You can see that the timesteps for the refiner are now wrong; they should be 2, [1], 1, [0], 0. (A minimal sketch of this is below.)
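
Here is that off-by-one in plain Python (a toy sketch only, not the actual pipeline code):

# Duplicated Heun-style timesteps for 4 real steps; bracketed entries
# in the list above are the 2nd-order duplicates.
timesteps = [3, 2, 2, 1, 1, 0, 0]

cutoff = 2  # the refiner handles timesteps strictly below this value
n = sum(t < cutoff for t in timesteps)  # 4 entries: 1, 1, 0, 0

# Naive split: the refiner starts on a 2nd-order duplicate -> wrong
print(timesteps[-n:])        # [1, 1, 0, 0]

# Fix: take one extra entry so the refiner starts with a 1st-order step
print(timesteps[-(n + 1):])  # [2, 1, 1, 0, 0]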

This script now works:

import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
from diffusers import HeunDiscreteScheduler

sdxl_model = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
sdxl_model.enable_model_cpu_offload()

refiner_model = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-refiner-1.0',
    text_encoder_2=sdxl_model.text_encoder_2,
    vae=sdxl_model.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner_model.enable_model_cpu_offload()

sdxl_model.scheduler = HeunDiscreteScheduler.from_config(sdxl_model.scheduler.config, use_karras_sigmas=True)
refiner_model.scheduler = HeunDiscreteScheduler.from_config(refiner_model.scheduler.config, use_karras_sigmas=True)

generator = torch.Generator(device='cuda').manual_seed(12345)

hnf = 0.5
params = {
    'prompt': ['evening sunset scenery blue sky nature, glass bottle with a galaxy in it'],
    'negative_prompt': ['text, watermark'],
    "num_inference_steps": 50,
    "height": 1024,
    "width": 1024,
    "guidance_scale": 7,
}

sdxl_res = sdxl_model(**params, denoising_end=hnf, generator=generator, output_type='latent')
sdxl_latents = sdxl_res.images

refiner_res = refiner_model(
    output_type="pil",
    image=sdxl_latents,
    prompt=params['prompt'],
    num_inference_steps=params['num_inference_steps'],
    negative_prompt=params['negative_prompt'],
    generator=generator,
    denoising_start=hnf,
)
refiner_imgs = refiner_res.images

refiner_imgs[0].save("out.png")

[output image: yiyi_test_4_out]

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 24, 2023

The documentation is not available anymore as the PR was closed or merged.

@nhnt11

nhnt11 commented Oct 24, 2023

Hey @yiyixuxu, seems like you adapted the original fix I shared in the issue description which does indeed work for some cases. But in our testing we found it didn't work in all cases.

Here is the final code we ended up shipping in our fork of diffusers: playgroundai@1bab033

This line didn't make sense to us: filter(lambda ts: ts < discrete_timestep_cutoff, timesteps)
Because here we are comparing an actual timestep value to an index. If you look at the way the discrete_timestep_cutoff value is computed, it appears to give the index of the cutoff timestep within a list of num_train_timesteps total timesteps.

Overall I am not fully satisfied with the latest code in our fork because I don't fully understand second order schedulers, but empirically, the idea is that when splitting the timesteps array, the left split must be of even length and the right split must be of odd length.

I.e. if we have timesteps 1 [2] 3 [4] 5 we want to split like this: 1 [2] and 3 [4] 5.

This seems to work for various combinations of num_inference_steps and denoising_start/denoising_end.

@yiyixuxu
Collaborator Author

yiyixuxu commented Oct 25, 2023

@nhnt11

I don't think it is the same as your fix - it seems like the get_timesteps function you provided is meant for the base pipeline. However, only the refiner pipeline has an issue with its timesteps after the split, so we only need to fix the refiner pipeline here.

I would really appreciate it if you could try out the code in this PR and share any examples where it fails :)

To answer some of your questions:

This line didn't make sense to us: filter(lambda ts: ts < discrete_timestep_cutoff, timesteps)
Because here we are comparing an actual timestep value to an index. If you look at the way the discrete_timestep_cutoff value is computed, it appears to give the index of the cutoff timestep within a list of num_train_timesteps total timesteps.

Yes, discrete_timestep_cutoff is an index, but since we are looking at training steps (not inference steps) here, and training steps are always monotonic (e.g. 1000, 999, 998, ..., 0), the two comparisons are actually the same :)
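
A quick numeric check of that equivalence (plain NumPy; the usual 1000 training timesteps assumed, not pipeline code):

import numpy as np

num_train_timesteps = 1000
denoising_start = 0.8
cutoff = int(round(num_train_timesteps - denoising_start * num_train_timesteps))  # 200

train_steps = np.arange(num_train_timesteps)[::-1]  # 999, 998, ..., 0
by_value = train_steps[train_steps < cutoff]        # filter on the timestep value
by_index = train_steps[-cutoff:]                    # slice on the index
assert (by_value == by_index).all()                 # identical tails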

I.e. if we have timesteps 1 [2] 3 [4] 5 we want to split like this: 1 [2] and 3 [4] 5.

Heun is implemented based on Algorithm 2 in the k-diffusion paper (https://arxiv.org/abs/2206.00364) - feel free to take a look if you're interested. Basically, for a 2nd order scheduler, each actual "step" requires running the model forward pass twice to approximate the solution. In order to fit this into our API and stay consistent with our design philosophy, we decided to duplicate the timesteps. So if you intend to run 5 actual steps with the Heun scheduler by passing num_inference_steps = 5, you will find the timesteps it generates are actually 4, [3], 3, [2], 2, [1], 1, [0], 0, where each index is duplicated for the second order update. You can therefore split your actual "steps" however you want, i.e. you can totally split them as 1, 2 and 3, 4, 5 if you like; but when we handle the timestep sequence in the Heun scheduler, you always have to make sure each second order update is preceded by a first order update with a later timestep.
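
For reference, a quick way to see the duplication (the beta values here are just the common SD config, assumed for illustration):

from diffusers import HeunDiscreteScheduler

sched = HeunDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
)
sched.set_timesteps(5)
# Every timestep except the first is repeated for the 2nd order update,
# so 5 requested steps produce 2 * 5 - 1 = 9 entries.
print(len(sched.timesteps))  # 9
print(sched.timesteps)       # t0, t1, t1, t2, t2, t3, t3, t4, t4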

@patrickvonplaten
Contributor

(quoting @nhnt11's comment above)

@nhnt11 I think your code changes are very similar to, if not the same as, what has been done here.

I've tested this PR and it fully fixes the problem (nice job @yiyixuxu).

What's happening here is the following:

    1. Timesteps are always in reverse order, meaning they start with the higher numbers.
    2. For a 2nd order scheduler, all timesteps except the first (highest) one are duplicated, e.g. they always look as follows:

[999, 888, 888, 777, 777, ..., 111, 111, 1, 1]

This means that when we filter out all timesteps that are lower than a certain number so that we can start there, we always end up with an even number of timesteps:

[777, 777, ..., 111, 111, 1, 1]

or

[666, 666, ..., 111, 111, 1, 1]

It is not possible to end up with something like [666, 555, 555, ...., 111, 111, 1, 1] because all timesteps are duplicated.
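
A quick sanity check of this claim on a toy version of the list above (plain Python, not pipeline code):

# Only the first (highest) timestep is unique; every other entry is
# duplicated, so any cutoff below the first entry filters whole pairs.
ts = [999, 888, 888, 777, 777, 111, 111, 1, 1]
for cutoff in (900, 500, 100):
    n = sum(t < cutoff for t in ts)
    assert n % 2 == 0  # always an even count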

Now, Heun schedulers are designed to always finish a full step in the middle of duplicated timesteps, e.g.:
1st step, 1st Heun derivative is 999
1st step, 2nd Heun derivative is 888
<Step 1 finished>
2nd step, 1st Heun derivative is 888
2nd step, 2nd Heun derivative is 777
<Step 2 finished>
...

=> Therefore we always need to add one here to ensure that the Heun step is finished.
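
In code, a hedged paraphrase of the refiner-side fix (a sketch following the PR's variable names, not the exact pipeline source):

def refiner_timesteps(scheduler, denoising_start):
    # Index of the cutoff timestep within the training schedule.
    num_train = scheduler.config.num_train_timesteps
    discrete_timestep_cutoff = int(round(num_train - denoising_start * num_train))
    num_inference_steps = (scheduler.timesteps < discrete_timestep_cutoff).sum().item()
    if scheduler.order == 2:
        # With duplicated timesteps the filtered slice would start on a
        # 2nd-order entry; +1 pulls in the 1st-order entry that opens
        # that Heun step.
        num_inference_steps = num_inference_steps + 1
    return scheduler.timesteps[-num_inference_steps:]

# usage sketch: sched.set_timesteps(50); ts = refiner_timesteps(sched, 0.5)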

@patrickvonplaten patrickvonplaten merged commit 0fc2571 into main Oct 25, 2023
13 checks passed
@nhnt11

nhnt11 commented Oct 25, 2023

Thanks for the explanations! I am on the move so I definitely need to read everything again, but I just want to call out that when I logged the timesteps for other second order schedulers (like DPM2), the duplication you mentioned was not there - only Heun had it. Maybe that's also a bug? I am not sure.

@nhnt11

nhnt11 commented Oct 25, 2023

Some code:

from diffusers import KDPM2DiscreteScheduler

common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
dpm2sched = KDPM2DiscreteScheduler(**common_config)
dpm2sched.set_timesteps(25)
print(dpm2sched.timesteps)

Result:

tensor([999.0000, 978.4973, 957.3750, 936.8752, 915.7500, 895.2502, 874.1250,
        853.6226, 832.5000, 811.9916, 790.8750, 770.3562, 749.2500, 728.7167,
        707.6250, 687.0720, 666.0000, 645.4221, 624.3750, 603.7672, 582.7500,
        562.1061, 541.1250, 520.4391, 499.5000, 478.7657, 457.8750, 437.0850,
        416.2500, 395.3960, 374.6250, 353.6959, 333.0000, 311.9814, 291.3750,
        270.2460, 249.7500, 228.4767, 208.1250, 186.6497, 166.5000, 144.7109,
        124.8750, 102.5155,  83.2500,  59.4987,  41.6250,   5.8573,   0.0000])

@nhnt11

nhnt11 commented Oct 25, 2023

I'm guessing this is because DPM2 has an interpolation step, so while the timesteps are not exactly duplicated, it's effectively the same principle.

BTW thanks again for engaging as I learn this stuff, I want to make sure we are wiring things correctly in our production setup :)

And big thanks for this PR!

# mean that we cut the timesteps in the middle of the denoising step
# (between 1st and 2nd derivative) which leads to incorrect results. By adding 1
# we ensure that the denoising process always ends after the 2nd derivative step of the scheduler
num_inference_steps = num_inference_steps + 1

I don't think num_inference_steps will always be even. Consider the case where the scheduler has been set up by calling set_timesteps(25) and denoising_start = 0.6 is then supplied. The following code prints an even number:

from diffusers import KDPM2DiscreteScheduler

# Minimal stand-in for the SCHEDULERS mapping in our setup:
SCHEDULERS = {
    "DPM2": KDPM2DiscreteScheduler(
        beta_start=0.00085, beta_end=0.012, beta_schedule='scaled_linear'
    )
}

sched = SCHEDULERS["DPM2"]
sched.set_timesteps(25)
denoising_start = 0.6
discrete_timestep_cutoff = int(
    round(
        sched.config.num_train_timesteps
        - (denoising_start * sched.config.num_train_timesteps)
    )
)
num_inference_steps = (sched.timesteps < discrete_timestep_cutoff).sum().item()
print(num_inference_steps)

This prints 20. For denoising_start=0.5, it prints 25. So num_inference_steps may be either odd or even, depending on the num_inference_steps and denoising_start passed into the pipeline call.

I think we should add 1 to it only if it's even, and not if it's odd. No?

@yiyixuxu @patrickvonplaten
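
A sketch of that parity-aware variant (hypothetical; the actual follow-up fix may differ):

# Bump the count only when it is even, i.e. when the slice would
# otherwise start on a 2nd-order entry; leave odd counts untouched.
if scheduler.order == 2 and num_inference_steps % 2 == 0:
    num_inference_steps = num_inference_steps + 1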

Contributor

Ah, DPM2 can indeed be different if it interpolates (Heun can't).

Nice catch! I'll open a PR to fix it


This is the result I get with hnf = 0.5 and steps = 25 with DPM2, using the code from this PR:
[attached image]



Thanks @patrickvonplaten, all test cases I threw at it now produce good images.

I am just curious why we don't need any similar logic in the base split (where denoising_end is supplied to the base SDXL pipeline, without img2img), since I would expect we would want to ensure that the final timestep there is a second order timestep, and as far as I can tell it still has the odd/even issue when splitting. Maybe I am wrong!

In any case, latest main now produces good images for all my test cases, as I said. Thank you! 🙏

@kashif kashif deleted the 2nd-order-scheduler-denoising-start branch November 1, 2023 17:23
kashif pushed a commit to kashif/diffusers that referenced this pull request Nov 11, 2023
fix a bug in 2nd order schedulers when using in ensemble of experts config (huggingface#5511)

* fix

* fix copies

* remove heun from tests

* add back heun and fix the tests to include 2nd order

* fix the other test too

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* make style

* add more comments

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <[email protected]>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023 (same commit message as above)
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024 (same commit message as above)