Supporting Guess Mode with StableDiffusionControlNetPipeline #2971

Closed
hireshgupta1997 opened this issue Apr 4, 2023 · 9 comments

Comments

@hireshgupta1997

The ControlNet model supports a Guess Mode, which generates results without any text prompt. It recognizes the contents of the control map quite well and produces amazing outputs from the control map alone.

I would like to request this feature, which would be helpful for the whole community. Please refer to the results below:

@takuma104
Contributor

takuma104 commented Apr 5, 2023

I created a PoC for GuessMode. I had been working on it before but completely forgot about it until now...
main...takuma104:diffusers:controlnet-guess-mode

Usage:

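import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the Canny-edge ControlNet and attach it to a Stable Diffusion 1.5 pipeline.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet).to("cuda")

# canny_image is assumed to be a Canny edge map (PIL image) prepared beforehand.
# An empty prompt plus guess_mode=True lets the control map alone drive generation.
image = pipe("", image=canny_image, guess_mode=True).images[0]
image.save("guess_mode_generated.png")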

I used lllyasviel/ControlNet as a reference implementation and compared its output with my implementation.

https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare_guess_mode/README.md

The reference implementation uses a UNet trained together with ControlNet, while the Diffusers version uses a vanilla SD1.5 non-EMA model for the UNet, so the outputs differ slightly by default. I felt that Guess Mode further increased this difference. The colors also tend to be more intense. Slightly lowering controlnet_conditioning_scale might yield better results.
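
As a rough sketch of that last suggestion, the helper below passes a lowered controlnet_conditioning_scale together with guess_mode=True. The helper name generate_in_guess_mode and the value 0.8 are assumptions for illustration, not tuned or tested settings; pipe is expected to be a StableDiffusionControlNetPipeline like the one in the usage snippet above.

```python
def generate_in_guess_mode(pipe, control_image, conditioning_scale=0.8):
    """Run a ControlNet pipeline in guess mode with a reduced ControlNet weight.

    Hypothetical helper: `pipe` is any callable with the call signature of
    StableDiffusionControlNetPipeline; 0.8 is an illustrative value only.
    """
    result = pipe(
        "",  # empty prompt: the control map alone steers generation
        image=control_image,
        guess_mode=True,
        controlnet_conditioning_scale=conditioning_scale,
    )
    # The pipeline output exposes generated images as a list.
    return result.images[0]
```

Usage would look like image = generate_in_guess_mode(pipe, canny_image, conditioning_scale=0.8), trying a few values below 1.0 and comparing the color intensity by eye.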

In this comparison, the "with prompt" version used the following prompts:

prompt = "best quality, extremely detailed"
negative_prompt = "lowres, bad anatomy, worst quality, low quality"

For Guess Mode, no prompts were specified (""). It seems that prompts can still be specified during GuessMode, but they were not included in this comparison.

This PoC implementation still needs improvements, as there are currently some limitations when using GuessMode:

  • Running without classifier-free guidance (guidance_scale=1.0) is not supported
  • Batch sizes other than 1 are not supported

In Guess Mode, only the ControlNet should run without classifier-free guidance, but the implementation of that part is still incomplete.
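
One way to picture that last point, as a simplified sketch rather than the PoC's actual code: with classifier-free guidance the batch holds an unconditional half and a conditional half, and in Guess Mode the ControlNet residuals would be added only to the conditional half. Plain Python lists stand in for tensors here, both function names are hypothetical, and the unconditional-first ordering of the batch is an assumption.

```python
def pad_residual_for_cfg(cond_residual):
    """Extend a residual computed for the conditional half to the full batch.

    The unconditional half receives zeros, so adding the padded residual
    leaves the unconditional samples unchanged.
    """
    zeros = [[0.0] * len(sample) for sample in cond_residual]
    return zeros + cond_residual  # assumed order: [unconditional, conditional]


def add_residual(hidden, residual):
    """Element-wise addition of a residual to a batch of hidden states."""
    return [
        [h + r for h, r in zip(h_sample, r_sample)]
        for h_sample, r_sample in zip(hidden, residual)
    ]


# Batch of two samples: one unconditional, one conditional.
hidden = [[1.0, 1.0], [1.0, 1.0]]
cond_residual = [[0.5, -0.5]]  # ControlNet output for the conditional sample only

full_residual = pad_residual_for_cfg(cond_residual)
updated = add_residual(hidden, full_residual)
```

Only the second (conditional) sample in updated is shifted by the ControlNet residual; the first stays as it was.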

@hireshgupta1997 Could you please try out this PoC if it's alright with you?

@hireshgupta1997
Author

hireshgupta1997 commented Apr 6, 2023

Thanks @takuma104. It works completely fine for me. 😄 Would it be possible to create a PR to release this in the diffusers library?

@patrickvonplaten
Contributor

Happy to add such a feature to diffusers! I think this would make a lot of sense.

@takuma104
Contributor

@hireshgupta1997 Thanks! I just opened Draft PR #2998.

@hireshgupta1997
Author

Thanks a lot @takuma104 for making these changes so quickly 😄. I wanted to learn more about the limitations on supporting classifier-free guidance with guess_mode. I see an option to provide a Guidance Scale in ControlNet's original implementation.

@takuma104
Contributor

takuma104 commented Apr 8, 2023

@hireshgupta1997 In the Diffusers pipeline, the behavior depends on the guidance_scale argument. If guidance_scale is greater than 1.0 (the common case), classifier-free guidance is enabled. If it is less than or equal to 1.0, the pipeline takes a different path without classifier-free guidance, which Guess Mode did not support before. As of yesterday's version of the PR, this limitation has been removed, as has the previously mentioned limitation on batch size.
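
The switch described here can be sketched in a few lines. This is a simplified illustration with plain Python lists in place of tensors and hypothetical function names, not the pipeline's actual code: guidance is treated as enabled only when guidance_scale > 1.0, and the two noise predictions are then blended with the usual classifier-free guidance formula.

```python
def uses_classifier_free_guidance(guidance_scale):
    """Guidance is active only for scales strictly greater than 1.0."""
    return guidance_scale > 1.0


def combine_noise_predictions(noise_text, noise_uncond, guidance_scale):
    """Blend conditional and unconditional noise predictions.

    Without guidance, only the text-conditioned prediction is used and
    `noise_uncond` is ignored (it may be None).
    """
    if not uses_classifier_free_guidance(guidance_scale):
        return list(noise_text)
    # uncond + scale * (text - uncond), applied element-wise
    return [
        u + guidance_scale * (t - u)
        for t, u in zip(noise_text, noise_uncond)
    ]


guided = combine_noise_predictions([1.0, 3.0], [0.0, 1.0], guidance_scale=7.5)
unguided = combine_noise_predictions([1.0, 3.0], None, guidance_scale=1.0)
```

With guidance_scale=1.0 no unconditional prediction is needed at all, which is why that case follows a separate path.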

@hireshgupta1997
Author

Sure, thanks a lot @takuma104 for clarifying and making the necessary changes. 😄

@hireshgupta1997
Author

hireshgupta1997 commented Apr 27, 2023

Thank you @patrickvonplaten and @takuma104 for releasing this feature in diffusers==0.16.0. 😄 Marking this issue as completed.

@BraveDistribution

Hello,

how would one train a ControlNet with guess mode from scratch? I'd like to have an inpainting generator with diffusers.

thanks.
