Supporting Guess Mode with StableDiffusionControlNetPipeline #2971

hireshgupta1997 · 2023-04-04T17:22:57Z

ControlNet supports a Guess Mode that generates results without any text prompt. It recognizes the contents of the control map quite well and produces impressive outputs from the control map alone.

I would like to request this feature, which would be helpful for the whole community. Please refer to the results below:

Generated Examples - ControlNet Repo
Implementation - lllyasviel/ControlNet@005008b
SD WebUI Implementation - c6fc96eef9c342
Related Issue in sd-webui-controlnet

takuma104 · 2023-04-05T16:51:09Z

I created a PoC for GuessMode. I had been working on it before but completely forgot about it until now...
main...takuma104:diffusers:controlnet-guess-mode

Usage:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# canny_image is assumed to be a PIL image holding a Canny edge map, prepared beforehand
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
).to("cuda")
# Empty prompt plus guess_mode=True: the control map alone drives the generation
image = pipe("", image=canny_image, guess_mode=True).images[0]
image.save("guess_mode_generated.png")

I used lllyasviel/ControlNet as a reference implementation and compared its output with my implementation.

https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare_guess_mode/README.md

The reference implementation uses a trained UNet with ControlNet, while the Diffusers version uses a vanilla SD1.5 non-EMA model for the UNet, so the outputs differ slightly by default. I felt that GuessMode further increased this difference. The colors also tend to be more intense. Slightly lowering controlnet_conditioning_scale might yield better results.
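To make the controlnet_conditioning_scale suggestion concrete, here is a minimal sketch of how a global scale could interact with per-block weights in Guess Mode. It assumes that the ControlNet residuals are weighted on a log scale from 0.1 (shallowest block) to 1.0 (mid block) and then multiplied by the global scale. The function name, the default of 13 residuals, and the exact weighting are assumptions for illustration, not code taken from the PoC.

```python
from typing import List


def guess_mode_scales(num_residuals: int = 13, conditioning_scale: float = 1.0) -> List[float]:
    """Hypothetical helper: log-spaced weights from 0.1 to 1.0, times a global scale.

    Assumes one weight per ControlNet residual (for example 12 down-block
    residuals plus 1 mid-block residual).
    """
    if num_residuals < 2:
        raise ValueError("need at least two residuals to space the weights")
    # Exponents run linearly from -1 to 0, so the weights run from 10**-1 to 10**0.
    return [
        conditioning_scale * 10 ** (-1 + i / (num_residuals - 1))
        for i in range(num_residuals)
    ]


if __name__ == "__main__":
    # Lowering the global scale shrinks every weight proportionally.
    for scale in (1.0, 0.8):
        weights = guess_mode_scales(conditioning_scale=scale)
        print(scale, [round(w, 3) for w in weights])
```

Under these assumptions, a lower controlnet_conditioning_scale reduces the influence of every block by the same factor, which is one plausible reason it could tame overly intense colors.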

In this comparison, the "with prompt" version used the following prompts:

prompt = "best quality, extremely detailed"
negative_prompt = "lowres, bad anatomy, worst quality, low quality"

For Guess Mode, no prompts were specified (""). It seems that prompts can still be specified during GuessMode, but they were not included in this comparison.

This PoC implementation still needs improvements, as there are currently some limitations when using GuessMode:

Running without classifier-free guidance (guidance_scale=1.0) is not supported
Batch sizes other than 1 are not supported

When using GuessMode, only the ControlNet should run without classifier-free guidance, but that part of the implementation is still incomplete.
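One way to read this requirement: with classifier-free guidance the UNet sees a batch of [unconditional, conditional] inputs, while the ControlNet is evaluated for the conditional half only, so the unconditional half receives zero residuals. The sketch below illustrates that padding with plain Python lists standing in for tensors. The helper name and the data layout are hypothetical, and this is an interpretation of the intended behavior rather than code from the PoC.

```python
from typing import List


def pad_control_for_cfg(cond_residuals: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: extend conditional-only residuals to a CFG batch.

    Each inner list holds one block's residual values for the conditional
    samples. The result places zeros for the unconditional half first, then
    the original values, matching an [unconditional, conditional] batch order.
    """
    padded = []
    for block in cond_residuals:
        zeros = [0.0] * len(block)  # unconditional half gets no control signal
        padded.append(zeros + list(block))
    return padded


if __name__ == "__main__":
    residuals = [[1.0, 2.0], [3.0, 4.0]]
    print(pad_control_for_cfg(residuals))
```

With this layout the control map only influences the conditional prediction, so the guidance step amplifies the difference that the control map introduces.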

@hireshgupta1997 Could you please try out this PoC if it's alright with you?

hireshgupta1997 · 2023-04-06T05:46:57Z

Thanks @takuma104. It works completely fine for me. 😄 Would it be possible to create a PR to release this in the diffusers library?

patrickvonplaten · 2023-04-06T13:48:53Z

Happy to add such a feature to diffusers! I think this would make a lot of sense.

takuma104 · 2023-04-06T15:25:31Z

@hireshgupta1997 Thanks! I just opened Draft PR #2998.

hireshgupta1997 · 2023-04-07T14:41:23Z

Thanks a lot @takuma104 for making these changes so quickly 😄. I just wanted to learn more about the limitations on supporting classifier_free_guidance with guess_mode. I see an option to provide a Guidance Scale in ControlNet's original implementation.

takuma104 · 2023-04-08T13:44:23Z

@hireshgupta1997 In the Diffusers pipeline, the behavior depends on the guidance_scale argument. A value greater than 1.0 (the common case) enables classifier-free guidance. A value less than or equal to 1.0 takes a different code path, which Guess Mode did not support before. As of yesterday's PR version, this limitation has been removed. The previously mentioned limitation on the batch size has also been removed.
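As a rough illustration of the two code paths described above, the sketch below switches on guidance_scale and applies the usual classifier-free guidance formula when it is enabled. Plain Python lists stand in for noise tensors, and the function name and signature are hypothetical, not the pipeline's actual internals.

```python
from typing import List, Optional


def combine_noise(
    noise_uncond: Optional[List[float]],
    noise_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Hypothetical helper: pick the noise prediction based on guidance_scale."""
    do_classifier_free_guidance = guidance_scale > 1.0
    if not do_classifier_free_guidance:
        # Guidance disabled: only the conditional prediction is used.
        return list(noise_cond)
    if noise_uncond is None:
        raise ValueError("classifier-free guidance needs an unconditional prediction")
    # Guidance enabled: move from the unconditional toward the conditional prediction.
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


if __name__ == "__main__":
    print(combine_noise(None, [1.0, 2.0], guidance_scale=1.0))
    print(combine_noise([0.0, 1.0], [1.0, 3.0], guidance_scale=7.5))
```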

hireshgupta1997 · 2023-04-10T03:51:56Z

Sure, Thanks a lot @takuma104 for clarifying and making the necessary changes. 😄

hireshgupta1997 · 2023-04-27T16:12:44Z

Thank you @patrickvonplaten and @takuma104 for releasing this feature in diffusers==0.16.0. 😄 Marking this issue as completed.

BraveDistribution · 2023-11-25T19:44:39Z

Hello,

How would one train this from scratch, that is, a ControlNet with guess mode? I'd like to have an inpainting generator with diffusers.

Thanks.

takuma104 mentioned this issue on Apr 6, 2023: Add to support Guess Mode for StableDiffusionControlnetPipleline #2998 (Merged)

hireshgupta1997 closed this as completed on Apr 27, 2023
