Merge pull request #37 from v0xie/dev
S-CFG, Optimizations, and More
v0xie authored May 18, 2024
2 parents 1dd3b2b + 86ffc75 commit 0f90ebd
Showing 9 changed files with 1,681 additions and 49 deletions.
101 changes: 93 additions & 8 deletions README.md
@@ -1,13 +1,78 @@
# sd-webui-incantations
This extension implements multiple novel algorithms that enhance image quality, prompt following, and more.

## COMPATIBILITY NOTICES:
#### Currently incompatible with stable-diffusion-webui-forge
For Forge, use this extension instead: https://github.com/pamparamm/sd-perturbed-attention

# Table of Contents
- [What is this?](#what-is-this)
- [Installation](#installation)
- [Compatibility Notice](#compatibility-notice)
- [News](#news)
- [Extension Features](#extension-features)
- [Semantic CFG](#semantic-cfg-s-cfg)
- [Perturbed Attention Guidance](#perturbed-attention-guidance)
- [CFG Scheduler](#cfg-interval--cfg-scheduler)
- [Multi-Concept T2I-Zero](#multi-concept-t2i-zero--attention-regulation)
- [Seek for Incantations](#seek-for-incantations)
- [Tutorial](#tutorial)
- [Other cool extensions](#also-check-out)
- [Credits](#credits)

## What is this?
### This extension for [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) implements algorithms from state-of-the-art research to achieve **higher-quality** images with *more accurate* prompt adherence.

All methods are **training-free** and rely only on modifying the text embeddings or attention maps.


## Installation
To install the `sd-webui-incantations` extension, follow these steps:

0. **Ensure you have an up-to-date Automatic1111 stable-diffusion-webui installed (version 1.9.3 or later)**

1. **Open the "Extensions" tab and navigate to the "Install from URL" section**:

2. **Paste the repository URL into the "URL for extension's git repository" field**:
```
https://github.com/v0xie/sd-webui-incantations.git
```

3. **Press the Install button**: Wait a few seconds for the extension to finish installing.

4. **Restart the Web UI**:
Completely restart your Stable Diffusion Web UI to load the new extension.

## Compatibility Notice
* Incompatible with **stable-diffusion-webui-forge**: for Forge, use https://github.com/pamparamm/sd-perturbed-attention instead
* Reported incompatible with Adetailer: https://github.com/v0xie/sd-webui-incantations/issues/21
* Incompatible with some older webui versions: https://github.com/v0xie/sd-webui-incantations/issues/14
* May conflict with other extensions which modify the CFGDenoiser

## News
- 15-05-2024 🔥 - S-CFG, optimizations for PAG and T2I-Zero, and more! https://github.com/v0xie/sd-webui-incantations/pull/37
- 29-04-2024 🔥 - The implementation of T2I-Zero is fixed and works much more stably now.

# Extension Features

---
## Semantic CFG (S-CFG)
https://arxiv.org/abs/2404.05384
Dynamically rescales CFG guidance per semantic region to a uniform level, improving image/text alignment.
**Very computationally expensive**: a batch size of 4 at 1024x1024 will max out a 24 GB RTX 4090.

#### Controls
* **SCFG Scale**: Multiplies the correction by a constant factor. Default: 1.0.
* **SCFG R**: A hyperparameter controlling the refinement factor for the cross-attention maps. Higher values use more memory and computation time. Default: 4.
* **Rate Min**: The minimum rate that the CFG can be scaled by. Default: 0.8.
* **Rate Max**: The maximum rate that the CFG can be scaled by. Default: 3.0.
* **Clamp Rate**: Overrides Rate Max: clamps the maximum rate to Clamp Rate / CFG. Default: 0.0. (See the sketch after this list.)
* **Start Step**: Start S-CFG on this step.
* **End Step**: End S-CFG after this step.
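
A minimal sketch of how the rate controls above could interact, assuming a Clamp Rate of 0.0 means the override is unused; the function name and signature are illustrative, not the extension's actual code:

```python
def clamp_scfg_rate(region_rate: float, cfg_scale: float,
                    rate_min: float = 0.8, rate_max: float = 3.0,
                    clamp_rate: float = 0.0) -> float:
    """Illustrative only: constrain a per-region S-CFG rescaling rate to [Rate Min, Rate Max]."""
    if clamp_rate > 0.0:
        # Clamp Rate overrides Rate Max with Clamp Rate / CFG
        rate_max = clamp_rate / cfg_scale
    return min(max(region_rate, rate_min), rate_max)

# e.g. clamp_scfg_rate(5.2, cfg_scale=7.0, clamp_rate=21.0) -> 3.0
```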

#### Results
Prompt: "A cute puppy on the moon", Min Rate: 0.5, Max Rate: 10.0
- SD 1.5
![image](./images/xyz_grid-0006-1-SCFG.jpg)

* Note: may conflict with other extensions that modify the CFGDenoiser.
#### Also check out the paper authors' official project repository:
- https://github.com/SmilesDZgk/S-CFG
#### [Return to top](#sd-webui-incantations)

---
## Perturbed Attention Guidance
@@ -30,7 +95,10 @@ Prompt: "a puppy and a kitten on the moon"
#### Also check out the paper authors' official project page:
- https://ku-cvlab.github.io/Perturbed-Attention-Guidance/

#### [Return to top](#sd-webui-incantations)

---

## CFG Interval / CFG Scheduler
https://arxiv.org/abs/2404.07724 and https://arxiv.org/abs/2404.13040
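
A minimal sketch of the interval idea behind these papers: guidance is applied at full strength only inside a chosen interval and effectively disabled (scale 1.0) outside it. The function and parameter names are illustrative, not the extension's controls, and the papers define the interval over noise levels; steps are used here only for simplicity:

```python
def scheduled_cfg_scale(step: int, cfg_scale: float,
                        start_step: int = 0, end_step: int = 10**9) -> float:
    """Illustrative only: full CFG inside [start_step, end_step], no guidance outside."""
    return cfg_scale if start_step <= step <= end_step else 1.0

# e.g. scheduled_cfg_scale(step=2, cfg_scale=7.0, start_step=5, end_step=20) -> 1.0
```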

@@ -62,6 +130,8 @@ Prompt: "A pointillist painting of a raccoon looking at the sea."
Prompt: "An epic lithograph of a handsome salaryman carefully pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong Kar Wai"
- SD XL
![image](./images/xyz_grid-3380-1-An%20epic%20lithograph%20of%20a%20handsome%20salaryman%20carefully%20pouring%20coffee%20from%20a%20cup%20into%20an%20overflowing%20carafe,%204K,%20directed%20by%20Wong.jpg)

#### [Return to top](#sd-webui-incantations)
---
## Multi-Concept T2I-Zero / Attention Regulation

@@ -98,6 +168,7 @@ SD XL
- https://multi-concept-t2i-zero.github.io/
- https://github.com/YaNgZhAnG-V5/attention_regulation

#### [Return to top](#sd-webui-incantations)
---
### Seek for Incantations
An incomplete implementation of a "prompt-upsampling" method from https://arxiv.org/abs/2401.06345
@@ -121,6 +192,7 @@ SD XL
* Modified Prompt: cinematic 4K photo of a dog riding a bus and eating cake and wearing headphones BREAK - - - - - dog - - bus - - - - - -
![image](./images/xyz_grid-2652-1419902843-cinematic%204K%20photo%20of%20a%20dog%20riding%20a%20bus%20and%20eating%20cake%20and%20wearing%20headphones.png)
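
Based on the modified-prompt example above, a rough sketch of how such a BREAK-delimited "delta" prompt could be assembled from the original prompt and a set of kept words; the word-selection logic itself (the actual method) is not shown, and the function name is illustrative:

```python
def append_delta_prompt(prompt: str, keep_words: set) -> str:
    # Words not selected are replaced by '-' after a BREAK, mirroring the example above
    delta = " ".join(w if w in keep_words else "-" for w in prompt.split())
    return f"{prompt} BREAK {delta}"

# e.g. append_delta_prompt("a dog riding a bus", {"dog", "bus"})
# -> "a dog riding a bus BREAK - dog - - bus"
```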

#### [Return to top](#sd-webui-incantations)
---

### Issues / Pull Requests are welcome!
@@ -132,6 +204,8 @@ SD XL

[![image](https://cdn-uploads.huggingface.co/production/uploads/6345bd89fe134dfd7a0dba40/TzuZWTiHAc3wTxh3PwGL5.png)](https://youtu.be/lMQ7DIPmrfI)

#### [Return to top](#sd-webui-incantations)

## Also check out:

* **Characteristic Guidance**: Awesome enhancements for sampling at high CFG levels [https://github.com/scraed/CharacteristicGuidanceWebUI](https://github.com/scraed/CharacteristicGuidanceWebUI)
@@ -144,6 +218,7 @@ SD XL

* **Agent Attention**: Faster image generation and improved image quality with Agent Attention [https://github.com/v0xie/sd-webui-agentattention](https://github.com/v0xie/sd-webui-agentattention)

#### [Return to top](#sd-webui-incantations)
---

### Credits
@@ -203,9 +278,19 @@ SD XL
primaryClass={cs.CV}
}

@misc{shen2024rethinking,
      title={Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance},
      author={Dazhong Shen and Guanglu Song and Zeyue Xue and Fu-Yun Wang and Yu Liu},
      year={2024},
      eprint={2404.05384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


- [Hard Prompts Made Easy](https://github.com/YuxinWenRick/hard-prompts-made-easy)
- [@udon-universe's extension templates](https://github.com/udon-universe/stable-diffusion-webui-extension-templates)

#### [Return to top](#sd-webui-incantations)
---

Binary file added images/xyz_grid-0006-1-SCFG.jpg
237 changes: 237 additions & 0 deletions scripts/cfg_combiner.py
@@ -0,0 +1,237 @@
import gradio as gr
import logging
import torch
from modules import shared, scripts, devices, patches, script_callbacks
from modules.script_callbacks import CFGDenoiserParams
from modules.processing import StableDiffusionProcessing
from scripts.incantation_base import UIWrapper
from scripts.scfg import scfg_combine_denoised

logger = logging.getLogger(__name__)

class CFGCombinerScript(UIWrapper):
    """ Some scripts modify the CFGs in ways that are not compatible with each other.
    This script will patch the CFG denoiser function to apply CFG in an ordered way.
    This script adds a dict named 'incant_cfg_params' to the processing object.
    This dict contains the following:
        'denoiser': the denoiser object
        'pag_params': list of PAG parameters
        'scfg_params': the S-CFG parameters
        ...
    """
    def __init__(self):
        pass

    # Extension title in menu UI
    def title(self):
        return "CFG Combiner"

    # Decide to show menu in txt2img or img2img
    def show(self, is_img2img):
        return scripts.AlwaysVisible

    # Setup menu ui detail
    def setup_ui(self, is_img2img):
        self.infotext_fields = []
        self.paste_field_names = []
        return []

    def before_process(self, p: StableDiffusionProcessing, *args, **kwargs):
        logger.debug("CFGCombinerScript before_process")
        cfg_dict = {
            "denoiser": None,
            "pag_params": None,
            "scfg_params": None
        }
        setattr(p, 'incant_cfg_params', cfg_dict)

    def process(self, p: StableDiffusionProcessing, *args, **kwargs):
        pass

    def before_process_batch(self, p: StableDiffusionProcessing, *args, **kwargs):
        pass

    def process_batch(self, p: StableDiffusionProcessing, *args, **kwargs):
        """ Process the batch and hook the CFG denoiser if PAG or S-CFG is active """
        logger.debug("CFGCombinerScript process_batch")
        pag_active = p.extra_generation_params.get('PAG Active', False)
        cfg_active = p.extra_generation_params.get('CFG Interval Enable', False)
        scfg_active = p.extra_generation_params.get('SCFG Active', False)

        if not any([
            pag_active,
            cfg_active,
            scfg_active
        ]):
            return

        #logger.debug("CFGCombinerScript process_batch: pag_active or scfg_active")

        cfg_denoise_lambda = lambda params: self.on_cfg_denoiser_callback(params, p.incant_cfg_params)
        unhook_lambda = lambda: self.unhook_callbacks()

        script_callbacks.on_cfg_denoiser(cfg_denoise_lambda)
        script_callbacks.on_script_unloaded(unhook_lambda)
        logger.debug('Hooked callbacks')

    def postprocess_batch(self, p: StableDiffusionProcessing, *args, **kwargs):
        logger.debug("CFGCombinerScript postprocess_batch")
        script_callbacks.remove_current_script_callbacks()

    def unhook_callbacks(self, cfg_dict = None):
        if not cfg_dict:
            return
        self.unpatch_cfg_denoiser(cfg_dict)

    def on_cfg_denoiser_callback(self, params: CFGDenoiserParams, cfg_dict: dict):
        """ Callback for when the CFG denoiser is called
        Patches the combine_denoised function with a custom one.
        """
        if cfg_dict['denoiser'] is None:
            cfg_dict['denoiser'] = params.denoiser
        else:
            self.unpatch_cfg_denoiser(cfg_dict)
        self.patch_cfg_denoiser(params.denoiser, cfg_dict)

    def patch_cfg_denoiser(self, denoiser, cfg_dict: dict):
        """ Patch the CFG Denoiser combine_denoised function """
        if not cfg_dict:
            logger.error("Unable to patch CFG Denoiser, no dict passed as cfg_dict")
            return
        if not denoiser:
            logger.error("Unable to patch CFG Denoiser, denoiser is None")
            return

        if getattr(denoiser, 'combine_denoised_patched', False) is False:
            try:
                setattr(denoiser, 'combine_denoised_original', denoiser.combine_denoised)
                # create patch that references the original function
                pass_conds_func = lambda *args, **kwargs: combine_denoised_pass_conds_list(
                    *args,
                    **kwargs,
                    original_func = denoiser.combine_denoised_original,
                    pag_params = cfg_dict['pag_params'],
                    scfg_params = cfg_dict['scfg_params']
                )
                patched_combine_denoised = patches.patch(__name__, denoiser, "combine_denoised", pass_conds_func)
                setattr(denoiser, 'combine_denoised_patched', True)
                setattr(denoiser, 'combine_denoised_original', patches.original(__name__, denoiser, "combine_denoised"))
            except KeyError:
                logger.exception("KeyError patching combine_denoised")
            except RuntimeError:
                logger.exception("RuntimeError patching combine_denoised")

    def unpatch_cfg_denoiser(self, cfg_dict = None):
        """ Unpatch the CFG Denoiser combine_denoised function """
        if cfg_dict is None:
            return
        denoiser = cfg_dict.get('denoiser', None)
        if denoiser is None:
            return

        setattr(denoiser, 'combine_denoised_patched', False)
        try:
            patches.undo(__name__, denoiser, "combine_denoised")
        except KeyError:
            logger.exception("KeyError unhooking combine_denoised")
        except RuntimeError:
            logger.exception("RuntimeError unhooking combine_denoised")

        cfg_dict['denoiser'] = None


def combine_denoised_pass_conds_list(*args, **kwargs):
    """ Hijacked function for combine_denoised in CFGDenoiser
    Currently relies on the original function not having any kwargs
    If any of the params are not None, it will apply the corresponding guidance
    The order of guidance is:
    1. CFG and S-CFG are combined multiplicatively
    2. PAG guidance is added to the result
    3. ...
    ...
    """
    original_func = kwargs.get('original_func', None)
    pag_params = kwargs.get('pag_params', None)
    scfg_params = kwargs.get('scfg_params', None)

    if pag_params is None and scfg_params is None:
        logger.warning("No reason to hijack combine_denoised")
        return original_func(*args)

    def new_combine_denoised(x_out, conds_list, uncond, cond_scale):
        denoised_uncond = x_out[-uncond.shape[0]:]
        denoised = torch.clone(denoised_uncond)

        ### Variables
        # 0. Standard CFG value
        cfg_scale = cond_scale

        # 1. CFG Interval
        # Overrides cfg_scale with the scheduled value when CFG Interval is enabled
        if pag_params is not None:
            if pag_params.cfg_interval_enable:
                cfg_scale = pag_params.cfg_interval_scheduled_value

        # 2. PAG
        pag_x_out = None
        pag_scale = None
        if pag_params is not None:
            pag_active = pag_params.pag_active
            pag_x_out = pag_params.pag_x_out
            pag_scale = pag_params.pag_scale

        ### Combine Denoised
        for i, conds in enumerate(conds_list):
            for cond_index, weight in conds:

                model_delta = x_out[cond_index] - denoised_uncond[i]

                # S-CFG
                rate = 1.0
                if scfg_params is not None:
                    rate = scfg_combine_denoised(
                        model_delta = model_delta,
                        cfg_scale = cfg_scale,
                        scfg_params = scfg_params,
                    )
                # rate is either a scalar or a tensor; fall back to 1.0 if None
                if rate is None:
                    logger.error("scfg_combine_denoised returned None, using default rate of 1.0")
                    rate = 1.0
                elif not isinstance(rate, (int, float)):
                    # rate is a tensor: move it to the model device and dtype
                    rate = rate.to(device=shared.device, dtype=model_delta.dtype)

                # 1. Experimental formulation for S-CFG combined with CFG
                denoised[i] += model_delta * rate * (weight * cfg_scale)
                del rate

                # 2. PAG
                # PAG is added like CFG
                if pag_params is not None:
                    if not pag_active:
                        pass
                    # Not within step interval?
                    elif not pag_params.pag_start_step <= pag_params.step <= pag_params.pag_end_step:
                        pass
                    # Scale is zero?
                    elif pag_scale <= 0:
                        pass
                    # do pag
                    else:
                        try:
                            denoised[i] += (x_out[cond_index] - pag_x_out[i]) * (weight * pag_scale)
                        except Exception as e:
                            logger.exception("Exception in combine_denoised_pass_conds_list - %s", e)

        #torch.cuda.empty_cache()
        devices.torch_gc()

        return denoised
    return new_combine_denoised(*args)