Merge pull request #37 from v0xie/dev
S-CFG, Optimizations, and More
v0xie authored May 18, 2024
2 parents 1dd3b2b + 86ffc75 commit 0f90ebd
Showing 9 changed files with 1,681 additions and 49 deletions.
101 changes: 93 additions & 8 deletions README.md
@@ -1,13 +1,78 @@
# sd-webui-incantations
This extension implements multiple novel algorithms that enhance image quality, prompt following, and more.

## COMPATIBILITY NOTICES:
#### Currently incompatible with stable-diffusion-webui-forge
For Forge, use this extension instead: https://github.com/pamparamm/sd-perturbed-attention

# Table of Contents
- [What is this?](#what-is-this)
- [Installation](#installation)
- [Compatibility Notice](#compatibility-notice)
- [News](#news)
- [Extension Features](#extension-features)
- [Semantic CFG](#semantic-cfg-s-cfg)
- [Perturbed Attention Guidance](#perturbed-attention-guidance)
- [CFG Scheduler](#cfg-interval--cfg-scheduler)
- [Multi-Concept T2I-Zero](#multi-concept-t2i-zero--attention-regulation)
- [Seek for Incantations](#seek-for-incantations)
- [Tutorial](#tutorial)
- [Other cool extensions](#also-check-out)
- [Credits](#credits)

## What is this?
### This extension for [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) implements algorithms from state-of-the-art research to achieve **higher-quality** images with *more accurate* prompt adherence.

All methods are **training-free** and rely only on modifying the text embeddings or attention maps.


## Installation
To install the `sd-webui-incantations` extension, follow these steps:

0. **Ensure you have an up-to-date Automatic1111 stable-diffusion-webui installed (version 1.9.3 or later)**

1. **Open the "Extensions" tab and navigate to the "Install from URL" section**:

2. **Paste the repository URL into the "URL for extension's git repository" field**:
```
https://github.com/v0xie/sd-webui-incantations.git
```

3. **Press the Install button**: Wait a few seconds for the extension to finish installing.

4. **Restart the Web UI**:
Completely restart your Stable Diffusion Web UI to load the new extension.

## Compatibility Notice
* Incompatible with **stable-diffusion-webui-forge**: for Forge, use https://github.com/pamparamm/sd-perturbed-attention instead
* Reported incompatible with Adetailer: https://github.com/v0xie/sd-webui-incantations/issues/21
* Incompatible with some older webui versions: https://github.com/v0xie/sd-webui-incantations/issues/14
* May conflict with other extensions which modify the CFGDenoiser

## News
- 15-05-2024 🔥 - S-CFG, optimizations for PAG and T2I-Zero, and more! https://github.com/v0xie/sd-webui-incantations/pull/37
- 29-04-2024 🔥 - The implementation of T2I-Zero is fixed and works much more stably now.

# Extension Features

---
## Semantic CFG (S-CFG)
https://arxiv.org/abs/2404.05384
Dynamically rescales CFG guidance per semantic region to a uniform level, improving image/text alignment.
**Very computationally expensive**: a batch size of 4 at 1024x1024 will max out a 24 GB RTX 4090.

#### Controls
* **SCFG Scale**: Multiplies the correction by a constant factor. Default: 1.0.
* **SCFG R**: A hyperparameter controlling the refinement factor for the cross-attention maps. Higher values use more memory and computation time. Default: 4.
* **Rate Min**: The minimum rate that the CFG can be scaled by. Default: 0.8.
* **Rate Max**: The maximum rate that the CFG can be scaled by. Default: 3.0.
* **Clamp Rate**: Overrides Rate Max: clamps the maximum rate to Clamp Rate / CFG. Default: 0.0. (See the sketch after this list.)
* **Start Step**: Start S-CFG on this step.
* **End Step**: End S-CFG after this step.
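
A minimal sketch of how the rate controls above could interact, assuming a Clamp Rate of 0.0 means the override is unused; the function name and signature are illustrative, not the extension's actual code:

```python
def clamp_scfg_rate(region_rate: float, cfg_scale: float,
                    rate_min: float = 0.8, rate_max: float = 3.0,
                    clamp_rate: float = 0.0) -> float:
    """Illustrative only: constrain a per-region S-CFG rescaling rate to [Rate Min, Rate Max]."""
    if clamp_rate > 0.0:
        # Clamp Rate overrides Rate Max with Clamp Rate / CFG
        rate_max = clamp_rate / cfg_scale
    return min(max(region_rate, rate_min), rate_max)

# e.g. clamp_scfg_rate(5.2, cfg_scale=7.0, clamp_rate=21.0) -> 3.0
```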

#### Results
Prompt: "A cute puppy on the moon", Min Rate: 0.5, Max Rate: 10.0
- SD 1.5
![image](./images/xyz_grid-0006-1-SCFG.jpg)

* Note: may conflict with other extensions that modify the CFGDenoiser.
#### Also check out the paper authors' official project repository:
- https://github.com/SmilesDZgk/S-CFG
#### [Return to top](#sd-webui-incantations)

---
## Perturbed Attention Guidance
@@ -30,7 +95,10 @@ Prompt: "a puppy and a kitten on the moon"
#### Also check out the paper authors' official project page:
- https://ku-cvlab.github.io/Perturbed-Attention-Guidance/

#### [Return to top](#sd-webui-incantations)

---

## CFG Interval / CFG Scheduler
https://arxiv.org/abs/2404.07724 and https://arxiv.org/abs/2404.13040
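
A minimal sketch of the interval idea behind these papers: guidance is applied at full strength only inside a chosen interval and effectively disabled (scale 1.0) outside it. The function and parameter names are illustrative, not the extension's controls, and the papers define the interval over noise levels; steps are used here only for simplicity:

```python
def scheduled_cfg_scale(step: int, cfg_scale: float,
                        start_step: int = 0, end_step: int = 10**9) -> float:
    """Illustrative only: full CFG inside [start_step, end_step], no guidance outside."""
    return cfg_scale if start_step <= step <= end_step else 1.0

# e.g. scheduled_cfg_scale(step=2, cfg_scale=7.0, start_step=5, end_step=20) -> 1.0
```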

@@ -62,6 +130,8 @@ Prompt: "A pointillist painting of a raccoon looking at the sea."
Prompt: "An epic lithograph of a handsome salaryman carefully pouring coffee from a cup into an overflowing carafe, 4K, directed by Wong Kar Wai"
- SD XL
![image](./images/xyz_grid-3380-1-An%20epic%20lithograph%20of%20a%20handsome%20salaryman%20carefully%20pouring%20coffee%20from%20a%20cup%20into%20an%20overflowing%20carafe,%204K,%20directed%20by%20Wong.jpg)

#### [Return to top](#sd-webui-incantations)
---
## Multi-Concept T2I-Zero / Attention Regulation

@@ -98,6 +168,7 @@ SD XL
- https://multi-concept-t2i-zero.github.io/
- https://github.com/YaNgZhAnG-V5/attention_regulation

#### [Return to top](#sd-webui-incantations)
---
### Seek for Incantations
An incomplete implementation of a "prompt-upsampling" method from https://arxiv.org/abs/2401.06345
@@ -121,6 +192,7 @@ SD XL
* Modified Prompt: cinematic 4K photo of a dog riding a bus and eating cake and wearing headphones BREAK - - - - - dog - - bus - - - - - -
![image](./images/xyz_grid-2652-1419902843-cinematic%204K%20photo%20of%20a%20dog%20riding%20a%20bus%20and%20eating%20cake%20and%20wearing%20headphones.png)
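
Based on the modified-prompt example above, a rough sketch of how such a BREAK-delimited "delta" prompt could be assembled from the original prompt and a set of kept words; the word-selection logic itself (the actual method) is not shown, and the function name is illustrative:

```python
def append_delta_prompt(prompt: str, keep_words: set) -> str:
    # Words not selected are replaced by '-' after a BREAK, mirroring the example above
    delta = " ".join(w if w in keep_words else "-" for w in prompt.split())
    return f"{prompt} BREAK {delta}"

# e.g. append_delta_prompt("a dog riding a bus", {"dog", "bus"})
# -> "a dog riding a bus BREAK - dog - - bus"
```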

#### [Return to top](#sd-webui-incantations)
---

### Issues / Pull Requests are welcome!
@@ -132,6 +204,8 @@ SD XL

[![image](https://cdn-uploads.huggingface.co/production/uploads/6345bd89fe134dfd7a0dba40/TzuZWTiHAc3wTxh3PwGL5.png)](https://youtu.be/lMQ7DIPmrfI)

#### [Return to top](#sd-webui-incantations)

## Also check out:

* **Characteristic Guidance**: Awesome enhancements for sampling at high CFG levels [https://github.com/scraed/CharacteristicGuidanceWebUI](https://github.com/scraed/CharacteristicGuidanceWebUI)
@@ -144,6 +218,7 @@ SD XL

* **Agent Attention**: Faster image generation and improved image quality with Agent Attention [https://github.com/v0xie/sd-webui-agentattention](https://github.com/v0xie/sd-webui-agentattention)

#### [Return to top](#sd-webui-incantations)
---

### Credits
@@ -203,9 +278,19 @@ SD XL
primaryClass={cs.CV}
}

@misc{shen2024rethinking,
      title={Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance},
      author={Dazhong Shen and Guanglu Song and Zeyue Xue and Fu-Yun Wang and Yu Liu},
      year={2024},
      eprint={2404.05384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


- [Hard Prompts Made Easy](https://github.com/YuxinWenRick/hard-prompts-made-easy)
- [@udon-universe's extension templates](https://github.com/udon-universe/stable-diffusion-webui-extension-templates)

#### [Return to top](#sd-webui-incantations)
---

Binary file added images/xyz_grid-0006-1-SCFG.jpg
237 changes: 237 additions & 0 deletions scripts/cfg_combiner.py
@@ -0,0 +1,237 @@
import gradio as gr
import logging
import torch
from modules import shared, scripts, devices, patches, script_callbacks
from modules.script_callbacks import CFGDenoiserParams
from modules.processing import StableDiffusionProcessing
from scripts.incantation_base import UIWrapper
from scripts.scfg import scfg_combine_denoised

logger = logging.getLogger(__name__)

class CFGCombinerScript(UIWrapper):
    """ Some scripts modify the CFGs in ways that are not compatible with each other.
    This script will patch the CFG denoiser function to apply CFG in an ordered way.
    This script adds a dict named 'incant_cfg_params' to the processing object.
    This dict contains the following:
        'denoiser': the denoiser object
        'pag_params': list of PAG parameters
        'scfg_params': the S-CFG parameters
        ...
    """
    def __init__(self):
        pass

    # Extension title in menu UI
    def title(self):
        return "CFG Combiner"

    # Decide to show menu in txt2img or img2img
    def show(self, is_img2img):
        return scripts.AlwaysVisible

    # Setup menu ui detail
    def setup_ui(self, is_img2img):
        self.infotext_fields = []
        self.paste_field_names = []
        return []

    def before_process(self, p: StableDiffusionProcessing, *args, **kwargs):
        logger.debug("CFGCombinerScript before_process")
        cfg_dict = {
            "denoiser": None,
            "pag_params": None,
            "scfg_params": None
        }
        setattr(p, 'incant_cfg_params', cfg_dict)

    def process(self, p: StableDiffusionProcessing, *args, **kwargs):
        pass

    def before_process_batch(self, p: StableDiffusionProcessing, *args, **kwargs):
        pass

    def process_batch(self, p: StableDiffusionProcessing, *args, **kwargs):
        """ Process the batch and hook the CFG denoiser if PAG or S-CFG is active """
        logger.debug("CFGCombinerScript process_batch")
        pag_active = p.extra_generation_params.get('PAG Active', False)
        cfg_active = p.extra_generation_params.get('CFG Interval Enable', False)
        scfg_active = p.extra_generation_params.get('SCFG Active', False)

        if not any([
            pag_active,
            cfg_active,
            scfg_active
        ]):
            return

        #logger.debug("CFGCombinerScript process_batch: pag_active or scfg_active")

        cfg_denoise_lambda = lambda params: self.on_cfg_denoiser_callback(params, p.incant_cfg_params)
        unhook_lambda = lambda: self.unhook_callbacks()

        script_callbacks.on_cfg_denoiser(cfg_denoise_lambda)
        script_callbacks.on_script_unloaded(unhook_lambda)
        logger.debug('Hooked callbacks')

    def postprocess_batch(self, p: StableDiffusionProcessing, *args, **kwargs):
        logger.debug("CFGCombinerScript postprocess_batch")
        script_callbacks.remove_current_script_callbacks()

    def unhook_callbacks(self, cfg_dict = None):
        if not cfg_dict:
            return
        self.unpatch_cfg_denoiser(cfg_dict)

    def on_cfg_denoiser_callback(self, params: CFGDenoiserParams, cfg_dict: dict):
        """ Callback for when the CFG denoiser is called
        Patches the combine_denoised function with a custom one.
        """
        if cfg_dict['denoiser'] is None:
            cfg_dict['denoiser'] = params.denoiser
        else:
            self.unpatch_cfg_denoiser(cfg_dict)
        self.patch_cfg_denoiser(params.denoiser, cfg_dict)

    def patch_cfg_denoiser(self, denoiser, cfg_dict: dict):
        """ Patch the CFG Denoiser combine_denoised function """
        if not cfg_dict:
            logger.error("Unable to patch CFG Denoiser, no dict passed as cfg_dict")
            return
        if not denoiser:
            logger.error("Unable to patch CFG Denoiser, denoiser is None")
            return

        if getattr(denoiser, 'combine_denoised_patched', False) is False:
            try:
                setattr(denoiser, 'combine_denoised_original', denoiser.combine_denoised)
                # create patch that references the original function
                pass_conds_func = lambda *args, **kwargs: combine_denoised_pass_conds_list(
                    *args,
                    **kwargs,
                    original_func = denoiser.combine_denoised_original,
                    pag_params = cfg_dict['pag_params'],
                    scfg_params = cfg_dict['scfg_params']
                )
                patched_combine_denoised = patches.patch(__name__, denoiser, "combine_denoised", pass_conds_func)
                setattr(denoiser, 'combine_denoised_patched', True)
                setattr(denoiser, 'combine_denoised_original', patches.original(__name__, denoiser, "combine_denoised"))
            except KeyError:
                logger.exception("KeyError patching combine_denoised")
            except RuntimeError:
                logger.exception("RuntimeError patching combine_denoised")

    def unpatch_cfg_denoiser(self, cfg_dict = None):
        """ Unpatch the CFG Denoiser combine_denoised function """
        if cfg_dict is None:
            return
        denoiser = cfg_dict.get('denoiser', None)
        if denoiser is None:
            return

        setattr(denoiser, 'combine_denoised_patched', False)
        try:
            patches.undo(__name__, denoiser, "combine_denoised")
        except KeyError:
            logger.exception("KeyError unhooking combine_denoised")
        except RuntimeError:
            logger.exception("RuntimeError unhooking combine_denoised")

        cfg_dict['denoiser'] = None


def combine_denoised_pass_conds_list(*args, **kwargs):
    """ Hijacked function for combine_denoised in CFGDenoiser
    Currently relies on the original function not having any kwargs
    If any of the params are not None, it will apply the corresponding guidance
    The order of guidance is:
    1. CFG and S-CFG are combined multiplicatively
    2. PAG guidance is added to the result
    3. ...
    ...
    """
    original_func = kwargs.get('original_func', None)
    pag_params = kwargs.get('pag_params', None)
    scfg_params = kwargs.get('scfg_params', None)

    if pag_params is None and scfg_params is None:
        logger.warning("No reason to hijack combine_denoised")
        return original_func(*args)

    def new_combine_denoised(x_out, conds_list, uncond, cond_scale):
        denoised_uncond = x_out[-uncond.shape[0]:]
        denoised = torch.clone(denoised_uncond)

        ### Variables
        # 0. Standard CFG value
        cfg_scale = cond_scale

        # 1. CFG Interval
        # Overrides cfg_scale with the scheduled value when CFG Interval is enabled
        if pag_params is not None:
            if pag_params.cfg_interval_enable:
                cfg_scale = pag_params.cfg_interval_scheduled_value

        # 2. PAG
        pag_x_out = None
        pag_scale = None
        if pag_params is not None:
            pag_active = pag_params.pag_active
            pag_x_out = pag_params.pag_x_out
            pag_scale = pag_params.pag_scale

        ### Combine Denoised
        for i, conds in enumerate(conds_list):
            for cond_index, weight in conds:

                model_delta = x_out[cond_index] - denoised_uncond[i]

                # S-CFG
                rate = 1.0
                if scfg_params is not None:
                    rate = scfg_combine_denoised(
                        model_delta = model_delta,
                        cfg_scale = cfg_scale,
                        scfg_params = scfg_params,
                    )
                # rate is either a scalar or a tensor; fall back to 1.0 if None
                if rate is None:
                    logger.error("scfg_combine_denoised returned None, using default rate of 1.0")
                    rate = 1.0
                elif not isinstance(rate, (int, float)):
                    # rate is a tensor: move it to the model device and dtype
                    rate = rate.to(device=shared.device, dtype=model_delta.dtype)

                # 1. Experimental formulation for S-CFG combined with CFG
                denoised[i] += model_delta * rate * (weight * cfg_scale)
                del rate

                # 2. PAG
                # PAG is added like CFG
                if pag_params is not None:
                    if not pag_active:
                        pass
                    # Not within step interval?
                    elif not pag_params.pag_start_step <= pag_params.step <= pag_params.pag_end_step:
                        pass
                    # Scale is zero?
                    elif pag_scale <= 0:
                        pass
                    # do pag
                    else:
                        try:
                            denoised[i] += (x_out[cond_index] - pag_x_out[i]) * (weight * pag_scale)
                        except Exception as e:
                            logger.exception("Exception in combine_denoised_pass_conds_list - %s", e)

        #torch.cuda.empty_cache()
        devices.torch_gc()

        return denoised
    return new_combine_denoised(*args)