Commit: add consistory

Signed-off-by: Vladimir Mandic <[email protected]>
vladmandic committed Nov 6, 2024
1 parent d95cfc9 commit f9076a2
Showing 7 changed files with 299 additions and 436 deletions.
114 changes: 65 additions & 49 deletions CHANGELOG.md
@@ -1,6 +1,6 @@
# Change Log for SD.Next

## Update for 2024-11-06

A smaller release just a few days after the last one, but with some important fixes and improvements.
This release can be considered an LTS release before we kick off the next round of major updates.
@@ -9,55 +9,71 @@
- add built-in [changelog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) search
  since the changelog is the best up-to-date source of info
  go to system -> changelog and search/highlight/navigate directly in the UI!

- Integrations:
- [PuLID](https://github.com/ToTheBeginning/PuLID): Pure and Lightning ID Customization via Contrastive Alignment
  - an advanced face-transfer method with better quality as well as control over identity and appearance
    try it out; likely the best quality available for sdxl models
- select in *scripts -> pulid*
- compatible with *sdxl*
- can be used in xyz grid
- [InstantIR](https://github.com/instantX-research/InstantIR): Blind Image Restoration with Instant Generative Reference
  - an alternative to traditional `img2img` with more control over the restoration process
- select in *image -> scripts -> instantir*
- compatible with *sdxl*
  - *note*: once used, it cannot be unloaded without reloading the base model
- [ConsiStory](https://github.com/NVlabs/consistory): Consistent Image Generation
  - create a consistent anchor image, then generate images that are consistent with that anchor
- select in *scripts -> consistory*
- compatible with *sdxl*
- *note*: very resource intensive and not compatible with model offloading
- *note*: changing default parameters can lead to unexpected results and/or failures
  - *note*: once used, it cannot be unloaded without reloading the base model
- [MiaoshouAI PromptGen v2.0](https://huggingface.co/MiaoshouAI/Florence-2-base-PromptGen-v2.0) base and large:
- *in process -> visual query*
- caption modes:
`<GENERATE_TAGS>` generate tags
`<CAPTION>`, `<DETAILED_CAPTION>`, `<MORE_DETAILED_CAPTION>` caption image
`<ANALYZE>` image composition
`<MIXED_CAPTION>`, `<MIXED_CAPTION_PLUS>` detailed caption and tags with optional analyze
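The caption modes above are selected by passing one of these task tokens to the model as the prompt. A minimal sketch of a lookup helper for the tokens listed above (the helper name and the mode-name mapping are my own illustration, not SD.Next code):

```python
# Hypothetical mapping from friendly mode names to the PromptGen v2.0
# task tokens listed in the changelog entry above.
PROMPTGEN_MODES = {
    "tags": "<GENERATE_TAGS>",
    "caption": "<CAPTION>",
    "detailed": "<DETAILED_CAPTION>",
    "more_detailed": "<MORE_DETAILED_CAPTION>",
    "analyze": "<ANALYZE>",
    "mixed": "<MIXED_CAPTION>",
    "mixed_plus": "<MIXED_CAPTION_PLUS>",
}

def promptgen_task(mode: str) -> str:
    """Return the task token for a given caption mode name."""
    try:
        return PROMPTGEN_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown caption mode: {mode!r}") from None
```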

- Model improvements:
- SD3: ControlNets:
- *InstantX Canny, Pose, Depth, Tile*
- *Alimama Inpainting, SoftEdge*
  - *note*: just like with FLUX.1 or any other large model, ControlNets are also large and can push your system over the limit
e.g. SD3 controlnets vary from 1GB to over 4GB in size
- SD3: all-in-one safetensors
- *examples*: [large](https://civitai.com/models/882666/sd35-large-google-flan?modelVersionId=1003031), [medium](https://civitai.com/models/900327)
- *note*: enable *bnb* on-the-fly quantization for even bigger gains

- Workflow improvements:
- XYZ grid:
  - optionally add time benchmark info to individual images
  - optionally add params to individual images
- create video from generated grid images
supports all standard video types and interpolation
- UI:
- add additional [hotkeys](https://github.com/vladmandic/automatic/wiki/Hotkeys)
  - add *show networks on startup* setting
- better mapping of networks previews
- optimize networks display load
- Other:
- Installer:
- Log `venv` and package search paths
- Auto-remove invalid packages from `venv/site-packages`
    e.g. packages starting with `~` which are left over due to Windows access violations
- Requirements: update
- Model loader:
- Report modules included in safetensors when attempting to load a model
- CLI:
- refactor command line params
run `webui.sh`/`webui.bat` with `--help` to see all options
- added `cli/model-metadata.py` to display metadata in any safetensors file
  - added `cli/model-keys.py` to quickly display the contents of any safetensors file
- Internal:
- Repo: move screenshots to GH pages

- Fixes:
  - custom watermark: add alpha-blending
- detailer min/max size as fractions of image size
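The watermark fix refers to alpha blending, i.e. compositing the watermark over the image weighted by an opacity factor. A generic numpy sketch of the operation (not the actual SD.Next implementation):

```python
import numpy as np

def alpha_blend(image: np.ndarray, watermark: np.ndarray, alpha: float) -> np.ndarray:
    """Composite watermark over image: out = alpha*wm + (1-alpha)*img.

    Both arrays are float RGB in [0, 1] with matching shapes; alpha=0
    leaves the image untouched, alpha=1 replaces it with the watermark.
    """
    out = alpha * watermark + (1.0 - alpha) * image
    return np.clip(out, 0.0, 1.0)
```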
10 changes: 4 additions & 6 deletions modules/consistory/attention_processor.py
@@ -17,12 +17,11 @@
# are not a contribution and subject to the license under the LICENSE file located at the root directory.


from typing import Callable, Optional
import torch
import torch.nn.functional as F
from diffusers.utils import USE_PEFT_BACKEND
from diffusers.models.attention_processor import Attention

from .consistory_utils import AnchorCache, FeatureInjector, QueryStore


@@ -235,7 +234,7 @@ def __call__(
# dropout
hidden_states = attn.to_out[1](hidden_states)

if feature_injector is not None:
output_res = int(hidden_states.shape[1] ** 0.5)

if anchors_cache and anchors_cache.is_inject_mode():
@@ -270,8 +269,7 @@ def register_extended_self_attn(unet, attnstore, extended_attn_kwargs):
'up_113': 64, 'up_115': 64, 'up_117': 64, 'up_119': 64}
attn_procs = {}
for i, name in enumerate(unet.attn_processors.keys()):
is_self_attn = i % 2 == 0
if name.startswith("mid_block"):
place_in_unet = f"mid_{i}"
elif name.startswith("up_blocks"):
@@ -286,4 +284,4 @@ def register_extended_self_attn(unet, attnstore, extended_attn_kwargs):
else:
attn_procs[name] = ConsistoryAttnStoreProcessor(attnstore, place_in_unet)

unet.set_attn_processor(attn_procs)
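The registration loop in this diff tags each attention processor with a `place_in_unet` label derived from its key prefix and alternates self- vs cross-attention by index. A standalone sketch of that naming scheme, assuming diffusers' usual `down_blocks`/`mid_block`/`up_blocks` key prefixes (the helper functions are illustrative, not the module's code):

```python
def place_in_unet(index: int, key: str) -> str:
    """Label an attention-processor key by its UNet region and position,
    mirroring the mid_/up_/down_ prefixes used in register_extended_self_attn."""
    if key.startswith("mid_block"):
        return f"mid_{index}"
    if key.startswith("up_blocks"):
        return f"up_{index}"
    if key.startswith("down_blocks"):
        return f"down_{index}"
    raise ValueError(f"unexpected attention key: {key!r}")

def is_self_attn(index: int) -> bool:
    """In diffusers UNet transformer blocks, processors alternate
    self-attention (attn1) and cross-attention (attn2), so even
    positions in the key ordering are self-attention."""
    return index % 2 == 0
```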
217 changes: 0 additions & 217 deletions modules/consistory/consistory_cache.py

This file was deleted.

