dev merge #3388

Merged · 166 commits · Aug 31, 2024
Commits
112b1d8
preenable kolors
vladmandic Jul 10, 2024
f3d4000
update changelog
vladmandic Jul 10, 2024
a696775
Add ROCm 6.1.2 support. (ZLUDA)
lshqqytiger Jul 10, 2024
affdb48
Revert "Add ROCm 6.1.2 support. (ZLUDA)"
lshqqytiger Jul 10, 2024
9e6fb52
add auraflow
vladmandic Jul 12, 2024
a2d0f61
fix lint
vladmandic Jul 12, 2024
2e995fb
update changelog
vladmandic Jul 12, 2024
72bd998
fallback to pip if uv failed
Yoinky3000 Jul 12, 2024
f151cdc
only fallback if return code isnt 0
Yoinky3000 Jul 12, 2024
5337ce2
Merge pull request #3332 from Yoinky3000/dev
vladmandic Jul 12, 2024
14e03d9
rewrite zluda installer
lshqqytiger Jul 12, 2024
0e04b2b
fix type hint
lshqqytiger Jul 12, 2024
a3495d1
Fix typo in `installer.py` logs
james-banks Jul 12, 2024
2a50e74
Merge pull request #3336 from james-banks/patch-3
vladmandic Jul 12, 2024
be3fbd7
zluda rocm6 support
lshqqytiger Jul 13, 2024
0b68077
zluda linux error message
lshqqytiger Jul 13, 2024
d4f6e3d
fix zluda installer
lshqqytiger Jul 16, 2024
14569ac
prevent segfault when no hip device found
lshqqytiger Jul 17, 2024
7eb7dd5
fix
lshqqytiger Jul 17, 2024
d3a7095
zluda better rocm detection
lshqqytiger Jul 20, 2024
0284c77
refactor rocm & zluda
lshqqytiger Jul 22, 2024
7fff2a7
fix linux
lshqqytiger Jul 22, 2024
4cf1340
just use rocm.is_installed
lshqqytiger Jul 22, 2024
25c3c61
fix
lshqqytiger Jul 22, 2024
9c1c8fe
NNCF fix AuraFlow
Disty0 Jul 22, 2024
f2769c0
ROCm flash atten fall back to sdpa with fp32 inputs
Disty0 Jul 22, 2024
4aabc8b
Add shift_factor to vae decode
Disty0 Jul 22, 2024
918a839
ROCm 6.1 switch to stable PyTorch
Disty0 Jul 25, 2024
4254256
Update the default ROCm ver to 6.1
Disty0 Jul 25, 2024
5a94741
rocm.py
lshqqytiger Jul 27, 2024
bdf6501
fix
lshqqytiger Jul 27, 2024
4492ded
fix hip path detection
lshqqytiger Jul 27, 2024
fa1e77c
Fix Full VAE previews
Disty0 Jul 27, 2024
3d0ba32
Fix Default scheduler not applying
Disty0 Jul 28, 2024
8dfc01d
update wiki
vladmandic Jul 28, 2024
f5f7ed2
experimental pytorch nightly xpu support
Disty0 Jul 30, 2024
6c75bcc
Optimum Quanto support
Disty0 Jul 30, 2024
b50a860
Fix T5 INT8 and add QINT8
Disty0 Jul 30, 2024
9965ef7
De-dupe Cascade
Disty0 Aug 1, 2024
bb707e4
FLUX support
Disty0 Aug 2, 2024
9e8ed74
ROCm add max version check
Disty0 Aug 3, 2024
8cc0354
Fix segfault with ROCm 6.2
Disty0 Aug 3, 2024
dc9e60a
Quant add shuffle models option
Disty0 Aug 4, 2024
7eacec4
Quant send to gpu with shuffle option on high vram systems
Disty0 Aug 4, 2024
82bcc2b
IPEX fix fp64 check
Disty0 Aug 5, 2024
fe88e84
IPEX fix diffusers import error
Disty0 Aug 5, 2024
33d80a3
IPEX fix FP64 error with FLUX
Disty0 Aug 5, 2024
c66aab2
Make ruff happy
Disty0 Aug 5, 2024
a17e452
Fix Cascade with long prompts
Disty0 Aug 6, 2024
600340d
Fix Cascade with custom samplers
Disty0 Aug 6, 2024
9818950
Cascade fixes
Disty0 Aug 7, 2024
1cf87ef
Change cascade load order
Disty0 Aug 7, 2024
a6b6d16
update requirements
vladmandic Aug 7, 2024
db0f6c7
Cascade fix get_timestep_ratio_conditioning
Disty0 Aug 7, 2024
0d57fa3
fix zluda torch cpp_extension
lshqqytiger Aug 8, 2024
61961f5
hipblaslt check torch version
lshqqytiger Aug 8, 2024
c490b9c
fix first launch
lshqqytiger Aug 8, 2024
e1c4038
zluda hijack torch jit
lshqqytiger Aug 8, 2024
6431296
Fix Cascade empty prompt encode
Disty0 Aug 8, 2024
58d49f2
Custom sampler support for Cascade Decoder
Disty0 Aug 9, 2024
2545285
skip hipblaslt check if no gpu detected
lshqqytiger Aug 9, 2024
e729b57
Make ruff happy
Disty0 Aug 9, 2024
864263f
accurate wsl check
lshqqytiger Aug 9, 2024
9c4213e
tcmalloc experiment
lshqqytiger Aug 9, 2024
a9f2799
enable diffusers_move_unet for Flux
lshqqytiger Aug 9, 2024
c2c4e17
enable FluxPipeline
lshqqytiger Aug 9, 2024
e3b087b
Add balanced offload mode and make offload modes a single choice list
Disty0 Aug 11, 2024
4ba1732
Check device index for balanced offload
Disty0 Aug 11, 2024
fb89e26
Auto detect memory size ffor balaced offload
Disty0 Aug 11, 2024
6a1af56
FLUX quant loading support
Disty0 Aug 11, 2024
da9c46c
Fix optimum-quanto not found
Disty0 Aug 11, 2024
70c2e84
Prompt cache support for Flux
Disty0 Aug 11, 2024
0f7b7e8
Fix memory detection when no gpu is present
Disty0 Aug 11, 2024
dcedcba
IPEX fixes
Disty0 Aug 12, 2024
6e97421
IPEX update to 2.1.40+xpu
Disty0 Aug 12, 2024
26d1d42
IPEX update interpolate hijack
Disty0 Aug 12, 2024
c6cd072
hip_visible_devices
lshqqytiger Aug 13, 2024
04f4757
Update to new huggingface stuff
osanseviero Aug 13, 2024
94fce42
Merge pull request #3368 from osanseviero/master
vladmandic Aug 13, 2024
7d805e9
there are no multiple models, so no need to check
AznamirWoW Aug 13, 2024
6757af5
Merge pull request #3370 from AznamirWoW/dev
vladmandic Aug 13, 2024
ef1dedf
Samplers prefer model defaults over diffusers defaults
Disty0 Aug 13, 2024
97a5fae
Fix qint4
Disty0 Aug 13, 2024
a73716b
Add meta to device check
Disty0 Aug 13, 2024
8619a7f
Better balanced offload
Disty0 Aug 14, 2024
7edc864
Add module name to disk offload path
Disty0 Aug 14, 2024
119c372
More offload checks
Disty0 Aug 14, 2024
8699e0c
Add gc to balanced offload
Disty0 Aug 14, 2024
237cab2
Add offload check to cascade's vqgan
Disty0 Aug 14, 2024
f3f721e
Quanto disable gemm kernels
Disty0 Aug 14, 2024
80be079
Flux load quant model to cpu
Disty0 Aug 15, 2024
7a6b45b
Balanced offload move device map calcs
Disty0 Aug 15, 2024
3f5c3ba
Add warning to Quanto with balanced and sequential offload
Disty0 Aug 15, 2024
04172e5
Quanto Lora support
Disty0 Aug 16, 2024
5a75b12
Fix Lora with Balanced Offlaod
Disty0 Aug 16, 2024
d1b87ef
Add Quanto Lora hijack additionally
Disty0 Aug 17, 2024
bce3c7e
Fix --Xvram flags not activating offload
Disty0 Aug 17, 2024
5c857e8
Add check for Flux attention processor
Disty0 Aug 17, 2024
a3f26c9
Convert Dynamic Attention SDP to a global SDP option
Disty0 Aug 17, 2024
0734c75
Add Heun FlowMatch
Disty0 Aug 17, 2024
a795770
Update changelog
Disty0 Aug 17, 2024
b862400
IPEX fix AMP custom_fwd
Disty0 Aug 18, 2024
5e1da44
IPEX fix custom_fwd x2
Disty0 Aug 18, 2024
2586a18
Don't add 0.1 to the GPU memory
Disty0 Aug 18, 2024
42fff22
Update CHANGELOG.md
Disty0 Aug 18, 2024
cc89ed8
Rename cpu offload to model offload
Disty0 Aug 19, 2024
b025d1d
Round memory size in settings
Disty0 Aug 19, 2024
6a58d52
Make eval use apply_compile_to_model
Disty0 Aug 19, 2024
16d6c03
Optimum Quanto activations support
Disty0 Aug 21, 2024
694d25c
Fix quanto
Disty0 Aug 21, 2024
c3ff21c
Quanto freeze the model before calibration
Disty0 Aug 21, 2024
2caf52a
Update Quanto settings names
Disty0 Aug 21, 2024
e40e13a
Quanto fix Flux activations
Disty0 Aug 21, 2024
b706083
Quanto Activations fix Diffuser's model offload bug
Disty0 Aug 21, 2024
963940b
Fix no half vae
Disty0 Aug 21, 2024
3c4d9f3
Change vae cast order
Disty0 Aug 21, 2024
3a97db1
Fix setting no-half-vae
Disty0 Aug 21, 2024
3165bbc
Flux quant loding detect dtype from state_dict
Disty0 Aug 21, 2024
35d70b3
Flux change no-half-vae check order
Disty0 Aug 22, 2024
02d6b67
Cascade decide atten mask value from the model name
Disty0 Aug 22, 2024
c1285c6
Cascade re-add empty embed provider
Disty0 Aug 23, 2024
ab9a4d3
update zluda
lshqqytiger Aug 24, 2024
517ee93
xhinker parser implementation
AI-Casanova Aug 25, 2024
1de7716
Remove print commands
AI-Casanova Aug 25, 2024
a5d1c65
update zluda (hip sdk 5, 6)
lshqqytiger Aug 26, 2024
5ed58ac
end-to-end update flux, see changelog and wiki
vladmandic Aug 28, 2024
ed52624
Merge branch 'dev' into xhinker
vladmandic Aug 28, 2024
527c05f
Merge pull request #3381 from AI-Casanova/xhinker
vladmandic Aug 28, 2024
707fc1d
flux prompt attention
vladmandic Aug 28, 2024
277d84c
fix control api
vladmandic Aug 28, 2024
b39abdd
control allow resizing overrides to match input
vladmandic Aug 28, 2024
35565fa
offload disabled controlnets
vladmandic Aug 28, 2024
65c137a
update
vladmandic Aug 28, 2024
bb7f84b
better api defaults
vladmandic Aug 28, 2024
0094838
Don't preload blaslt with ROCm 6.2
Disty0 Aug 28, 2024
db6235e
update changelog and requirements
vladmandic Aug 28, 2024
81d9af7
control tab include all scripts
vladmandic Aug 28, 2024
5503a3b
fix invalid resize mode
vladmandic Aug 28, 2024
768c7d0
update changelog
lshqqytiger Aug 28, 2024
73efa76
add taesd flux
vladmandic Aug 28, 2024
ce6224a
update changelog and readme
vladmandic Aug 28, 2024
01fc706
fix nvml gpu monitor
vladmandic Aug 29, 2024
8cf9590
flux add safetensors unet load
vladmandic Aug 29, 2024
4057a04
update wiki
vladmandic Aug 29, 2024
d0905a8
update notes
vladmandic Aug 29, 2024
4f606d3
update auraflow
vladmandic Aug 29, 2024
0b901e0
xhinker move te as needed
vladmandic Aug 29, 2024
db9de0d
prioritize given backend if use_* argument is presented
lshqqytiger Aug 29, 2024
68a8ffa
ipadapter optional face autocrop input image
vladmandic Aug 29, 2024
694acce
update changelog
vladmandic Aug 29, 2024
be0ea62
cleanup
vladmandic Aug 29, 2024
121fd43
Fix IPEX installer
Disty0 Aug 29, 2024
ec85ab4
cleanup
vladmandic Aug 29, 2024
2c8cb5c
vae exception handling
vladmandic Aug 29, 2024
81583ef
add flux offline config and fix vae config reference
vladmandic Aug 29, 2024
1114165
update changelog and cleanup
vladmandic Aug 30, 2024
ed45477
animatediff updates
vladmandic Aug 30, 2024
9da340f
bump up default resolution
vladmandic Aug 30, 2024
0a05826
fix gallery sort
vladmandic Aug 30, 2024
e3b54cf
fix control show extensions
vladmandic Aug 30, 2024
ec281d6
use peft for lora on non-sd models
vladmandic Aug 30, 2024
264460b
improve xyz grid for loras and add strength
vladmandic Aug 30, 2024
1b6ebd9
update diffusers
vladmandic Aug 31, 2024
a11a5f0
flux qint auto-download quantization map
vladmandic Aug 31, 2024
9d14258
flux nf4 offline load
vladmandic Aug 31, 2024
d2695f2
update changelog
vladmandic Aug 31, 2024
127 changes: 96 additions & 31 deletions CHANGELOG.md
@@ -1,39 +1,104 @@
# Change Log for SD.Next

## Update for 2024-07-09: WiP
## Update for 2024-08-31

### Pending
### Highlights for 2024-08-31

- Requires `diffusers==0.30.0`
- [AuraFlow/LavenderFlow](https://github.com/huggingface/diffusers/pull/8796) (previously known as LavenderFlow)
- [Kolors](https://github.com/huggingface/diffusers/pull/8812)
- [ControlNet Union](https://huggingface.co/xinsir/controlnet-union-sdxl-1.0) pipeline
- FlowMatchHeunDiscreteScheduler enable
Summer break is over and we are back with a massive update!

### Highlights
Support for all of the new models:
- [Black Forest Labs FLUX.1](https://blackforestlabs.ai/announcing-black-forest-labs/)
- [AuraFlow 0.3](https://huggingface.co/fal/AuraFlow)
- [AlphaVLLM Lumina-Next-SFT](https://huggingface.co/Alpha-VLLM/Lumina-Next-SFT-diffusers)
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
- [HunyuanDiT 1.2](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers)

What else? Just a bit... ;)

New **fast-install** mode, new **Optimum Quanto** and **BitsAndBytes** based quantization modes, new **balanced offload** mode that dynamically offloads GPU<->CPU as needed, and more...
And from previous service-pack: new **ControlNet-Union** *all-in-one* model, support for **DoRA** networks, additional **VLM** models, new **AuraSR** upscaler

**Breaking Changes...**

Due to internal changes, you'll need to reset your **attention** and **offload** settings!
But... for a good reason: the new *balanced offload* is magic when it comes to memory utilization while sacrificing minimal performance!

Massive update to WiKi with over 20 new pages and articles, now includes guides for nearly all major features
Support for new models:
- [AlphaVLLM Lumina-Next-SFT](https://huggingface.co/Alpha-VLLM/Lumina-Next-SFT-diffusers)
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
- [HunyuanDiT 1.2](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers)
### Details for 2024-08-31

What else? Just a bit... ;)
New **fast-install** mode, new **controlnet-union** *all-in-one* model, support for **DoRA** networks, additional **VLM** models, new **AuraSR** upscaler, and more...
**New Models...**

### New Models
To use any of the new models, simply select the model from *Networks -> Reference* and it will be auto-downloaded on first use

- [Black Forest Labs FLUX.1](https://blackforestlabs.ai/announcing-black-forest-labs/)
FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, scaled to 12B parameters and building on flow matching
This is a very large model at ~32GB in size; it's recommended to use a) offloading and b) quantization
For more information on variations, requirements, options, and how to download and use FLUX.1, see [Wiki](https://github.com/vladmandic/automatic/wiki/FLUX)
SD.Next supports:
- [FLUX.1 Dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) and [FLUX.1 Schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) original variations
- additional [qint8](https://huggingface.co/Disty0/FLUX.1-dev-qint8) and [qint4](https://huggingface.co/Disty0/FLUX.1-dev-qint4) quantized variations
- additional [nf4](https://huggingface.co/sayakpaul/flux.1-dev-nf4) quantized variation
- [AuraFlow](https://huggingface.co/fal/AuraFlow)
AuraFlow v0.3 is the largest fully open-sourced flow-based text-to-image generation model
This is a very large model at 6.8B params and nearly 31GB in size, smaller variants are expected in the future
Use scheduler: Default or Euler FlowMatch or Heun FlowMatch
- [AlphaVLLM Lumina-Next-SFT](https://huggingface.co/Alpha-VLLM/Lumina-Next-SFT-diffusers)
to use, simply select from *networks -> reference
use scheduler: default or euler flowmatch or heun flowmatch
note: this model uses T5 XXL variation of text encoder
(previous version of Lumina used Gemma 2B as text encoder)
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
to use, simply select from *networks -> reference
note: this is an SDXL style model that replaces standard CLiP-L and CLiP-G text encoders with a massive `chatglm3-6b` encoder
however, this new encoder does support both English and Chinese prompting
- [HunyuanDiT 1.2](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers)
to use, simply select from *networks -> reference
Lumina-Next-SFT is a Next-DiT model containing 2B parameters, enhanced through high-quality supervised fine-tuning (SFT)
This model uses T5 XXL variation of text encoder (previous version of Lumina used Gemma 2B as text encoder)
Use scheduler: Default or Euler FlowMatch or Heun FlowMatch
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
Kolors is a large-scale text-to-image generation model based on latent diffusion
This is an SDXL style model that replaces standard CLIP-L and CLIP-G text encoders with a massive `chatglm3-6b` encoder supporting both English and Chinese prompting
- [HunyuanDiT 1.2](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers)
Hunyuan-DiT is a powerful multi-resolution diffusion transformer (DiT) with fine-grained Chinese understanding
- [AnimateDiff](https://github.com/guoyww/animatediff/)
support for additional models: **SD 1.5 v3** (Sparse), **SD Lightning** (4-step), **SDXL Beta**

**New Features...**

- support for **Balanced Offload**, thanks @Disty0!
balanced offload will dynamically split and offload models from the GPU based on the max configured GPU and CPU memory size
model parts that don't fit in the GPU will be dynamically sliced and offloaded to the CPU
see *Settings -> Diffusers Settings -> Max GPU memory and Max CPU memory*
*note*: recommended value for max GPU memory is ~80% of your total GPU memory
*note*: balanced offload will force loading LoRA with Diffusers method
*note*: balanced offload is not compatible with Optimum Quanto
- support for **Optimum Quanto** with 8 bit and 4 bit quantization options, thanks @Disty0 and @Trojaner!
to use, go to Settings -> Compute Settings and enable "Quantize Model weights with Optimum Quanto" option
*note*: Optimum Quanto requires PyTorch 2.4
- new prompt attention mode: **xhinker** which brings support for prompt attention to new models such as FLUX.1 and SD3
to use, enable in *Settings -> Execution -> Prompt attention*
- use [PEFT](https://huggingface.co/docs/peft/main/en/index) for **LoRA** handling on all models other than SD15/SD21/SDXL
this improves LoRA compatibility for SC, SD3, AuraFlow, Flux, etc.
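The *balanced offload* feature above can be illustrated with a short sketch: given per-module sizes and a configured GPU memory budget, modules stay on the GPU until the budget is exhausted and the remainder is assigned to the CPU. This is a hypothetical illustration of the idea only (module names and sizes are made up), not SD.Next's actual implementation:

```python
# Hypothetical sketch of balanced offload: keep modules on the GPU until the
# configured memory budget is exhausted, assign the rest to the CPU.
# Module names and sizes below are made up for illustration.
def balance_offload(module_sizes: dict, max_gpu_gb: float) -> dict:
    placement, used = {}, 0.0
    for name, size_gb in module_sizes.items():
        if used + size_gb <= max_gpu_gb:
            placement[name] = 'gpu'
            used += size_gb
        else:
            placement[name] = 'cpu'  # offloaded; moved back to the GPU on demand
    return placement

modules = {'transformer': 22.0, 'text_encoder_2': 9.0, 'text_encoder': 0.2, 'vae': 0.2}
print(balance_offload(modules, max_gpu_gb=24.0))
# → {'transformer': 'gpu', 'text_encoder_2': 'cpu', 'text_encoder': 'gpu', 'vae': 'gpu'}
```

This also illustrates why the recommended max GPU memory is ~80% of total: the remaining headroom is needed for activations and intermediate tensors.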
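Similarly, the core idea behind the 8-bit (`qint8`-style) weight quantization that Optimum Quanto provides can be sketched as symmetric scale quantization; this is a conceptual sketch of the general technique, not the Optimum Quanto implementation:

```python
# Conceptual sketch of symmetric 8-bit weight quantization (qint8-style);
# not the Optimum Quanto implementation.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid div-by-zero for all-zero weights
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.001, 1.0]
q, scale = quantize_int8(w)
print(q)  # → [50, -127, 0, 100]
restored = dequantize_int8(q, scale)
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(w, restored))
```

4-bit modes follow the same scheme with a smaller integer range, trading more quality for a further ~2x memory saving.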

**Changes & Fixes...**

- default resolution bumped from 512x512 to 1024x1024, time to move on ;)
- convert **Dynamic Attention SDP** into a global SDP option, thanks @Disty0!
*note*: requires reset of selected attention option
- update default **CUDA** version from 12.1 to 12.4
- update `requirements`
- samplers now prefer the model defaults over the diffusers defaults, thanks @Disty0!
- improve xyz grid for lora handling and add lora strength option
- don't enable Dynamic Attention by default on platforms that support Flash Attention, thanks @Disty0!
- convert offload options into a single choice list, thanks @Disty0!
*note*: requires reset of selected offload option
- control module allows resizing of individual process override images to match input image
for example: set size->before->method:nearest, mode:fixed or mode:fill
- control tab includes superset of txt and img scripts
- automatically offload disabled controlnet units
- prioritize specified backend if `--use-*` option is used, thanks @lshqqytiger
- ipadapter option to auto-crop input images to faces to improve efficiency of face-transfer ipadapters
- update **IPEX** to 2.1.40+xpu on Linux, thanks @Disty0!
- general **ROCm** fixes, thanks @lshqqytiger!
- support for HIP SDK 6.1 on ZLUDA backend, thanks @lshqqytiger!
- fix full vae previews, thanks @Disty0!
- fix default scheduler not being applied, thanks @Disty0!
- fix Stable Cascade with custom schedulers, thanks @Disty0!
- fix LoRA apply with force-diffusers
- fix LoRA scales with force-diffusers
- fix control API
- fix VAE load referencing incorrect configuration
- fix NVML gpu monitoring
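The LoRA-related changes above (PEFT-based loading and the per-LoRA strength option in the xyz grid) come down to the same math: a low-rank update added to the base weights, scaled by a strength factor. A minimal sketch using plain nested lists, not the PEFT implementation:

```python
# Minimal sketch of LoRA application: W' = W + strength * (B @ A),
# shown on plain nested lists; PEFT does this on tensors.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(W, A, B, strength=1.0):
    delta = matmul(B, A)  # low-rank update; rank = number of rows in A
    return [[w + strength * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # base weight (2x2)
A = [[1.0, 2.0]]              # down projection (1x2)
B = [[0.5], [0.25]]           # up projection (2x1)
print(apply_lora(W, A, B, strength=0.5))
# → [[1.25, 0.5], [0.125, 1.25]]
```

Sweeping `strength` is exactly what the xyz grid's new LoRA strength axis varies.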

## Update for 2024-07-08

@@ -57,13 +122,13 @@ This release is primary service release with cumulative fixes and several improvements
**And fixes...**
- enable **Florence VLM** for all platforms, thanks @lshqqytiger!
- improve ROCm detection under WSL2, thanks @lshqqytiger!
- add SD3 with FP16 T5 to list of detected models
- fix executing extensions with zero params
- add support for embeddings bundled in LoRA, thanks @AI-Casanova!
- fix nncf for lora, thanks @Disty0!
- fix diffusers version detection for SD3
- fix current step for higher order samplers
- fix control input type video
- fix reset pipeline at the end of each iteration
- fix faceswap when no faces detected
7 changes: 6 additions & 1 deletion README.md
@@ -31,7 +31,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
- Multiple UIs!
▹ **Standard | Modern**
- Multiple diffusion models!
▹ **Stable Diffusion 1.5/2.1/XL/3.0 | LCM | Lightning | Segmind | Kandinsky | Pixart-α | Pixart-Σ | Stable Cascade | Würstchen | aMUSEd | DeepFloyd IF | UniDiffusion | SD-Distilled | BLiP Diffusion | KOALA | SDXS | Hyper-SD | HunyuanDiT | etc.**
▹ **Stable Diffusion 1.5/2.1/XL/3.0 | LCM | Lightning | Segmind | Kandinsky | Pixart-α | Pixart-Σ | Stable Cascade | FLUX.1 | AuraFlow | Würstchen | Lumina | Kolors | aMUSEd | DeepFloyd IF | UniDiffusion | SD-Distilled | BLiP Diffusion | KOALA | SDXS | Hyper-SD | HunyuanDiT | etc.**
- Built-in Control for Text, Image, Batch and video processing!
▹ **ControlNet | ControlNet XS | Control LLLite | T2I Adapters | IP Adapters**
- Multiplatform!
@@ -53,6 +53,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
![Screenshot-Dark](html/screenshot-text2image.jpg)

*Main interface using **ModernUI***:
![Screenshot-Dark](html/screenshot-modernui-f1.jpg)
![Screenshot-Dark](html/screenshot-modernui.jpg)
![Screenshot-Dark](html/screenshot-modernui-sd3.jpg)

@@ -69,6 +70,10 @@ Additional models will be added as they become available and there is public interest
- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
- [StabilityAI Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) Base, XT 1.0, XT 1.1
- [LCM: Latent Consistency Models](https://github.com/openai/consistency_models)
- [Black Forest Labs FLUX.1](https://blackforestlabs.ai/announcing-black-forest-labs/) Dev, Schnell
- [AuraFlow](https://huggingface.co/fal/AuraFlow)
- [AlphaVLLM Lumina-Next-SFT](https://huggingface.co/Alpha-VLLM/Lumina-Next-SFT-diffusers)
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
- [Playground](https://huggingface.co/playgroundai/playground-v2-256px-base) *v1, v2 256, v2 512, v2 1024 and latest v2.5*
- [Stable Cascade](https://github.com/Stability-AI/StableCascade) *Full* and *Lite*
- [aMUSEd 256](https://huggingface.co/amused/amused-256) 256 and 512
14 changes: 4 additions & 10 deletions TODO.md
@@ -4,20 +4,14 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma

## Future Candidates

- animatediff-sdxl <https://github.com/huggingface/diffusers/pull/6721>
- cogvideo-x: <https://huggingface.co/THUDM/CogVideoX-5b>
- animatediff prompt-travel: <https://github.com/huggingface/diffusers/pull/9231>
- async lowvram: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14855>
- fp8: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14031>
- ipadapter-negative: https://github.com/huggingface/diffusers/discussions/7167
- hd-painter: https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#hd-painter
- init latents: variations, img2img
- diffusers public callbacks
- include reference styles
- lora: sc lora, etc

## Experimental

- [SDXL Flash Mini](https://huggingface.co/sd-community/sdxl-flash-mini)
SDXL type that weighs less, consumes less video memory, and the quality has not dropped much
to use, simply select from *networks -> models -> reference -> SDXL Flash Mini*
recommended parameters: steps: 6-9, cfg scale: 2.5-3.5, sampler: DPM++ SDE

### Missing

4 changes: 2 additions & 2 deletions cli/api-control.py
@@ -132,7 +132,7 @@ def get_image(encoded, output):


if __name__ == "__main__":
parser = argparse.ArgumentParser(description = 'api-img2img')
parser = argparse.ArgumentParser(description = 'api-control')
parser.add_argument('--init', required=False, default=None, help='init image')
parser.add_argument('--input', required=False, default=None, help='input image')
parser.add_argument('--mask', required=False, help='mask image')
@@ -148,5 +148,5 @@ def get_image(encoded, output):
parser.add_argument('--control', required=False, help='control units')
parser.add_argument('--ipadapter', required=False, help='ipadapter units')
args = parser.parse_args()
log.info(f'img2img: {args}')
log.info(f'api-control: {args}')
generate(args)
2 changes: 1 addition & 1 deletion cli/api-faceid.py
@@ -95,7 +95,7 @@ def generate(args): # pylint: disable=redefined-outer-name
parser.add_argument('--output', required=False, default=None, help='output image file')
parser.add_argument('--model', required=False, help='model name')
args = parser.parse_args()
log.info(f'img2img: {args}')
log.info(f'api-faceid: {args}')
generate(args)

"""
59 changes: 59 additions & 0 deletions cli/api-faces.py
@@ -0,0 +1,59 @@
#!/usr/bin/env python
import os
import io
import base64
import logging
import argparse
import requests
import urllib3
from PIL import Image

sd_url = os.environ.get('SDAPI_URL', "http://127.0.0.1:7860")
sd_username = os.environ.get('SDAPI_USR', None)
sd_password = os.environ.get('SDAPI_PWD', None)

logging.basicConfig(level = logging.INFO, format = '%(asctime)s %(levelname)s: %(message)s')
log = logging.getLogger(__name__)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def auth():
    if sd_username is not None and sd_password is not None:
        return requests.auth.HTTPBasicAuth(sd_username, sd_password)
    return None


def post(endpoint: str, dct: dict = None):
    req = requests.post(f'{sd_url}{endpoint}', json = dct, timeout=300, verify=False, auth=auth())
    if req.status_code != 200:
        return { 'error': req.status_code, 'reason': req.reason, 'url': req.url }
    else:
        return req.json()


def encode(f):
    image = Image.open(f)
    if image.mode == 'RGBA':
        image = image.convert('RGB')
    with io.BytesIO() as stream:
        image.save(stream, 'JPEG')
        image.close()
        values = stream.getvalue()
        encoded = base64.b64encode(values).decode()
        return encoded


def detect(args): # pylint: disable=redefined-outer-name
    data = post('/sdapi/v1/faces', { 'image': encode(args.image) })
    for face in zip(data['images'], data['scores']):
        log.info(f'Face: score={face[1]}')
        image = Image.open(io.BytesIO(base64.b64decode(face[0])))
        image.save(f'/tmp/face_{face[1]}.jpg')


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description = 'api-faces')
    parser.add_argument('--image', required=True, help='input image')
    args = parser.parse_args()
    log.info(f'api-faces: {args}')
    detect(args)
2 changes: 1 addition & 1 deletion cli/api-img2img.py
@@ -94,5 +94,5 @@ def generate(args): # pylint: disable=redefined-outer-name
parser.add_argument('--output', required=False, default=None, help='output image file')
parser.add_argument('--model', required=False, help='model name')
args = parser.parse_args()
log.info(f'img2img: {args}')
log.info(f'api-img2img: {args}')
generate(args)
2 changes: 1 addition & 1 deletion cli/api-info.py
@@ -53,5 +53,5 @@ def info(args): # pylint: disable=redefined-outer-name
parser = argparse.ArgumentParser(description = 'api-info')
parser.add_argument('--input', required=True, help='input image')
args = parser.parse_args()
log.info(f'info: {args}')
log.info(f'api-info: {args}')
info(args)
2 changes: 1 addition & 1 deletion cli/api-json.py
@@ -38,7 +38,7 @@ def post(endpoint: str, payload: dict = None):


if __name__ == "__main__":
parser = argparse.ArgumentParser(description = 'api-txt2img')
parser = argparse.ArgumentParser(description = 'api-json')
parser.add_argument('endpoint', nargs=1, help='endpoint')
parser.add_argument('json', nargs=1, help='json data or file')
args = parser.parse_args()
2 changes: 1 addition & 1 deletion cli/api-mask.py
@@ -79,5 +79,5 @@ def info(args): # pylint: disable=redefined-outer-name
parser.add_argument('--type', required=False, help='output mask type')
parser.add_argument('--output', required=False, help='output image')
args = parser.parse_args()
log.info(f'info: {args}')
log.info(f'api-mask: {args}')
info(args)
2 changes: 1 addition & 1 deletion cli/api-preprocess.py
@@ -72,5 +72,5 @@ def info(args): # pylint: disable=redefined-outer-name
parser.add_argument('--model', required=True, help='preprocessing model')
parser.add_argument('--output', required=False, help='output image')
args = parser.parse_args()
log.info(f'info: {args}')
log.info(f'api-preprocess: {args}')
info(args)
2 changes: 1 addition & 1 deletion cli/api-txt2img.py
@@ -80,5 +80,5 @@ def generate(args): # pylint: disable=redefined-outer-name
parser.add_argument('--output', required=False, default=None, help='output image file')
parser.add_argument('--model', required=False, help='model name')
args = parser.parse_args()
log.info(f'txt2img: {args}')
log.info(f'api-txt2img: {args}')
generate(args)
2 changes: 1 addition & 1 deletion cli/api-upscale.py
@@ -86,5 +86,5 @@ def upscale(args): # pylint: disable=redefined-outer-name
parser.add_argument('--upscaler', required=False, default='Nearest', help='upscaler name')
parser.add_argument('--scale', required=False, default=2, help='upscaler scale')
args = parser.parse_args()
log.info(f'upscale: {args}')
log.info(f'api-upscale: {args}')
upscale(args)
2 changes: 1 addition & 1 deletion cli/api-vqa.py
@@ -60,5 +60,5 @@ def info(args): # pylint: disable=redefined-outer-name
parser.add_argument('--model', required=False, help='vqa model')
parser.add_argument('--question', required=False, help='question')
args = parser.parse_args()
log.info(f'info: {args}')
log.info(f'api-vqa: {args}')
info(args)
2 changes: 1 addition & 1 deletion cli/hf-search.py
@@ -14,5 +14,5 @@
library=['diffusers'],
)
res = hf_api.list_models(filter=model_filter, full=True, limit=50, sort="downloads", direction=-1)
models = [{ 'name': m.modelId, 'downloads': m.downloads, 'mtime': m.lastModified, 'url': f'https://huggingface.co/{m.modelId}', 'pipeline': m.pipeline_tag, 'tags': m.tags } for m in res]
models = [{ 'name': m.id, 'downloads': m.downloads, 'mtime': m.lastModified, 'url': f'https://huggingface.co/{m.id}', 'pipeline': m.pipeline_tag, 'tags': m.tags } for m in res]
print(models)
32 changes: 32 additions & 0 deletions configs/flux/model_index.json
@@ -0,0 +1,32 @@
{
"_class_name": "FluxPipeline",
"_diffusers_version": "0.30.0.dev0",
"scheduler": [
"diffusers",
"FlowMatchEulerDiscreteScheduler"
],
"text_encoder": [
"transformers",
"CLIPTextModel"
],
"text_encoder_2": [
"transformers",
"T5EncoderModel"
],
"tokenizer": [
"transformers",
"CLIPTokenizer"
],
"tokenizer_2": [
"transformers",
"T5TokenizerFast"
],
"transformer": [
"diffusers",
"FluxTransformer2DModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}
11 changes: 11 additions & 0 deletions configs/flux/scheduler/scheduler_config.json
@@ -0,0 +1,11 @@
{
"_class_name": "FlowMatchEulerDiscreteScheduler",
"_diffusers_version": "0.30.0.dev0",
"base_image_seq_len": 256,
"base_shift": 0.5,
"max_image_seq_len": 4096,
"max_shift": 1.15,
"num_train_timesteps": 1000,
"shift": 1.0,
"use_dynamic_shifting": false
}
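For reference, `shift` and `use_dynamic_shifting` in this scheduler config control how the flow-matching sigmas are rescaled; with static shifting, FlowMatch-style schedulers apply sigma' = shift * sigma / (1 + (shift - 1) * sigma), so the `"shift": 1.0` here leaves the schedule unchanged. A quick sketch:

```python
# Static sigma shifting as used by FlowMatch-style schedulers:
# sigma' = shift * sigma / (1 + (shift - 1) * sigma)
def shift_sigma(sigma: float, shift: float) -> float:
    return shift * sigma / (1 + (shift - 1) * sigma)

assert shift_sigma(0.5, 1.0) == 0.5  # shift = 1.0 (this config): schedule unchanged
print(shift_sigma(0.5, 3.0))         # → 0.75, larger shift biases toward noisier timesteps
```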