Commit

Merge pull request #1787 from bmaltais/dev2
v22.3.1
bmaltais committed Dec 21, 2023
2 parents 5f9f891 + ce74aac commit 41009ae
Showing 22 changed files with 2,241 additions and 261 deletions.
2 changes: 1 addition & 1 deletion .release
@@ -1 +1 @@
v22.3.0
v22.3.1
50 changes: 9 additions & 41 deletions README.md
@@ -202,8 +202,8 @@ This Colab notebook was not created or maintained by me; however, it appears to

I would like to express my gratitude to camenduru for their valuable contribution. If you encounter any issues with the Colab notebook, please report them on their repository.

| Colab | Info |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------- |
| Colab | Info |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------ |
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/kohya_ss-colab/blob/main/kohya_ss_colab.ipynb) | kohya_ss_gui_colab |

## Installation
@@ -651,6 +651,12 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b


## Change History
* 2023/12/20 (v22.3.1)
- Add goto button to manual caption utility
- Add missing options for various LyCORIS training algorithms

- Refactor how fields are shown or hidden
- Made max value for network and convolution rank 512 except for LyCORIS/LoKr.

* 2023/12/06 (v22.3.0)
- Merge sd-scripts updates:
- `finetune\tag_images_by_wd14_tagger.py` now supports a separator other than `,` via the `--caption_separator` option. Thanks to KohakuBlueleaf! PR [#913](https://github.com/kohya-ss/sd-scripts/pull/913)
@@ -664,42 +670,4 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b
- `--ds_ratio` option denotes the ratio of the Deep Shrink. `0.5` means half of the original latent size for the Deep Shrink.
- `--dst1`, `--dst2`, `--dsd1`, `--dsd2` and `--dsr` prompt options are also available.
- Add GLoRA support

* 2023/12/03 (v22.2.2)
- Update Lycoris module to 2.0.0 (https://github.com/KohakuBlueleaf/LyCORIS/blob/0006e2ffa05a48d8818112d9f70da74c0cd30b99/README.md)
- Update Lycoris merge and extract tools
- Remove unnecessary and annoying warning about local pip modules.
- Adding support for LyCORIS presets
- Adding Support for LyCORIS Native Fine-Tuning
- Adding support for Lycoris Diag-OFT

* 2023/11/20 (v22.2.1)
- Fix issue with `Debiased Estimation loss` not getting properly loaded from the json file. Oops.

* 2023/11/15 (v22.2.0)
- sd-scripts code base update:
- `sdxl_train.py` now supports different learning rates for each Text Encoder.
- Example:
- `--learning_rate 1e-6`: train U-Net only
- `--train_text_encoder --learning_rate 1e-6`: train U-Net and two Text Encoders with the same learning rate (same as the previous version)
- `--train_text_encoder --learning_rate 1e-6 --learning_rate_te1 1e-6 --learning_rate_te2 1e-6`: train U-Net and two Text Encoders with different learning rates
- `--train_text_encoder --learning_rate 0 --learning_rate_te1 1e-6 --learning_rate_te2 1e-6`: train two Text Encoders only
- `--train_text_encoder --learning_rate 1e-6 --learning_rate_te1 1e-6 --learning_rate_te2 0`: train U-Net and one Text Encoder only
- `--train_text_encoder --learning_rate 0 --learning_rate_te1 0 --learning_rate_te2 1e-6`: train one Text Encoder only

- `train_db.py` and `fine_tune.py` now support different learning rates for Text Encoder. Specify with `--learning_rate_te` option.
- To train Text Encoder with `fine_tune.py`, specify `--train_text_encoder` option too. `train_db.py` trains Text Encoder by default.

- Fixed the bug where the Text Encoder was not trained when block lr is specified in `sdxl_train.py`.

- Debiased Estimation loss is added to each training script. Thanks to sdbds!
- Specify `--debiased_estimation_loss` option to enable it. See PR [#889](https://github.com/kohya-ss/sd-scripts/pull/889) for details.
- Training of Text Encoder is improved in `train_network.py` and `sdxl_train_network.py`. Thanks to KohakuBlueleaf! PR [#895](https://github.com/kohya-ss/sd-scripts/pull/895)
- The moving average of the loss is now displayed in the progress bar in each training script. Thanks to shirayu! PR [#899](https://github.com/kohya-ss/sd-scripts/pull/899)
- PagedAdamW32bit optimizer is supported. Specify `--optimizer_type=PagedAdamW32bit`. Thanks to xzuyn! PR [#900](https://github.com/kohya-ss/sd-scripts/pull/900)
- Other bug fixes and improvements.
- kohya_ss gui updates:
- Implement GUI support for SDXL finetune TE1 and TE2 training LR parameters and for the non-SDXL finetune TE training LR parameter
- Implement GUI support for Dreambooth TE LR parameter
- Implement Debiased Estimation loss at the bottom of the Advanced Parameters tab.

10 changes: 2 additions & 8 deletions gui.sh
Expand Up @@ -72,14 +72,8 @@ fi
#Set OneAPI if it's not set by the user
if [[ "$@" == *"--use-ipex"* ]]
then
echo "Setting OneAPI environment"
if [ ! -x "$(command -v sycl-ls)" ]
then
if [[ -z "$ONEAPI_ROOT" ]]
then
ONEAPI_ROOT=/opt/intel/oneapi
fi
source $ONEAPI_ROOT/setvars.sh
if [ -d "$SCRIPT_DIR/venv" ]; then
export LD_LIBRARY_PATH=$(realpath "$SCRIPT_DIR/venv")/lib/:$LD_LIBRARY_PATH
fi
export NEOReadDebugKeys=1
export ClDeviceGlobalMemSizeAvailablePercent=100
8 changes: 7 additions & 1 deletion library/class_basic_training.py
@@ -115,9 +115,15 @@ def __init__(
interactive=True,
)
with gr.Row():
self.max_grad_norm = gr.Slider(
label="Max grad norm",
value=1.0,
minimum=0.0,
maximum=1.0
)
self.lr_scheduler_args = gr.Textbox(
label="LR scheduler extra arguments",
placeholder='(Optional) eg: "lr_end=5e-5"',
placeholder='(Optional) eg: "milestones=[1,10,30,50]" "gamma=0.1"',
)
self.optimizer_args = gr.Textbox(
label="Optimizer extra arguments",
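For context on the new `Max grad norm` slider added above: the value is forwarded to the training command (see the `common_gui.py` change below), and in typical training loops such a setting controls gradient-norm clipping. The snippet below is a minimal, hypothetical sketch of what that value would govern, using standard PyTorch clipping; it is not code from this repository.

```python
# Illustrative sketch (assumption): "Max grad norm" maps to gradient-norm clipping.
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

max_grad_norm = 1.0  # value chosen on the GUI slider (0.0-1.0)
if max_grad_norm > 0:
    # Rescale gradients so their global L2 norm does not exceed max_grad_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
```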
2 changes: 2 additions & 0 deletions library/class_lora_tab.py
@@ -4,6 +4,7 @@
from library.verify_lora_gui import gradio_verify_lora_tab
from library.resize_lora_gui import gradio_resize_lora_tab
from library.extract_lora_gui import gradio_extract_lora_tab
from library.convert_lcm_gui import gradio_convert_lcm_tab
from library.extract_lycoris_locon_gui import gradio_extract_lycoris_locon_tab
from library.extract_lora_from_dylora_gui import gradio_extract_dylora_tab
from library.merge_lycoris_gui import gradio_merge_lycoris_tab
@@ -24,6 +25,7 @@ def __init__(self, folders='', headless: bool = False):
'This section provides LoRA tools to help set up your dataset...'
)
gradio_extract_dylora_tab(headless=headless)
gradio_convert_lcm_tab(headless=headless)
gradio_extract_lora_tab(headless=headless)
gradio_extract_lycoris_locon_tab(headless=headless)
gradio_merge_lora_tab = GradioMergeLoRaTab()
5 changes: 5 additions & 0 deletions library/common_gui.py
Expand Up @@ -710,6 +710,11 @@ def run_cmd_training(**kwargs):
lr_scheduler_args = kwargs.get('lr_scheduler_args', '')
if lr_scheduler_args != '':
run_cmd += f' --lr_scheduler_args {lr_scheduler_args}'

max_grad_norm = kwargs.get('max_grad_norm', '')
if max_grad_norm != '':
run_cmd += f' --max_grad_norm="{max_grad_norm}"'

return run_cmd


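As a quick illustration of the hunk above, here is a self-contained sketch that mirrors the kwargs-to-flags pattern of `run_cmd_training` and shows the fragment produced when `lr_scheduler_args` and the new `max_grad_norm` are set. The function name and values are made up for the example.

```python
# Standalone sketch mirroring the pattern in run_cmd_training (names are illustrative).
def build_training_flags(**kwargs) -> str:
    run_cmd = ""
    lr_scheduler_args = kwargs.get("lr_scheduler_args", "")
    if lr_scheduler_args != "":
        run_cmd += f" --lr_scheduler_args {lr_scheduler_args}"
    max_grad_norm = kwargs.get("max_grad_norm", "")
    if max_grad_norm != "":
        run_cmd += f' --max_grad_norm="{max_grad_norm}"'
    return run_cmd


print(build_training_flags(lr_scheduler_args='"milestones=[1,10,30,50]" "gamma=0.1"',
                           max_grad_norm=1.0))
# ->  --lr_scheduler_args "milestones=[1,10,30,50]" "gamma=0.1" --max_grad_norm="1.0"
```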
118 changes: 118 additions & 0 deletions library/convert_lcm_gui.py
@@ -0,0 +1,118 @@
import gradio as gr
import os
import subprocess
from .common_gui import (
get_saveasfilename_path,
get_file_path,
)
from library.custom_logging import setup_logging

# Set up logging
log = setup_logging()

folder_symbol = "\U0001f4c2" # 📂
refresh_symbol = "\U0001f504" # 🔄
save_style_symbol = "\U0001f4be" # 💾
document_symbol = "\U0001F4C4" # 📄

PYTHON = "python3" if os.name == "posix" else "./venv/Scripts/python.exe"


def convert_lcm(
name,
model_path,
lora_scale,
model_type
):
run_cmd = f'{PYTHON} "{os.path.join("tools","lcm_convert.py")}"'
# Construct the command to run the script
run_cmd += f' --name "{name}"'
run_cmd += f' --model "{model_path}"'
run_cmd += f" --lora-scale {lora_scale}"

if model_type == "SDXL":
run_cmd += f" --sdxl"
if model_type == "SSD-1B":
run_cmd += f" --ssd-1b"

log.info(run_cmd)

# Run the command
if os.name == "posix":
os.system(run_cmd)
else:
subprocess.run(run_cmd)

# Return a success message
log.info("Done extracting...")


def gradio_convert_lcm_tab(headless=False):
with gr.Tab("Convert to LCM"):
gr.Markdown("This utility convert a model to an LCM model.")
lora_ext = gr.Textbox(value="*.safetensors", visible=False)
lora_ext_name = gr.Textbox(value="LCM model types", visible=False)
model_ext = gr.Textbox(value="*.safetensors", visible=False)
model_ext_name = gr.Textbox(value="Model types", visible=False)

with gr.Row():
model_path = gr.Textbox(
label="Stable Diffusion model to convert to LCM",
interactive=True,
)
button_model_path_file = gr.Button(
folder_symbol,
elem_id="open_folder_small",
visible=(not headless),
)
button_model_path_file.click(
get_file_path,
inputs=[model_path, model_ext, model_ext_name],
outputs=model_path,
show_progress=False,
)

name = gr.Textbox(
label="Name of the new LCM model",
placeholder="Path to the LCM file to create",
interactive=True,
)
button_name = gr.Button(
folder_symbol,
elem_id="open_folder_small",
visible=(not headless),
)
button_name.click(
get_saveasfilename_path,
inputs=[name, lora_ext, lora_ext_name],
outputs=name,
show_progress=False,
)

with gr.Row():
lora_scale = gr.Slider(
label="Strength of the LCM",
minimum=0.0,
maximum=2.0,
step=0.1,
value=1.0,
interactive=True,
)
# with gr.Row():
# no_half = gr.Checkbox(label="Convert the new LCM model to FP32", value=False)
model_type = gr.Dropdown(
label="Model type", choices=["SD15", "SDXL", "SD-1B"], value="SD15"
)

extract_button = gr.Button("Extract LCM")

extract_button.click(
convert_lcm,
inputs=[
name,
model_path,
lora_scale,
model_type
],
show_progress=False,
)
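To make the new utility above concrete, here is a hedged example of the command string that `convert_lcm()` assembles for an SDXL checkpoint. File names are placeholders, and the snippet only rebuilds and prints the command instead of running it.

```python
# Rebuild the command the same way convert_lcm() does (paths are placeholders).
import os

PYTHON = "python3" if os.name == "posix" else "./venv/Scripts/python.exe"
name = "my_lcm.safetensors"                # new LCM model to create
model_path = "sd_xl_base_1.0.safetensors"  # model to convert
lora_scale = 1.0
model_type = "SDXL"                        # one of "SD15", "SDXL", "SSD-1B"

run_cmd = f'{PYTHON} "{os.path.join("tools", "lcm_convert.py")}"'
run_cmd += f' --name "{name}" --model "{model_path}" --lora-scale {lora_scale}'
if model_type == "SDXL":
    run_cmd += " --sdxl"
elif model_type == "SSD-1B":
    run_cmd += " --ssd-1b"

print(run_cmd)
# python3 "tools/lcm_convert.py" --name "my_lcm.safetensors" --model "sd_xl_base_1.0.safetensors" --lora-scale 1.0 --sdxl
```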
21 changes: 18 additions & 3 deletions library/lpw_stable_diffusion.py
@@ -9,7 +9,7 @@
import PIL.Image
import torch
from packaging import version
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection

import diffusers
from diffusers import SchedulerMixin, StableDiffusionPipeline
@@ -516,12 +516,13 @@ def __init__(
tokenizer: CLIPTokenizer,
unet: UNet2DConditionModel,
scheduler: SchedulerMixin,
# clip_skip: int,
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
image_encoder: CLIPVisionModelWithProjection = None,  # Include the image_encoder
requires_safety_checker: bool = True,
clip_skip: int = 1,
):
self._clip_skip_internal = clip_skip
super().__init__(
vae=vae,
text_encoder=text_encoder,
@@ -530,11 +531,25 @@ def __init__(
scheduler=scheduler,
safety_checker=safety_checker,
feature_extractor=feature_extractor,
image_encoder=image_encoder,
requires_safety_checker=requires_safety_checker,
)
self.clip_skip = clip_skip
self.__init__additional__()

@property
def clip_skip(self):
return self._clip_skip_internal

@clip_skip.setter
def clip_skip(self, value):
self._clip_skip_internal = value

def __setattr__(self, name: str, value):
if name == "clip_skip":
object.__setattr__(self, "_clip_skip_internal", value)
else:
super().__setattr__(name, value)

# else:
# def __init__(
# self,
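The `clip_skip` property plus the `__setattr__` override added above redirect the value into `_clip_skip_internal`, presumably so that attribute assignment coming from the parent pipeline classes lands in the internal attribute instead of fighting the property. A minimal, self-contained sketch of the same pattern outside diffusers (the class name is made up):

```python
# Minimal reproduction of the clip_skip redirection pattern (illustrative class).
class ClipSkipHolder:
    def __init__(self, clip_skip: int = 1):
        self._clip_skip_internal = clip_skip

    @property
    def clip_skip(self):
        return self._clip_skip_internal

    @clip_skip.setter
    def clip_skip(self, value):
        self._clip_skip_internal = value

    def __setattr__(self, name, value):
        # Any assignment to "clip_skip" is rerouted to the internal attribute,
        # even when it bypasses the property (e.g. from a parent __init__).
        if name == "clip_skip":
            object.__setattr__(self, "_clip_skip_internal", value)
        else:
            super().__setattr__(name, value)


holder = ClipSkipHolder()
holder.clip_skip = 2
print(holder.clip_skip)  # 2, stored in _clip_skip_internal
```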
(Diffs for the remaining changed files were not loaded.)
