Merge branch 'dev' into 173-error-on-train-typeerror-int-argument-must-be-a-string-a-bytes-like-object-or-a-real-number-not-nonetype
bmaltais committed Aug 2, 2024
2 parents 3121d5e + 092138b commit 9a818cf
Showing 26 changed files with 526 additions and 79 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docker_publish.yml
@@ -71,7 +71,7 @@ jobs:
password: ${{ secrets.GITHUB_TOKEN }}

- name: Build and push
uses: docker/build-push-action@v5
uses: docker/build-push-action@v6
id: publish
with:
context: .
2 changes: 1 addition & 1 deletion .release
@@ -1 +1 @@
v24.1.6
v24.2.0
35 changes: 35 additions & 0 deletions README.md
@@ -46,6 +46,12 @@ The GUI allows you to set the training parameters and generate and run the requi
- [Potential Solutions](#potential-solutions)
- [SDXL training](#sdxl-training)
- [Masked loss](#masked-loss)
- [Guides](#guides)
- [Using Accelerate Lora Tab to Select GPU ID](#using-accelerate-lora-tab-to-select-gpu-id)
- [Starting Accelerate in GUI](#starting-accelerate-in-gui)
- [Running Multiple Instances (linux)](#running-multiple-instances-linux)
- [Monitoring Processes](#monitoring-processes)
- [Interesting Forks](#interesting-forks)
- [Change History](#change-history)

## 🦒 Colab
@@ -438,6 +444,35 @@ The feature is not fully tested, so there may be bugs. If you find any issues, p
The ControlNet dataset is used to specify the mask. The mask images should be RGB images. A pixel value of 255 in the R channel is treated as masked (the loss is calculated only for masked pixels), and 0 is treated as unmasked. Pixel values 0-255 are converted to weights 0-1 (i.e., a pixel value of 128 gives half weight to the loss). See details for the dataset specification in the [LLLite documentation](./docs/train_lllite_README.md#preparing-the-dataset).
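
As an aside, a minimal sketch of the weighting scheme described above, assuming NumPy and Pillow are available; this illustrates the documented behaviour and is not the sd-scripts implementation:

```python
# Illustrative only: convert a ControlNet-style RGB mask into per-pixel loss
# weights as described above (R channel 255 -> weight 1.0, 0 -> 0.0, 128 -> ~0.5).
import numpy as np
from PIL import Image

def mask_to_loss_weights(mask_path: str) -> np.ndarray:
    mask = np.asarray(Image.open(mask_path).convert("RGB"), dtype=np.float32)
    return mask[..., 0] / 255.0  # use only the R channel, scaled to 0-1

# Example: weight a per-pixel loss map with the mask before averaging.
# weights = mask_to_loss_weights("mask.png")
# masked_loss = (per_pixel_loss * weights).sum() / max(weights.sum(), 1e-6)
```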
## Guides
The following are guides extracted from issue discussions.
### Using Accelerate Lora Tab to Select GPU ID
#### Starting Accelerate in GUI
- Open the kohya GUI on your desired port.
- Open the `Accelerate launch` tab.
- Ensure the Multi-GPU checkbox is unchecked.
- Set GPU IDs to the ID of the desired GPU (e.g., 1).
#### Running Multiple Instances (linux)
- For tracking multiple processes, use separate kohya GUI instances on different ports (e.g., 7860, 7861).
- Start instances using `nohup ./gui.sh --listen 0.0.0.0 --server_port <port> --headless > log.log 2>&1 &`.
#### Monitoring Processes
- Open each GUI in a separate browser tab.
- For terminal access, use SSH and tools like `tmux` or `screen`.
For more details, visit the [GitHub issue](https://github.com/bmaltais/kohya_ss/issues/2577).
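
For readers who prefer to script the multi-instance setup above, here is a hypothetical Python wrapper around the same `gui.sh` flags (`--listen`, `--server_port`, `--headless`); the helper name and log-file naming are illustrative, only the flags come from the guide:

```python
# Hypothetical helper (not part of the repository): start several headless
# kohya GUI instances on different ports, mirroring the nohup command above.
import subprocess

def launch_instances(ports=(7860, 7861)):
    procs = []
    for port in ports:
        log_file = open(f"gui_{port}.log", "w")
        proc = subprocess.Popen(
            ["./gui.sh", "--listen", "0.0.0.0", "--server_port", str(port), "--headless"],
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
        procs.append(proc)
    return procs

# Each returned process can be checked with proc.poll() or stopped with proc.terminate().
```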
## Interesting Forks
To finetune HunyuanDiT models or create LoRAs, visit this [fork](https://github.com/Tencent/HunyuanDiT/tree/main/kohya_ss-hydit).
## Change History
See release information.
1 change: 1 addition & 0 deletions _typos.toml
@@ -9,6 +9,7 @@ parms="parms"
nin="nin"
extention="extention" # Intentionally left
nd="nd"
pn="pn"
shs="shs"
sts="sts"
scs="scs"
4 changes: 4 additions & 0 deletions config example.toml
@@ -48,6 +48,7 @@ learning_rate_te1 = 0.0001 # Learning rate text encoder 1
learning_rate_te2 = 0.0001 # Learning rate text encoder 2
lr_scheduler = "cosine" # LR Scheduler
lr_scheduler_args = "" # LR Scheduler args
lr_scheduler_type = "" # LR Scheduler type
lr_warmup = 0 # LR Warmup (% of total steps)
lr_scheduler_num_cycles = 1 # LR Scheduler num cycles
lr_scheduler_power = 1.0 # LR Scheduler power
@@ -150,6 +151,9 @@ sample_prompts = "" # Sample prompts
sample_sampler = "euler_a" # Sampler to use for image sampling

[sdxl]
disable_mmap_load_safetensors = false # Disable mmap load safe tensors
fused_backward_pass = false # Fused backward pass
fused_optimizer_groups = 0 # Fused optimizer groups
sdxl_cache_text_encoder_outputs = false # Cache text encoder outputs
sdxl_no_half_vae = true # No half VAE

1 change: 1 addition & 0 deletions docker-compose.yaml
@@ -25,6 +25,7 @@ services:
- ./dataset/logs:/app/logs
- ./dataset/outputs:/app/outputs
- ./dataset/regularization:/app/regularization
- ./models:/app/models
- ./.cache/config:/app/config
- ./.cache/user:/home/1000/.cache
- ./.cache/triton:/home/1000/.triton
41 changes: 18 additions & 23 deletions examples/pull kohya_ss sd-scripts updates in.md
@@ -1,32 +1,27 @@
## Updating a Local Branch with the Latest sd-scripts Changes
## Updating a Local Submodule with the Latest sd-scripts Changes

To update your local branch with the most recent changes from kohya/sd-scripts, follow these steps:

1. Add sd-scripts as an alternative remote by executing the following command:
1. When you wish to perform an update of the dev branch, execute the following commands:

```
git remote add sd-scripts https://github.com/kohya-ss/sd-scripts.git
```

2. When you wish to perform an update, execute the following commands:

```
git checkout dev
git pull sd-scripts main
```

Alternatively, if you want to obtain the latest code, even if it may be unstable:

```
```bash
cd sd-scripts
git fetch
git checkout dev
git pull sd-scripts dev
git pull origin dev
cd ..
git add sd-scripts
git commit -m "Update sd-scripts submodule to the latest on dev"
```

3. If you encounter a conflict with the Readme file, you can resolve it by taking the following steps:
Alternatively, if you want to obtain the latest code from main:

```bash
cd sd-scripts
git fetch
git checkout main
git pull origin main
cd ..
git add sd-scripts
git commit -m "Update sd-scripts submodule to the latest on main"
```
git add README.md
git merge --continue
```

This may open a text editor for a commit message, but you can simply save and close it to proceed. Following these steps should resolve the conflict. If you encounter additional merge conflicts, consider them as valuable learning opportunities for personal growth.
5 changes: 5 additions & 0 deletions gui.bat
@@ -9,10 +9,15 @@ call .\venv\Scripts\deactivate.bat
call .\venv\Scripts\activate.bat
set PATH=%PATH%;%~dp0venv\Lib\site-packages\torch\lib

:: If the first argument is --help, skip the validation step
if "%~1" equ "--help" goto :skip_validation

:: Validate requirements
python.exe .\setup\validate_requirements.py
if %errorlevel% neq 0 exit /b %errorlevel%

:skip_validation

:: If the exit code is 0, run the kohya_gui.py script with the command-line arguments
if %errorlevel% equ 0 (
REM Check if the batch was started via double-click
7 changes: 7 additions & 0 deletions gui.ps1
@@ -12,6 +12,13 @@ $env:PATH += ";$($MyInvocation.MyCommand.Path)\venv\Lib\site-packages\torch\lib"
# Debug info about system
# python.exe .\setup\debug_info.py

# If the --help parameter is passed, skip the validation step
if ($args -contains "--help") {
# Run the kohya_gui.py script with the command-line arguments
python.exe kohya_gui.py $args
exit 0
}

# Validate the requirements and store the exit code
python.exe .\setup\validate_requirements.py

4 changes: 3 additions & 1 deletion kohya_gui.py
@@ -106,6 +106,7 @@ def UI(**kwargs):
do_not_share = kwargs.get("do_not_share", False)
server_name = kwargs.get("listen")
root_path = kwargs.get("root_path", None)
debug = kwargs.get("debug", False)

launch_kwargs["server_name"] = server_name
if username and password:
@@ -121,7 +122,8 @@ def UI(**kwargs):
launch_kwargs["share"] = share
if root_path:
launch_kwargs["root_path"] = root_path
launch_kwargs["debug"] = True
if debug:
launch_kwargs["debug"] = True
interface.launch(**launch_kwargs)


2 changes: 1 addition & 1 deletion kohya_gui/blip2_caption_gui.py
@@ -42,7 +42,7 @@ def get_images_in_directory(directory_path):
import os

# List of common image file extensions to look for
image_extensions = [".jpg", ".jpeg", ".png", ".bmp", ".gif"]
image_extensions = [".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"]

# Generate a list of image file paths in the directory
image_files = [
41 changes: 36 additions & 5 deletions kohya_gui/class_accelerate_launch.py
@@ -3,6 +3,10 @@
import shlex

from .class_gui_config import KohyaSSGUIConfig
from .custom_logging import setup_logging

# Set up logging
log = setup_logging()


class AccelerateLaunch:
@@ -79,12 +83,16 @@ def __init__(
)
self.dynamo_use_fullgraph = gr.Checkbox(
label="Dynamo use fullgraph",
value=self.config.get("accelerate_launch.dynamo_use_fullgraph", False),
value=self.config.get(
"accelerate_launch.dynamo_use_fullgraph", False
),
info="Whether to use full graph mode for dynamo or it is ok to break model into several subgraphs",
)
self.dynamo_use_dynamic = gr.Checkbox(
label="Dynamo use dynamic",
value=self.config.get("accelerate_launch.dynamo_use_dynamic", False),
value=self.config.get(
"accelerate_launch.dynamo_use_dynamic", False
),
info="Whether to enable dynamic shape tracing.",
)

@@ -103,6 +111,24 @@ def __init__(
placeholder="example: 0,1",
info=" What GPUs (by id) should be used for training on this machine as a comma-separated list",
)

def validate_gpu_ids(value):
if value == "":
return
if not (
value.isdigit() and int(value) >= 0 and int(value) <= 128
):
log.error("GPU IDs must be an integer between 0 and 128")
return
else:
for id in value.split(","):
if not id.isdigit() or int(id) < 0 or int(id) > 128:
log.error(
"GPU IDs must be an integer between 0 and 128"
)

self.gpu_ids.blur(fn=validate_gpu_ids, inputs=self.gpu_ids)

self.main_process_port = gr.Number(
label="Main process port",
value=self.config.get("accelerate_launch.main_process_port", 0),
@@ -136,9 +162,14 @@ def run_cmd(run_cmd: list, **kwargs):

if "dynamo_use_dynamic" in kwargs and kwargs.get("dynamo_use_dynamic"):
run_cmd.append("--dynamo_use_dynamic")

if "extra_accelerate_launch_args" in kwargs and kwargs["extra_accelerate_launch_args"] != "":
extra_accelerate_launch_args = kwargs["extra_accelerate_launch_args"].replace('"', "")

if (
"extra_accelerate_launch_args" in kwargs
and kwargs["extra_accelerate_launch_args"] != ""
):
extra_accelerate_launch_args = kwargs[
"extra_accelerate_launch_args"
].replace('"', "")
for arg in extra_accelerate_launch_args.split():
run_cmd.append(shlex.quote(arg))

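
As an aside on the `GPU IDs` field above (a comma-separated list of integer ids, per its info text), here is a standalone sketch of the kind of check the new blur handler performs; this is illustrative and not the repository's exact logic:

```python
# Illustrative sketch, not the repository's exact validation: accept a
# comma-separated list of GPU ids such as "0,1", each an integer from 0 to 128.
def gpu_ids_are_valid(value: str) -> bool:
    if value == "":
        return True  # empty means no explicit GPU selection
    return all(part.isdigit() and 0 <= int(part) <= 128 for part in value.split(","))

assert gpu_ids_are_valid("")         # nothing specified
assert gpu_ids_are_valid("1")        # single GPU
assert gpu_ids_are_valid("0,1")      # two GPUs
assert not gpu_ids_are_valid("0,x")  # non-numeric entry
```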
5 changes: 5 additions & 0 deletions kohya_gui/class_advanced_training.py
@@ -534,6 +534,11 @@ def list_log_tracker_config_files(path):
self.current_log_tracker_config_dir = path if not path == "" else "."
return list(list_files(path, exts=[".json"], all=True))

self.log_config = gr.Checkbox(
label="Log config",
value=self.config.get("advanced.log_config", False),
info="Log training parameter to WANDB",
)
self.log_tracker_name = gr.Textbox(
label="Log tracker name",
value=self.config.get("advanced.log_tracker_name", ""),
21 changes: 16 additions & 5 deletions kohya_gui/class_basic_training.py
@@ -162,12 +162,23 @@ def init_lr_and_optimizer_controls(self) -> None:
"cosine",
"cosine_with_restarts",
"linear",
"piecewise_constant",
"polynomial",
],
value=self.config.get("basic.lr_scheduler", self.lr_scheduler_value),
)


# Initialize the learning rate scheduler type dropdown
self.lr_scheduler_type = gr.Dropdown(
label="LR Scheduler type",
info="(Optional) custom scheduler module name",
choices=[
"",
"CosineAnnealingLR",
],
value=self.config.get("basic.lr_scheduler_type", ""),
allow_custom_value=True,
)

# Initialize the optimizer dropdown
self.optimizer = gr.Dropdown(
@@ -240,7 +251,7 @@ def init_learning_rate_controls(self) -> None:
self.learning_rate = gr.Number(
label=lr_label,
value=self.config.get("basic.learning_rate", self.learning_rate_value),
minimum=0,
minimum=-1,
maximum=1,
info="Set to 0 to not train the Unet",
)
@@ -251,7 +262,7 @@ def init_learning_rate_controls(self) -> None:
"basic.learning_rate_te", self.learning_rate_value
),
visible=self.finetuning or self.dreambooth,
minimum=0,
minimum=-1,
maximum=1,
info="Set to 0 to not train the Text Encoder",
)
@@ -262,7 +273,7 @@
"basic.learning_rate_te1", self.learning_rate_value
),
visible=False,
minimum=0,
minimum=-1,
maximum=1,
info="Set to 0 to not train the Text Encoder 1",
)
@@ -273,7 +284,7 @@
"basic.learning_rate_te2", self.learning_rate_value
),
visible=False,
minimum=0,
minimum=-1,
maximum=1,
info="Set to 0 to not train the Text Encoder 2",
)
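
As an aside on the new `LR Scheduler type` dropdown above, a hedged sketch of what selecting `CosineAnnealingLR` roughly corresponds to in plain PyTorch, assuming the underlying trainer resolves the name to `torch.optim.lr_scheduler.CosineAnnealingLR` and forwards the scheduler arguments:

```python
# Assumption: the chosen scheduler type maps to the torch class of the same name.
import torch

model = torch.nn.Linear(4, 4)  # stand-in network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    optimizer.step()   # actual training step elided
    scheduler.step()   # anneal the learning rate along a cosine curve
```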
2 changes: 1 addition & 1 deletion kohya_gui/class_command_executor.py
@@ -48,7 +48,7 @@ def execute_command(self, run_cmd: str, **kwargs):

# Execute the command securely
self.process = subprocess.Popen(run_cmd, **kwargs)
log.info("Command executed.")
log.debug("Command executed.")

def kill_command(self):
"""
5 changes: 4 additions & 1 deletion kohya_gui/class_sample_images.py
@@ -28,7 +28,10 @@ def create_prompt_file(sample_prompts, output_dir):
Returns:
str: The path to the prompt file.
"""
sample_prompts_path = os.path.join(output_dir, "prompt.txt")
sample_prompts_path = os.path.join(output_dir, "sample/prompt.txt")

if not os.path.exists(os.path.dirname(sample_prompts_path)):
os.makedirs(os.path.dirname(sample_prompts_path))

with open(sample_prompts_path, "w", encoding="utf-8") as f:
f.write(sample_prompts)