Merge pull request #1907 from bmaltais/dev

v22.6.0

bmaltais committed Jan 27, 2024
2 parents bfe8b06 + 36b666b commit 62fbae6
Showing 32 changed files with 1,248 additions and 832 deletions.
2 changes: 1 addition & 1 deletion .release
@@ -1 +1 @@
v22.5.0
v22.6.0
136 changes: 58 additions & 78 deletions README.md
@@ -47,11 +47,6 @@ The GUI allows you to set the training parameters and generate and run the requi
- [ControlNet-LLLite](#controlnet-lllite)
- [Sample image generation during training](#sample-image-generation-during-training-1)
- [Change History](#change-history)
- [Jan 15, 2024 / 2024/1/15: v0.8.0](#jan-15-2024--2024115-v080)
- [Naming of LoRA](#naming-of-lora)
- [LoRAの名称について](#loraの名称について)
- [Sample image generation during training](#sample-image-generation-during-training-2)
- [Change History](#change-history-1)

## 🦒 Colab

@@ -109,7 +104,7 @@ Please note that the CUDNN 8.6 DLLs needed for this process cannot be hosted on

1. Unzip the downloaded file and place the `cudnn_windows` folder in the root directory of the `kohya_ss` repository.

2. Run .\setup.bat and select the option to install cudnn.

### Linux and macOS

@@ -122,7 +117,7 @@ To install the necessary dependencies on a Linux system, ensure that you fulfill
apt install python3.10-venv
```

- Install the cuDNN drivers by following the instructions provided in [this link](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64).

- Make sure you have Python version 3.10.6 or higher (but lower than 3.11.0) installed on your system.

@@ -484,78 +479,7 @@ save_file(state_dict, file)
ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [documentation](./docs/train_lllite_README.md) for details.
## Change History
### Jan 15, 2024 / 2024/1/15: v0.8.0
- Diffusers, Accelerate, Transformers, and other related libraries have been updated. Please update the libraries with [Upgrade](#upgrade).
  - Some model files (those based on the latest Transformers, whose Text Encoder has no position_id) can now be loaded.
- `torch.compile` is now supported (experimental). PR [#1024](https://github.com/kohya-ss/sd-scripts/pull/1024) Thanks to p1atdev!
  - This feature works only on Linux or WSL.
  - Specify the `--torch_compile` option in each training script.
  - You can select the backend with the `--dynamo_backend` option. The default is `"inductor"`; `inductor` or `eager` seem to work.
  - It is not compatible with `--xformers`; use the `--sdpa` option instead.
  - PyTorch 2.1 or later is recommended.
  - See the [PR](https://github.com/kohya-ss/sd-scripts/pull/1024) for details.
- The session name for wandb can be specified with the `--wandb_run_name` option. PR [#1032](https://github.com/kohya-ss/sd-scripts/pull/1032) Thanks to hopl1t!
- The IPEX library has been updated. PR [#1030](https://github.com/kohya-ss/sd-scripts/pull/1030) Thanks to Disty0!
- Fixed a bug where models could not be saved in Diffusers format.
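As a rough sketch of how the new options described above might be consumed by a training script (illustrative only, not sd-scripts' actual argument handling; the flag names follow the notes above, with the attention flag spelled `--sdpa` as in the upstream scripts):

```python
import argparse

# hypothetical sketch of parsing the torch.compile-related options;
# sd-scripts' real argument wiring differs
parser = argparse.ArgumentParser()
parser.add_argument("--torch_compile", action="store_true")
parser.add_argument("--dynamo_backend", default="inductor",
                    choices=["inductor", "eager"])
parser.add_argument("--sdpa", action="store_true")

# e.g. a Linux/WSL run enabling compilation with the default backend
args = parser.parse_args(["--torch_compile", "--sdpa"])
print(args)
```

The backend defaults to `inductor` when `--dynamo_backend` is omitted, matching the note above.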
Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
### Naming of LoRA
The LoRA types supported by `train_network.py` have been given names to avoid confusion. The documentation has been updated. The following are the names of the LoRA types in this repository.

1. __LoRA-LierLa__: (LoRA for __Li__n__e__a__r__ __La__yers)

    LoRA for Linear layers and Conv2d layers with a 1x1 kernel

2. __LoRA-C3Lier__: (LoRA for __C__onvolutional layers with a __3__x3 kernel and __Li__n__e__a__r__ layers)

    In addition to 1., LoRA for Conv2d layers with a 3x3 kernel

LoRA-LierLa is the default LoRA type for `train_network.py` (without the `conv_dim` network arg). LoRA-LierLa can be used with [our extension](https://github.com/kohya-ss/sd-webui-additional-networks) for AUTOMATIC1111's Web UI, or with the Web UI's built-in LoRA feature.

To use LoRA-C3Lier with the Web UI, please use our extension.
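The two names above can be summarized as a layer-coverage table; this small sketch (illustrative, not sd-scripts code) makes the difference concrete:

```python
# illustrative sketch: which layer kinds each LoRA type in this repo covers
def lora_targets(lora_type):
    targets = {"Linear", "Conv2d 1x1"}   # LoRA-LierLa baseline
    if lora_type == "LoRA-C3Lier":
        targets.add("Conv2d 3x3")        # the extra coverage C3Lier adds
    return sorted(targets)

print(lora_targets("LoRA-LierLa"))
print(lora_targets("LoRA-C3Lier"))
```

Passing a `conv_dim` network arg is what enables the 3x3 convolution coverage in practice.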
## Sample image generation during training
A prompt file might look like this, for example:

```
masterpiece, best quality, 1boy, in business suit, standing at street, looking b
```
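The prompt-file syntax uses inline options after the prompt text (`--n` negative prompt, `--w`/`--h` size, `--d` seed, `--l` CFG scale, `--s` steps, as documented in the sd-scripts README). A minimal illustrative parser, not the project's actual code:

```python
def parse_prompt_line(line):
    # split an sd-scripts-style prompt line into the prompt and its options
    parts = line.split(" --")
    opts = {"prompt": parts[0].strip()}
    for part in parts[1:]:
        key, _, value = part.partition(" ")
        opts[key] = value.strip()
    return opts

p = parse_prompt_line(
    "1boy, in business suit --n low quality --w 576 --h 832 --s 28"
)
print(p)
```

Everything after `--n` up to the next option is the negative prompt; numeric options arrive as strings and would be converted by the real script.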


## Change History
* 2024/01/27 (v22.6.0)
- Merge sd-scripts v0.8.3 code update
  - Fixed a bug where training crashed when `--fp8_base` was specified together with `--save_state`. PR [#1079](https://github.com/kohya-ss/sd-scripts/pull/1079) Thanks to feffy380!
  - `safetensors` has been updated. Please see [Upgrade](#upgrade) and update the library.
  - Fixed a bug where training crashed when `network_multiplier` was specified in multi-GPU training. PR [#1084](https://github.com/kohya-ss/sd-scripts/pull/1084) Thanks to fireicewolf!
  - Fixed a bug where training ControlNet-LLLite crashed.

- Merge sd-scripts v0.8.2 code update
- [Experimental] The `--fp8_base` option has been added to the training scripts for LoRA etc. The base model (the U-Net, and the Text Encoder when training Text Encoder modules) can be trained in fp8. PR [#1057](https://github.com/kohya-ss/sd-scripts/pull/1057) Thanks to KohakuBlueleaf!
  - Specify `--fp8_base` in `train_network.py` or `sdxl_train_network.py`.
  - PyTorch 2.1 or later is required.
  - If you use xformers with PyTorch 2.1, see the [xformers repository](https://github.com/facebookresearch/xformers) and install the version appropriate for your CUDA version.
  - Sample image generation during training consumes a lot of memory; it is recommended to turn it off.
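Part of why fp8 training is delicate: the e4m3 format commonly used for fp8 weights has a very small dynamic range compared with fp16/fp32. A quick illustrative computation (not sd-scripts code):

```python
# dynamic-range sketch for the fp8 e4m3fn format:
# 4 exponent bits (bias 7) and 3 mantissa bits; in the "fn" variant the
# all-ones exponent still encodes finite values, and the largest mantissa
# pattern 0b110 (0b111 is reserved for NaN) gives 1 + 6/8 = 1.75
def e4m3fn_max():
    return (1 + 6 / 8) * 2 ** (15 - 7)

print(e4m3fn_max())  # largest representable magnitude
```

Any activation or gradient beyond this magnitude saturates, which is why the base model is kept in fp8 while optimizer math stays in higher precision.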

- [Experimental] The network multiplier can now be specified per dataset in the training scripts for LoRA etc.
  - This is an experimental option and may be removed or changed in the future.
  - For example, if you train state A at `1.0` and state B at `-1.0`, you may be able to switch between the two states by changing the LoRA application rate at generation time.
  - Likewise, if you prepare five states and train them at `0.2`, `0.4`, `0.6`, `0.8`, and `1.0`, you may be able to move between the states smoothly by varying the application rate.
  - Specify `network_multiplier` under `[[datasets]]` in the `.toml` file.

- Options have been added to `networks/extract_lora_from_models.py` to reduce memory usage.
  - The `--load_precision` option specifies the precision used when loading the models. If a model was saved in fp16, specifying `--load_precision fp16` reduces memory usage without losing precision.
  - The `--load_original_model_to` option specifies the device onto which the original model is loaded, and `--load_tuned_model_to` the device for the tuned model. Both default to `cpu`, but `cuda` etc. can be specified; loading one of the models onto the GPU reduces memory usage. These options are available only for SDXL.
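A back-of-the-envelope sketch of why `--load_precision fp16` helps (illustrative arithmetic, not the script's code; the parameter count used below for an SDXL U-Net is approximate):

```python
def load_memory_bytes(n_params, precision):
    # rough memory needed just to hold the weights at load time
    bytes_per_param = {"fp32": 4, "fp16": 2, "bf16": 2}[precision]
    return n_params * bytes_per_param

# an SDXL U-Net has roughly 2.6e9 parameters (approximate figure)
n = 2_600_000_000
print(load_memory_bytes(n, "fp32") / 2**30, "GiB as fp32")
print(load_memory_bytes(n, "fp16") / 2**30, "GiB as fp16")
```

Loading in fp16 halves the footprint, and since two models (original and tuned) must be resident for extraction, placing one of them on the GPU via the device options cuts host memory further.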

- Gradient synchronization in multi-GPU LoRA training has been improved. PR [#1064](https://github.com/kohya-ss/sd-scripts/pull/1064) Thanks to KohakuBlueleaf!

- The code for Intel IPEX support is improved. PR [#1060](https://github.com/kohya-ss/sd-scripts/pull/1060) Thanks to akx!

- Fixed a bug in multi-GPU Textual Inversion training.

- `.toml` example for network multiplier

```toml
[general]
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = 1.0
# ... subset settings ...
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = -1.0
# ... subset settings ...
```
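Conceptually (an illustrative sketch, not sd-scripts' implementation), the network multiplier scales how strongly the LoRA delta is applied on top of the base model's output, which is why training one dataset at `1.0` and another at `-1.0` produces opposite directions:

```python
def apply_lora(base_out, lora_delta, multiplier):
    # the multiplier scales the LoRA contribution added to the base output
    return base_out + multiplier * lora_delta

# dataset A trained at 1.0 vs dataset B at -1.0: opposite shifts
print(apply_lora(1.0, 0.25, 1.0))
print(apply_lora(1.0, 0.25, -1.0))
```

At generation time, varying the application rate between these extremes blends between the trained states.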

- Merge sd-scripts v0.8.1 code update

- Fixed a bug where VRAM usage when not training the Text Encoder was higher than before in the LoRA training scripts (`train_network.py`, `sdxl_train_network.py`).
  - The Text Encoders were not being moved to the CPU.

- Fixed typos. Thanks to akx! [PR #1053](https://github.com/kohya-ss/sd-scripts/pull/1053)

* 2024/01/15 (v22.5.0)
- Merged sd-scripts v0.8.0 updates
- Diffusers, Accelerate, Transformers and other related libraries have been updated. Please update the libraries with [Upgrade](#upgrade).
174 changes: 80 additions & 94 deletions README_中文教程.md
@@ -1,141 +1,127 @@
Hi! The main content of the Japanese README file is translated below:

SDXL is now supported. The sdxl branch has been merged into the main branch. When updating the repository, please run the upgrade steps. Since the accelerate version has also been upgraded, please run accelerate config again.

For information on SDXL training, see [here](./README.md#sdxl-training) (in English).

## About this repository

A repository for Stable Diffusion training, image generation, and other scripts.

[English README](./README.md) <- update information is posted here

[bmaltais's repository](https://github.com/bmaltais/kohya_ss) provides a GUI, PowerShell scripts, and other features that make this easier to use (in English); please refer to it as well. Many thanks to bmaltais.

The repository includes the following scripts:

* Training of DreamBooth, U-Net, and Text Encoder
* Fine-tuning, as above
* LoRA training
* Image generation
* Model conversion (converting between Stable Diffusion ckpt/safetensors and Diffusers formats)

## Usage (users in mainland China only need to follow this installation tutorial)

- Go to the root of the kohya_ss folder and run setup.bat to start the installer *(requires unrestricted internet access)*
- Follow the English options shown in the menu:

  Kohya_ss GUI setup menu:

  1. Install kohya_ss gui
  2. (Optional) Install cudnn files (avoid unless you really need it)
  3. (Optional) Install specific bitsandbytes versions
  4. (Optional) Manually configure accelerate
  5. (Optional) Start Kohya_ss GUI in browser
  6. Quit

  Enter your choice: 1

  1. Torch 1 (legacy, no longer supported. Will be removed in v21.9.x)
  2. Torch 2 (recommended)
  3. Cancel

  Enter your choice: 2

- The environment dependencies are then installed; answer the prompts that follow as below:

  ```txt
  - This machine
  - No distributed training
  - NO
  - NO
  - NO
  - all
  - bf16
  ```

- Once these selections are made, close the terminal window and simply run gui.bat or kohya中文启动器.bat to start kohya.

Where related articles exist in this repository or on note.com, please refer to them. (They may all be moved here in the future.)

* [Common training guide](./docs/train_README-ja.md): data preparation, options, etc.
* [Dataset configuration](./docs/config_README-ja.md)
* [DreamBooth training guide](./docs/train_db_README-ja.md)
* [Fine-tuning guide](./docs/fine_tune_README_ja.md)
* [LoRA training guide](./docs/train_network_README-ja.md)
* [Textual Inversion training guide](./docs/train_ti_README-ja.md)
* [Image generation script](./docs/gen_img_README-ja.md)
* note.com [Model conversion script](https://note.com/kohya_ss/n/n374f316fe4ad)

## Required programs on Windows

Python 3.10.6 and Git are required.

- Python 3.10.6: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe
- git: https://git-scm.com/download/win

To use venv from PowerShell, change the security settings as follows.
(Note that this enables script execution in general, not just for venv.)

- Open PowerShell as administrator
- Enter "Set-ExecutionPolicy Unrestricted" and answer Y
- Close the administrator PowerShell

## Installation on Windows

The scripts have been tested with PyTorch 2.0.1. PyTorch 1.12.1 should also work.

The example below installs PyTorch 2.0.1 with CUDA 11.8. If you use CUDA 11.6 or PyTorch 1.12.1, change the commands accordingly.

(Note: if only "python" is available on your system, change "python" in the example below to "py".)

Open a regular (non-administrator) PowerShell and run the following commands in order:

```powershell
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python -m venv venv
.\venv\Scripts\activate

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install --upgrade -r requirements.txt
pip install xformers==0.0.20

accelerate config
```

The same applies at the command prompt.

(Note: `python -m venv venv` is used rather than `python -m venv --system-site-packages venv` because it is safer; the latter causes various problems when packages are installed in the global Python.)

Answer the accelerate config prompts as follows. (If you train with bf16, answer bf16 to the last question.)

※ From accelerate 0.15.0, selecting with the arrow keys crashes in Japanese-locale environments. Please select with the number keys 0, 1, 2, ... instead.

```txt
- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16
```

※ A `ValueError: fp16 mixed precision requires a GPU` error may sometimes occur. In that case, answer "0" to the sixth question (`What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:`); the GPU with id `0` will then be used.
### Optional: `bitsandbytes` (8-bit optimizer)

`bitsandbytes` is now optional. On Linux it can be installed normally via pip (0.41.1 or later is recommended).

On Windows, 0.35.0 or 0.41.1 is recommended.

- `bitsandbytes` 0.35.0: appears to be a stable version. AdamW8bit can be used, but some other 8-bit optimizers and the options used for `full_bf16` training cannot.
- `bitsandbytes` 0.41.1: supports Lion8bit, PagedAdamW8bit, and PagedLion8bit. `full_bf16` can be used.

Note: `bitsandbytes` versions from 0.35.0 up to 0.41.0 appear to have issues: https://github.com/TimDettmers/bitsandbytes/issues/659

Install `bitsandbytes` by one of the following procedures.

### Using 0.35.0

The following is a PowerShell example. At the command prompt, use copy instead of cp.

```powershell
cd sd-scripts
.\venv\Scripts\activate
pip install bitsandbytes==0.35.0
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
```

### Using 0.41.1

Install the Windows whl file published by jllllll from [here](https://github.com/jllllll/bitsandbytes-windows-webui) or elsewhere.

```powershell
python -m pip install bitsandbytes==0.41.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
```


### Optional: Using Lion8bit