Merge pull request #1907 from bmaltais/dev

v22.6.0

bmaltais committed Jan 27, 2024
2 parents bfe8b06 + 36b666b commit 62fbae6
Showing 32 changed files with 1,248 additions and 832 deletions.
2 changes: 1 addition & 1 deletion .release
@@ -1 +1 @@
v22.5.0
v22.6.0
136 changes: 58 additions & 78 deletions README.md
@@ -47,11 +47,6 @@ The GUI allows you to set the training parameters and generate and run the requi
- [ControlNet-LLLite](#controlnet-lllite)
- [Sample image generation during training](#sample-image-generation-during-training-1)
- [Change History](#change-history)
- [Jan 15, 2024 / 2024/1/15: v0.8.0](#jan-15-2024--2024115-v080)
- [Naming of LoRA](#naming-of-lora)
- [LoRAの名称について](#loraの名称について)
- [Sample image generation during training](#sample-image-generation-during-training-2)
- [Change History](#change-history-1)

## 🦒 Colab

@@ -109,7 +104,7 @@ Please note that the CUDNN 8.6 DLLs needed for this process cannot be hosted on

1. Unzip the downloaded file and place the `cudnn_windows` folder in the root directory of the `kohya_ss` repository.

2. Run .\setup.bat and select the option to install cudnn.

### Linux and macOS

@@ -122,7 +117,7 @@ To install the necessary dependencies on a Linux system, ensure that you fulfill
apt install python3.10-venv
```

- Install the cuDNN drivers by following the instructions provided in [this link](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64).

- Make sure you have Python version 3.10.6 or higher (but lower than 3.11.0) installed on your system.

@@ -484,78 +479,7 @@ save_file(state_dict, file)
ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [documentation](./docs/train_lllite_README.md) for details.
## Change History
### Jan 15, 2024 / 2024/1/15: v0.8.0
- Diffusers, Accelerate, Transformers, and other related libraries have been updated. Please update the libraries with [Upgrade](#upgrade).
  - Some model files (those based on the latest Transformers, whose Text Encoder has no position_id) can now be loaded.
- `torch.compile` is now supported (experimental). PR [#1024](https://github.com/kohya-ss/sd-scripts/pull/1024) Thanks to p1atdev!
  - This feature works only on Linux or WSL.
  - Specify the `--torch_compile` option in each training script.
  - You can select the backend with the `--dynamo_backend` option. The default is `"inductor"`; `inductor` or `eager` seem to work.
  - It is not compatible with `--xformers`; use the `--sdpa` option instead.
  - PyTorch 2.1 or later is recommended.
  - See the [PR](https://github.com/kohya-ss/sd-scripts/pull/1024) for details.
- The session name for wandb can be specified with the `--wandb_run_name` option. PR [#1032](https://github.com/kohya-ss/sd-scripts/pull/1032) Thanks to hopl1t!
- The IPEX library has been updated. PR [#1030](https://github.com/kohya-ss/sd-scripts/pull/1030) Thanks to Disty0!
- Fixed a bug where models could not be saved in Diffusers format.
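As a rough sketch of how the new options described above might be consumed by a training script (illustrative only, not sd-scripts' actual argument handling; the flag names follow the notes above, with the attention flag spelled `--sdpa` as in the upstream scripts):

```python
import argparse

# hypothetical sketch of parsing the torch.compile-related options;
# sd-scripts' real argument wiring differs
parser = argparse.ArgumentParser()
parser.add_argument("--torch_compile", action="store_true")
parser.add_argument("--dynamo_backend", default="inductor",
                    choices=["inductor", "eager"])
parser.add_argument("--sdpa", action="store_true")

# e.g. a Linux/WSL run enabling compilation with the default backend
args = parser.parse_args(["--torch_compile", "--sdpa"])
print(args)
```

The backend defaults to `inductor` when `--dynamo_backend` is omitted, matching the note above.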
Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
### Naming of LoRA
The LoRA types supported by `train_network.py` have been given names to avoid confusion. The documentation has been updated. The following are the names of the LoRA types in this repository.

1. __LoRA-LierLa__: (LoRA for __Li__n__e__a__r__ __La__yers)

    LoRA for Linear layers and Conv2d layers with a 1x1 kernel

2. __LoRA-C3Lier__: (LoRA for __C__onvolutional layers with a __3__x3 kernel and __Li__n__e__a__r__ layers)

    In addition to 1., LoRA for Conv2d layers with a 3x3 kernel

LoRA-LierLa is the default LoRA type for `train_network.py` (without the `conv_dim` network arg). LoRA-LierLa can be used with [our extension](https://github.com/kohya-ss/sd-webui-additional-networks) for AUTOMATIC1111's Web UI, or with the Web UI's built-in LoRA feature.

To use LoRA-C3Lier with the Web UI, please use our extension.
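The two names above can be summarized as a layer-coverage table; this small sketch (illustrative, not sd-scripts code) makes the difference concrete:

```python
# illustrative sketch: which layer kinds each LoRA type in this repo covers
def lora_targets(lora_type):
    targets = {"Linear", "Conv2d 1x1"}   # LoRA-LierLa baseline
    if lora_type == "LoRA-C3Lier":
        targets.add("Conv2d 3x3")        # the extra coverage C3Lier adds
    return sorted(targets)

print(lora_targets("LoRA-LierLa"))
print(lora_targets("LoRA-C3Lier"))
```

Passing a `conv_dim` network arg is what enables the 3x3 convolution coverage in practice.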
## Sample image generation during training
A prompt file might look like this, for example:

```
masterpiece, best quality, 1boy, in business suit, standing at street, looking b
```
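The prompt-file syntax uses inline options after the prompt text (`--n` negative prompt, `--w`/`--h` size, `--d` seed, `--l` CFG scale, `--s` steps, as documented in the sd-scripts README). A minimal illustrative parser, not the project's actual code:

```python
def parse_prompt_line(line):
    # split an sd-scripts-style prompt line into the prompt and its options
    parts = line.split(" --")
    opts = {"prompt": parts[0].strip()}
    for part in parts[1:]:
        key, _, value = part.partition(" ")
        opts[key] = value.strip()
    return opts

p = parse_prompt_line(
    "1boy, in business suit --n low quality --w 576 --h 832 --s 28"
)
print(p)
```

Everything after `--n` up to the next option is the negative prompt; numeric options arrive as strings and would be converted by the real script.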


## Change History
* 2024/01/27 (v22.6.0)
- Merge sd-scripts v0.8.3 code update
  - Fixed a bug where training crashed when `--fp8_base` was specified together with `--save_state`. PR [#1079](https://github.com/kohya-ss/sd-scripts/pull/1079) Thanks to feffy380!
  - `safetensors` has been updated. Please see [Upgrade](#upgrade) and update the library.
  - Fixed a bug where training crashed when `network_multiplier` was specified in multi-GPU training. PR [#1084](https://github.com/kohya-ss/sd-scripts/pull/1084) Thanks to fireicewolf!
  - Fixed a bug where training ControlNet-LLLite crashed.

- Merge sd-scripts v0.8.2 code update
- [Experimental] The `--fp8_base` option has been added to the training scripts for LoRA etc. The base model (the U-Net, and the Text Encoder when training Text Encoder modules) can be trained in fp8. PR [#1057](https://github.com/kohya-ss/sd-scripts/pull/1057) Thanks to KohakuBlueleaf!
  - Specify `--fp8_base` in `train_network.py` or `sdxl_train_network.py`.
  - PyTorch 2.1 or later is required.
  - If you use xformers with PyTorch 2.1, see the [xformers repository](https://github.com/facebookresearch/xformers) and install the version appropriate for your CUDA version.
  - Sample image generation during training consumes a lot of memory; it is recommended to turn it off.
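Part of why fp8 training is delicate: the e4m3 format commonly used for fp8 weights has a very small dynamic range compared with fp16/fp32. A quick illustrative computation (not sd-scripts code):

```python
# dynamic-range sketch for the fp8 e4m3fn format:
# 4 exponent bits (bias 7) and 3 mantissa bits; in the "fn" variant the
# all-ones exponent still encodes finite values, and the largest mantissa
# pattern 0b110 (0b111 is reserved for NaN) gives 1 + 6/8 = 1.75
def e4m3fn_max():
    return (1 + 6 / 8) * 2 ** (15 - 7)

print(e4m3fn_max())  # largest representable magnitude
```

Any activation or gradient beyond this magnitude saturates, which is why the base model is kept in fp8 while optimizer math stays in higher precision.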

- [Experimental] The network multiplier can now be specified per dataset in the training scripts for LoRA etc.
  - This is an experimental option and may be removed or changed in the future.
  - For example, if you train state A at `1.0` and state B at `-1.0`, you may be able to switch between the two states by changing the LoRA application rate at generation time.
  - Likewise, if you prepare five states and train them at `0.2`, `0.4`, `0.6`, `0.8`, and `1.0`, you may be able to move between the states smoothly by varying the application rate.
  - Specify `network_multiplier` under `[[datasets]]` in the `.toml` file.

- Options have been added to `networks/extract_lora_from_models.py` to reduce memory usage.
  - The `--load_precision` option specifies the precision used when loading the models. If a model was saved in fp16, specifying `--load_precision fp16` reduces memory usage without losing precision.
  - The `--load_original_model_to` option specifies the device onto which the original model is loaded, and `--load_tuned_model_to` the device for the tuned model. Both default to `cpu`, but `cuda` etc. can be specified; loading one of the models onto the GPU reduces memory usage. These options are available only for SDXL.
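A back-of-the-envelope sketch of why `--load_precision fp16` helps (illustrative arithmetic, not the script's code; the parameter count used below for an SDXL U-Net is approximate):

```python
def load_memory_bytes(n_params, precision):
    # rough memory needed just to hold the weights at load time
    bytes_per_param = {"fp32": 4, "fp16": 2, "bf16": 2}[precision]
    return n_params * bytes_per_param

# an SDXL U-Net has roughly 2.6e9 parameters (approximate figure)
n = 2_600_000_000
print(load_memory_bytes(n, "fp32") / 2**30, "GiB as fp32")
print(load_memory_bytes(n, "fp16") / 2**30, "GiB as fp16")
```

Loading in fp16 halves the footprint, and since two models (original and tuned) must be resident for extraction, placing one of them on the GPU via the device options cuts host memory further.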

- Gradient synchronization in multi-GPU LoRA training has been improved. PR [#1064](https://github.com/kohya-ss/sd-scripts/pull/1064) Thanks to KohakuBlueleaf!

- The code for Intel IPEX support is improved. PR [#1060](https://github.com/kohya-ss/sd-scripts/pull/1060) Thanks to akx!

- Fixed a bug in multi-GPU Textual Inversion training.

- `.toml` example for network multiplier

```toml
[general]
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = 1.0
# ... subset settings ...
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = -1.0
# ... subset settings ...
```
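Conceptually (an illustrative sketch, not sd-scripts' implementation), the network multiplier scales how strongly the LoRA delta is applied on top of the base model's output, which is why training one dataset at `1.0` and another at `-1.0` produces opposite directions:

```python
def apply_lora(base_out, lora_delta, multiplier):
    # the multiplier scales the LoRA contribution added to the base output
    return base_out + multiplier * lora_delta

# dataset A trained at 1.0 vs dataset B at -1.0: opposite shifts
print(apply_lora(1.0, 0.25, 1.0))
print(apply_lora(1.0, 0.25, -1.0))
```

At generation time, varying the application rate between these extremes blends between the trained states.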

- Merge sd-scripts v0.8.1 code update

- Fixed a bug where VRAM usage when not training the Text Encoder was higher than before in the LoRA training scripts (`train_network.py`, `sdxl_train_network.py`).
  - The Text Encoders were not being moved to the CPU.

- Fixed typos. Thanks to akx! [PR #1053](https://github.com/kohya-ss/sd-scripts/pull/1053)

* 2024/01/15 (v22.5.0)
- Merged sd-scripts v0.8.0 updates
- Diffusers, Accelerate, Transformers and other related libraries have been updated. Please update the libraries with [Upgrade](#upgrade).
174 changes: 80 additions & 94 deletions README_中文教程.md
@@ -1,141 +1,127 @@
Hi! The main content of the Japanese README file is translated below:

SDXL is now supported. The sdxl branch has been merged into the main branch. When updating the repository, please run the upgrade steps. Since the accelerate version has also been upgraded, please run accelerate config again.

For information on SDXL training, see [here](./README.md#sdxl-training) (in English).

## About this repository

A repository for Stable Diffusion training, image generation, and other scripts.

[English README](./README.md) <- update information is posted here

[bmaltais's repository](https://github.com/bmaltais/kohya_ss) provides a GUI, PowerShell scripts, and other features that make this easier to use (in English); please refer to it as well. Many thanks to bmaltais.

The repository includes the following scripts:

* Training of DreamBooth, U-Net, and Text Encoder
* Fine-tuning, as above
* LoRA training
* Image generation
* Model conversion (converting between Stable Diffusion ckpt/safetensors and Diffusers formats)

## Usage (users in mainland China only need to follow this installation tutorial)

- Go to the root of the kohya_ss folder and run setup.bat to start the installer *(requires unrestricted internet access)*
- Follow the English options shown in the menu:

  Kohya_ss GUI setup menu:

  1. Install kohya_ss gui
  2. (Optional) Install cudnn files (avoid unless you really need it)
  3. (Optional) Install specific bitsandbytes versions
  4. (Optional) Manually configure accelerate
  5. (Optional) Start Kohya_ss GUI in browser
  6. Quit

  Enter your choice: 1

  1. Torch 1 (legacy, no longer supported. Will be removed in v21.9.x)
  2. Torch 2 (recommended)
  3. Cancel

  Enter your choice: 2

- The environment dependencies are then installed; answer the prompts that follow as below:

  ```txt
  - This machine
  - No distributed training
  - NO
  - NO
  - NO
  - all
  - bf16
  ```

- Once these selections are made, close the terminal window and simply run gui.bat or kohya中文启动器.bat to start kohya.

Where related articles exist in this repository or on note.com, please refer to them. (They may all be moved here in the future.)

* [Common training guide](./docs/train_README-ja.md): data preparation, options, etc.
* [Dataset configuration](./docs/config_README-ja.md)
* [DreamBooth training guide](./docs/train_db_README-ja.md)
* [Fine-tuning guide](./docs/fine_tune_README_ja.md)
* [LoRA training guide](./docs/train_network_README-ja.md)
* [Textual Inversion training guide](./docs/train_ti_README-ja.md)
* [Image generation script](./docs/gen_img_README-ja.md)
* note.com [Model conversion script](https://note.com/kohya_ss/n/n374f316fe4ad)

## Required programs on Windows

Python 3.10.6 and Git are required.

- Python 3.10.6: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe
- git: https://git-scm.com/download/win

To use venv from PowerShell, change the security settings as follows.
(Note that this enables script execution in general, not just for venv.)

- Open PowerShell as administrator
- Enter "Set-ExecutionPolicy Unrestricted" and answer Y
- Close the administrator PowerShell

## Installation on Windows

The scripts have been tested with PyTorch 2.0.1. PyTorch 1.12.1 should also work.

The example below installs PyTorch 2.0.1 with CUDA 11.8. If you use CUDA 11.6 or PyTorch 1.12.1, change the commands accordingly.

(Note: if only "python" is available on your system, change "python" in the example below to "py".)

Open a regular (non-administrator) PowerShell and run the following commands in order:

```powershell
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python -m venv venv
.\venv\Scripts\activate

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install --upgrade -r requirements.txt
pip install xformers==0.0.20

accelerate config
```

The same applies at the command prompt.

(Note: `python -m venv venv` is used rather than `python -m venv --system-site-packages venv` because it is safer; the latter causes various problems when packages are installed in the global Python.)

Answer the accelerate config prompts as follows. (If you train with bf16, answer bf16 to the last question.)

※ From accelerate 0.15.0, selecting with the arrow keys crashes in Japanese-locale environments. Please select with the number keys 0, 1, 2, ... instead.

```txt
- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16
```

※ A `ValueError: fp16 mixed precision requires a GPU` error may sometimes occur. In that case, answer "0" to the sixth question (`What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:`); the GPU with id `0` will then be used.
### Optional: `bitsandbytes` (8-bit optimizer)

`bitsandbytes` is now optional. On Linux it can be installed normally via pip (0.41.1 or later is recommended).

On Windows, 0.35.0 or 0.41.1 is recommended.

- `bitsandbytes` 0.35.0: appears to be a stable version. AdamW8bit can be used, but some other 8-bit optimizers and the options used for `full_bf16` training cannot.
- `bitsandbytes` 0.41.1: supports Lion8bit, PagedAdamW8bit, and PagedLion8bit. `full_bf16` can be used.

Note: `bitsandbytes` versions from 0.35.0 up to 0.41.0 appear to have issues: https://github.com/TimDettmers/bitsandbytes/issues/659

Install `bitsandbytes` by one of the following procedures.

### Using 0.35.0

The following is a PowerShell example. At the command prompt, use copy instead of cp.

```powershell
cd sd-scripts
.\venv\Scripts\activate
pip install bitsandbytes==0.35.0
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
```

### Using 0.41.1

Install the Windows whl file published by jllllll from [here](https://github.com/jllllll/bitsandbytes-windows-webui) or elsewhere.

```powershell
python -m pip install bitsandbytes==0.41.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
```


### Optional: Using Lion8bit