
Commit

Merge pull request #1429 from bmaltais/dev2
v21.8.8
bmaltais authored Aug 23, 2023
2 parents 463128c + 18beb14 commit 2853f4c
Showing 5 changed files with 292 additions and 30 deletions.
5 changes: 1 addition & 4 deletions README.md
@@ -520,12 +520,9 @@ If you come across a `FileNotFoundError`, it is likely due to an installation issue

* 2023/08/05 (v21.8.8)
- Fix issue with aiofiles: https://github.com/bmaltais/kohya_ss/issues/1359
- Merge sd-scripts updates as of Aug 18 2023
- Merge sd-scripts updates as of Aug 23 2023
- Add new blip2 caption processor tool
- Add dataset preparation tab to appropriate trainers
- Add GUI support for new block_lr lora network parameter
- Add support for experimental LoRA-FA network
- Fix LyCORIS extraction issue with code
* 2023/08/05 (v21.8.7)
- Add manual captioning option. Thanks to https://github.com/channelcat for this great contribution. (https://github.com/bmaltais/kohya_ss/pull/1352)
- Added support for `v_pred_like_loss` to the advanced training tab
124 changes: 121 additions & 3 deletions docs/train_lllite_README-ja.md
@@ -21,7 +21,9 @@ A custom node for ComfyUI is provided: https://github.com/k
## Training the model

### Preparing the dataset
In addition to the normal dataset, store the conditioning images in the directory specified by `conditioning_data_dir`. Each conditioning image must have the same basename as its training image and is automatically resized to the same size as the training image.
In addition to the normal dataset, store the conditioning images in the directory specified by `conditioning_data_dir`. Each conditioning image must have the same basename as its training image and is automatically resized to the same size as the training image. Conditioning images do not need caption files.

For example, a configuration file for the DreamBooth-style setup with caption files looks like this:

```toml
[[datasets.subsets]]
image_dir = "path/to/training/image/dir"
caption_extension = ".txt"
conditioning_data_dir = "path/to/conditioning/image/dir"
```

As a current limitation, random_crop cannot be used.

For training data, it seems best to use images generated by the original model as the training images and processed versions of them as the conditioning images. If the training images have an art style different from the original model, the model has to learn that style in addition to the control; ControlNet-LLLite has little capacity, so it is not suited to learning art styles.
For training data, the easiest approach is a synthetic dataset: use images generated by the original model as the training images and processed versions of them as the conditioning images (the dataset quality may be a concern). A concrete way to synthesize such a dataset is described below.

If you use images other than generated images as training images, increase the number of dimensions described below.
Note that if the training images have an art style different from the original model, the model has to learn that style in addition to the control. ControlNet-LLLite has little capacity, so it is not suited to learning art styles. In such cases, increase the number of dimensions described below.

### Training
To train with a script, run `sdxl_train_control_net_lllite.py`. `--cond_emb_dim` sets the dimension of the conditioning image embedding, and `--network_dim` sets the rank of the LoRA-like modules. Other options follow `sdxl_train_network.py`, but `--network_module` does not need to be specified.
@@ -51,6 +53,122 @@ For the dimension of the conditioning image embedding, the Canny sample specifies 32

Specify the conditioning image to use for inference with `--guide_image_path`. Note that no preprocessing is performed, so for Canny, pass an image that has already been Canny-processed (white lines on a black background). `--control_net_preps`, `--control_net_weights`, and `--control_net_ratios` are not supported.
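
For reference, an inference call might look roughly like this. This is only a sketch: most of the flags also appear in the `sdxl_gen_img.py` example later in this document, but the `--control_net_lllite_models` flag name and all paths are assumptions here, so check the options of `sdxl_gen_img.py` before use.

```dos
python sdxl_gen_img.py --ckpt path/to/model.safetensors --outdir path/to/output/dir --xformers --W 1024 --H 1024 --steps 36 --sampler ddim --scale 7 --prompt "1girl, standing, school uniform" --control_net_lllite_models path/to/trained_lllite.safetensors --guide_image_path path/to/canny_guide.png
```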

## How to synthesize a dataset

### Generating the training images

Generate images with the model that will serve as the training base, using the Web UI, ComfyUI, or similar. The model's default size (e.g. 1024x1024) should be fine for the image size. You can also use bucketing; in that case, generate at appropriate resolutions.

The prompts and other generation settings should match the kinds of images you intend to generate when using the trained ControlNet-LLLite.

Save the generated images to a directory of your choice; you will specify this directory in the dataset configuration file.

You can also generate them with `sdxl_gen_img.py` in this repository. For example, run it as follows:

```dos
python sdxl_gen_img.py --ckpt path/to/model.safetensors --n_iter 1 --scale 10 --steps 36 --outdir path/to/output/dir --xformers --W 1024 --H 1024 --original_width 2048 --original_height 2048 --bf16 --sampler ddim --batch_size 4 --vae_batch_size 2 --images_per_prompt 512 --max_embeddings_multiples 1 --prompt "{portrait|digital art|anime screen cap|detailed illustration} of 1girl, {standing|sitting|walking|running|dancing} on {classroom|street|town|beach|indoors|outdoors}, {looking at viewer|looking away|looking at another}, {in|wearing} {shirt and skirt|school uniform|casual wear} { |, dynamic pose}, (solo), teen age, {0-1$$smile,|blush,|kind smile,|expression less,|happy,|sadness,} {0-1$$upper body,|full body,|cowboy shot,|face focus,} trending on pixiv, {0-2$$depth of fields,|8k wallpaper,|highly detailed,|pov,} {0-1$$summer, |winter, |spring, |autumn, } beautiful face { |, from below|, from above|, from side|, from behind|, from back} --n nsfw, bad face, lowres, low quality, worst quality, low effort, watermark, signature, ugly, poorly drawn"
```

These settings are for 24GB of VRAM. Adjust `--batch_size` and `--vae_batch_size` according to your VRAM size.

The `--prompt` uses wildcards to randomize generation: each `{a|b|c}` group picks one of its options at random, and `{0-1$$...}` picks between zero and one of the listed tags. Adjust as needed.

### Processing the images

Process the generated images with an external program and save the processed images to a directory of your choice. These become the conditioning images.

For processing with Canny, for example, a script like the following can be used.

```python
import glob
import os
import random
import cv2
import numpy as np

IMAGES_DIR = "path/to/generated/images"
CANNY_DIR = "path/to/canny/images"

os.makedirs(CANNY_DIR, exist_ok=True)
img_files = glob.glob(IMAGES_DIR + "/*.png")
for img_file in img_files:
    # use os.path.join so the output path works on both Windows and Linux
    can_file = os.path.join(CANNY_DIR, os.path.basename(img_file))
    if os.path.exists(can_file):
        print("Skip: " + img_file)
        continue

    print(img_file)

    img = cv2.imread(img_file)

    # random threshold
    # while True:
    #     threshold1 = random.randint(0, 127)
    #     threshold2 = random.randint(128, 255)
    #     if threshold2 - threshold1 > 80:
    #         break

    # fixed threshold
    threshold1 = 100
    threshold2 = 200

    img = cv2.Canny(img, threshold1, threshold2)

    cv2.imwrite(can_file, img)
```
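
Every training image needs a conditioning image with the same basename, so a quick check like the following can catch missing pairs before training (a minimal sketch; the directory paths are placeholders):

```python
import glob
import os

IMAGES_DIR = "path/to/generated/images"
CANNY_DIR = "path/to/canny/images"

# report training images that have no conditioning image with the same basename
for img_file in glob.glob(IMAGES_DIR + "/*.png"):
    cond_file = os.path.join(CANNY_DIR, os.path.basename(img_file))
    if not os.path.exists(cond_file):
        print(f"Missing conditioning image: {cond_file}")
```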

### Creating caption files

Create a caption file for each training image with the same basename as the image. Simply reusing the prompt from generation should be fine.

If you generated the images with `sdxl_gen_img.py`, the generation prompt is recorded in the image metadata, so a script like the following can create the caption files (extension `.txt`) in the same directory as the training images.

```python
import glob
import os
from PIL import Image

IMAGES_DIR = "path/to/generated/images"

img_files = glob.glob(IMAGES_DIR + "/*.png")
for img_file in img_files:
    cap_file = img_file.replace(".png", ".txt")
    if os.path.exists(cap_file):
        print(f"Skip: {img_file}")
        continue
    print(img_file)

    img = Image.open(img_file)
    prompt = img.text["prompt"] if "prompt" in img.text else ""
    if prompt == "":
        print(f"Prompt not found in {img_file}")

    with open(cap_file, "w") as f:
        f.write(prompt + "\n")
```

### Creating the dataset configuration file

You can also specify these settings via command-line options, but if you create a `.toml` file, set `conditioning_data_dir` to the directory where the processed images are saved.

An example configuration file is shown below.

```toml
[general]
flip_aug = false
color_aug = false
resolution = [1024,1024]

[[datasets]]
batch_size = 8
enable_bucket = false

[[datasets.subsets]]
image_dir = "path/to/generated/image/dir"
caption_extension = ".txt"
conditioning_data_dir = "path/to/canny/image/dir"
```
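
With the dataset and the configuration file in place, training can then be launched roughly as follows. This is a sketch under assumptions: `--cond_emb_dim` and `--network_dim` are described above (32 follows the Canny sample), the remaining options are the usual `sdxl_train_network.py`-style options, and all paths and hyperparameters are placeholders to adjust.

```dos
python sdxl_train_control_net_lllite.py --pretrained_model_name_or_path path/to/model.safetensors --dataset_config path/to/dataset_config.toml --output_dir path/to/output/dir --output_name my_lllite --cond_emb_dim 32 --network_dim 64 --mixed_precision bf16 --xformers --learning_rate 1e-4 --max_train_epochs 10 --save_every_n_epochs 1
```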

## Acknowledgements

I would like to thank lllyasviel, the author of ControlNet, furusu, who provided advice on the implementation and helped resolve problems, and ddPn08, who implemented the ControlNet dataset.
121 changes: 118 additions & 3 deletions docs/train_lllite_README.md
@@ -26,7 +26,7 @@ Due to the limitations of the inference environment, only CrossAttention (attn1

### Preparing the dataset

In addition to the normal dataset, please store the conditioning image in the directory specified by `conditioning_data_dir`. The conditioning image must have the same basename as the training image. The conditioning image will be automatically resized to the same size as the training image.
In addition to the normal dataset, please store the conditioning image in the directory specified by `conditioning_data_dir`. The conditioning image must have the same basename as the training image. The conditioning image will be automatically resized to the same size as the training image. The conditioning image does not require a caption file.

```toml
[[datasets.subsets]]
image_dir = "path/to/training/image/dir"
caption_extension = ".txt"
conditioning_data_dir = "path/to/conditioning/image/dir"
```

At the moment, random_crop cannot be used.

As training data, it seems better to use images generated by the original model as training images and images processed from them as conditioning images. If you use images with a different style from the original model as training images, the model will have to learn not only the control but also the style. ControlNet-LLLite is not suitable for style learning because of its small capacity.
For training data, it is easiest to use a synthetic dataset: images generated by the original model serve as the training images, and processed versions of them serve as the conditioning images (the dataset quality may be a concern). Specific ways to synthesize such a dataset are described below.

If you use images other than the generated images as training images, please increase the dimension as described below.
Note that if you use an image with a different art style than the original model as a training image, the model will have to learn not only the control but also the art style. ControlNet-LLLite has a small capacity, so it is not suitable for learning art styles. In such cases, increase the number of dimensions as described below.

### Training

@@ -57,6 +57,121 @@ If you want to generate images with a script, run `sdxl_gen_img.py`. You can spe

Specify the conditioning image to be used for inference with `--guide_image_path`. Since no preprocessing is performed, for Canny, specify an image that has already been processed with Canny (white lines on a black background). `--control_net_preps`, `--control_net_weights`, and `--control_net_ratios` are not supported.
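
For reference, an inference call might look roughly like this. This is only a sketch: most of the flags also appear in the `sdxl_gen_img.py` example below, but the `--control_net_lllite_models` flag name and all paths are assumptions here, so check the script's options before use.

```dos
python sdxl_gen_img.py --ckpt path/to/model.safetensors --outdir path/to/output/dir --xformers --W 1024 --H 1024 --steps 36 --sampler ddim --scale 7 --prompt "1girl, standing, school uniform" --control_net_lllite_models path/to/trained_lllite.safetensors --guide_image_path path/to/canny_guide.png
```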

## How to synthesize a dataset

### Generating training images

Generate images with the base model for training, using the Web UI, ComfyUI, or similar. The model's default size (e.g. 1024x1024) should be fine for the image size. You can also use bucketing; in that case, generate at appropriate resolutions.

The prompts and other settings used for generation should match the kinds of images you intend to generate later with the trained ControlNet-LLLite model.

Save the generated images in an arbitrary directory. Specify this directory in the dataset configuration file.


You can also generate them with `sdxl_gen_img.py` in this repository. For example, run as follows:

```dos
python sdxl_gen_img.py --ckpt path/to/model.safetensors --n_iter 1 --scale 10 --steps 36 --outdir path/to/output/dir --xformers --W 1024 --H 1024 --original_width 2048 --original_height 2048 --bf16 --sampler ddim --batch_size 4 --vae_batch_size 2 --images_per_prompt 512 --max_embeddings_multiples 1 --prompt "{portrait|digital art|anime screen cap|detailed illustration} of 1girl, {standing|sitting|walking|running|dancing} on {classroom|street|town|beach|indoors|outdoors}, {looking at viewer|looking away|looking at another}, {in|wearing} {shirt and skirt|school uniform|casual wear} { |, dynamic pose}, (solo), teen age, {0-1$$smile,|blush,|kind smile,|expression less,|happy,|sadness,} {0-1$$upper body,|full body,|cowboy shot,|face focus,} trending on pixiv, {0-2$$depth of fields,|8k wallpaper,|highly detailed,|pov,} {0-1$$summer, |winter, |spring, |autumn, } beautiful face { |, from below|, from above|, from side|, from behind|, from back} --n nsfw, bad face, lowres, low quality, worst quality, low effort, watermark, signature, ugly, poorly drawn"
```

These settings are for 24GB of VRAM. Adjust `--batch_size` and `--vae_batch_size` according to your VRAM size.

The images are generated randomly using wildcards in `--prompt`: each `{a|b|c}` group picks one of its options at random, and `{0-1$$...}` picks between zero and one of the listed tags. Adjust as necessary.

### Processing images

Use an external program to process the generated images. Save the processed images in an arbitrary directory. These will be the conditioning images.

For example, you can use the following script to process the images with Canny.

```python
import glob
import os
import random
import cv2
import numpy as np

IMAGES_DIR = "path/to/generated/images"
CANNY_DIR = "path/to/canny/images"

os.makedirs(CANNY_DIR, exist_ok=True)
img_files = glob.glob(IMAGES_DIR + "/*.png")
for img_file in img_files:
    # use os.path.join so the output path works on both Windows and Linux
    can_file = os.path.join(CANNY_DIR, os.path.basename(img_file))
    if os.path.exists(can_file):
        print("Skip: " + img_file)
        continue

    print(img_file)

    img = cv2.imread(img_file)

    # random threshold
    # while True:
    #     threshold1 = random.randint(0, 127)
    #     threshold2 = random.randint(128, 255)
    #     if threshold2 - threshold1 > 80:
    #         break

    # fixed threshold
    threshold1 = 100
    threshold2 = 200

    img = cv2.Canny(img, threshold1, threshold2)

    cv2.imwrite(can_file, img)
```
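
Since every training image needs a conditioning image with the same basename, a quick check such as the one below can catch missing pairs before training (a minimal sketch; the directory paths are placeholders):

```python
import glob
import os

IMAGES_DIR = "path/to/generated/images"
CANNY_DIR = "path/to/canny/images"

# report training images that have no conditioning image with the same basename
for img_file in glob.glob(IMAGES_DIR + "/*.png"):
    cond_file = os.path.join(CANNY_DIR, os.path.basename(img_file))
    if not os.path.exists(cond_file):
        print(f"Missing conditioning image: {cond_file}")
```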

### Creating caption files

Create a caption file for each image with the same basename as the training image. It is fine to use the same caption as the one used when generating the image.

If you generated the images with `sdxl_gen_img.py`, you can use the following script to create the caption files (`*.txt`) from the metadata in the generated images.

```python
import glob
import os
from PIL import Image

IMAGES_DIR = "path/to/generated/images"

img_files = glob.glob(IMAGES_DIR + "/*.png")
for img_file in img_files:
    cap_file = img_file.replace(".png", ".txt")
    if os.path.exists(cap_file):
        print(f"Skip: {img_file}")
        continue
    print(img_file)

    img = Image.open(img_file)
    prompt = img.text["prompt"] if "prompt" in img.text else ""
    if prompt == "":
        print(f"Prompt not found in {img_file}")

    with open(cap_file, "w") as f:
        f.write(prompt + "\n")
```

### Creating a dataset configuration file

You can use the command line arguments of `sdxl_train_control_net_lllite.py` to specify the conditioning image directory. However, if you want to use a `.toml` file, specify the conditioning image directory in `conditioning_data_dir`.

```toml
[general]
flip_aug = false
color_aug = false
resolution = [1024,1024]

[[datasets]]
batch_size = 8
enable_bucket = false

[[datasets.subsets]]
image_dir = "path/to/generated/image/dir"
caption_extension = ".txt"
conditioning_data_dir = "path/to/canny/image/dir"
```
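
With the dataset and the configuration file in place, training can then be launched roughly as follows. This is a sketch under assumptions: `--cond_emb_dim` and `--network_dim` are described above, the remaining options are the usual `sdxl_train_network.py`-style options, and all paths and hyperparameters are placeholders to adjust.

```dos
python sdxl_train_control_net_lllite.py --pretrained_model_name_or_path path/to/model.safetensors --dataset_config path/to/dataset_config.toml --output_dir path/to/output/dir --output_name my_lllite --cond_emb_dim 32 --network_dim 64 --mixed_precision bf16 --xformers --learning_rate 1e-4 --max_train_epochs 10 --save_every_n_epochs 1
```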

## Credit

I would like to thank lllyasviel, the author of ControlNet, furusu, who provided me with advice on implementation and helped me solve problems, and ddPn08, who implemented the ControlNet dataset.
24 changes: 20 additions & 4 deletions networks/control_net_lllite.py
@@ -33,14 +33,15 @@


class LLLiteModule(torch.nn.Module):
    def __init__(self, depth, cond_emb_dim, name, org_module, mlp_dim, dropout=None):
    def __init__(self, depth, cond_emb_dim, name, org_module, mlp_dim, dropout=None, multiplier=1.0):
        super().__init__()

        self.is_conv2d = org_module.__class__.__name__ == "Conv2d"
        self.lllite_name = name
        self.cond_emb_dim = cond_emb_dim
        self.org_module = [org_module]
        self.dropout = dropout
        self.multiplier = multiplier

        if self.is_conv2d:
            in_dim = org_module.in_channels
@@ -119,6 +120,10 @@ def set_cond_image(self, cond_image):
        中でモデルを呼び出すので必要ならwith torch.no_grad()で囲む
        / call the model inside, so if necessary, surround it with torch.no_grad()
        """
        if cond_image is None:
            self.cond_emb = None
            return

        # timestepごとに呼ばれないので、あらかじめ計算しておく / it is not called for each timestep, so calculate it in advance
        # print(f"C {self.lllite_name}, cond_image.shape={cond_image.shape}")
        cx = self.conditioning1(cond_image)
@@ -141,6 +146,9 @@ def forward(self, x):
        学習用の便利forward。元のモジュールのforwardを呼び出す
        / convenient forward for training. call the forward of the original module
        """
        if self.multiplier == 0.0 or self.cond_emb is None:
            return self.org_forward(x)

        cx = self.cond_emb

        if not self.batch_cond_only and x.shape[0] // 2 == cx.shape[0]:  # inference only
@@ -160,11 +168,13 @@ def forward(self, x):
        if self.dropout is not None and self.training:
            cx = torch.nn.functional.dropout(cx, p=self.dropout)

        cx = self.up(cx)
        cx = self.up(cx) * self.multiplier

        # residua (x) lを加算して元のforwardを呼び出す / add residual (x) and call the original forward
        # residual (x) を加算して元のforwardを呼び出す / add residual (x) and call the original forward
        if self.batch_cond_only:
            cx = torch.zeros_like(x)[1::2] + cx
            zx = torch.zeros_like(x)
            zx[1::2] += cx
            cx = zx

        x = self.org_forward(x + cx)  # ここで元のモジュールを呼び出す / call the original module here
        return x
@@ -181,6 +191,7 @@ def __init__(
        mlp_dim: int = 16,
        dropout: Optional[float] = None,
        varbose: Optional[bool] = False,
        multiplier: Optional[float] = 1.0,
    ) -> None:
        super().__init__()
        # self.unets = [unet]
@@ -264,6 +275,7 @@ def create_modules(
                    child_module,
                    mlp_dim,
                    dropout=dropout,
                    multiplier=multiplier,
                )
                modules.append(module)
            return modules
@@ -291,6 +303,10 @@ def set_batch_cond_only(self, cond_only, zeros):
        for module in self.unet_modules:
            module.set_batch_cond_only(cond_only, zeros)

    def set_multiplier(self, multiplier):
        for module in self.unet_modules:
            module.multiplier = multiplier

    def load_weights(self, file):
        if os.path.splitext(file)[1] == ".safetensors":
            from safetensors.torch import load_file
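
For context, a hedged sketch of how the `multiplier` support added in this commit might be used from calling code. The `ControlNetLLLite` class name and the constructor argument order are inferred from this diff and the docs, so treat the details as assumptions:

```python
# minimal sketch, not the repository's documented API; names and argument order are assumed
network = ControlNetLLLite(unet, cond_emb_dim=32, mlp_dim=64, multiplier=1.0)
network.load_weights("path/to/trained_lllite.safetensors")
network.set_cond_image(cond_image)  # preprocessed conditioning image tensor

network.set_multiplier(0.5)   # weaken the control
network.set_multiplier(0.0)   # forward() falls back to the original module
network.set_cond_image(None)  # clearing the conditioning image also disables the control
```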