diff --git a/README-ja.md b/README-ja.md
index e052d6bf1..6f0e574e2 100644
--- a/README-ja.md
+++ b/README-ja.md
@@ -16,13 +16,13 @@ GUIやPowerShellスクリプトなど、より使いやすくする機能が[bma
 
 当リポジトリ内およびnote.comに記事がありますのでそちらをご覧ください(将来的にはすべてこちらへ移すかもしれません)。
 
-* [学習について、共通編](./train_README-ja.md) : データ整備やオプションなど
-    * [データセット設定](./config_README-ja.md)
-* [DreamBoothの学習について](./train_db_README-ja.md)
-* [fine-tuningのガイド](./fine_tune_README_ja.md):
-* [LoRAの学習について](./train_network_README-ja.md)
-* [Textual Inversionの学習について](./train_ti_README-ja.md)
-* note.com [画像生成スクリプト](https://note.com/kohya_ss/n/n2693183a798e)
+* [学習について、共通編](./docs/train_README-ja.md) : データ整備やオプションなど
+    * [データセット設定](./docs/config_README-ja.md)
+* [DreamBoothの学習について](./docs/train_db_README-ja.md)
+* [fine-tuningのガイド](./docs/fine_tune_README_ja.md)
+* [LoRAの学習について](./docs/train_network_README-ja.md)
+* [Textual Inversionの学習について](./docs/train_ti_README-ja.md)
+* [画像生成スクリプト](./docs/gen_img_README-ja.md)
 * note.com [モデル変換スクリプト](https://note.com/kohya_ss/n/n374f316fe4ad)
 
 ## Windowsでの動作に必要なプログラム
diff --git a/README.md b/README.md
index 0f60e8923..833209503 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,15 @@ This repository provides a Windows-focused Gradio GUI for [Kohya's Stable Diffus
 
 ### Table of Contents
 - [Tutorials](#tutorials)
+* [Training guide - common](./docs/train_README-ja.md) : data preparation, options etc...
+    * [Chinese version](./docs/train_README-zh.md)
+* [Dataset config](./docs/config_README-ja.md)
+* [DreamBooth training guide](./docs/train_db_README-ja.md)
+* [Step by Step fine-tuning guide](./docs/fine_tune_README_ja.md)
+* [Training LoRA](./docs/train_network_README-ja.md)
+* [Training Textual Inversion](./docs/train_ti_README-ja.md)
+* [Image generation](./docs/gen_img_README-ja.md)
+* note.com [Model conversion](https://note.com/kohya_ss/n/n374f316fe4ad)
 - [Required Dependencies](#required-dependencies)
   - [Linux/macOS](#linux-and-macos-dependencies)
 - [Installation](#installation)
@@ -282,7 +291,7 @@ The LoRA supported by `train_network.py` has been named to avoid confusion. The
 LoRA-LierLa is the default LoRA type for `train_network.py` (without `conv_dim` network arg). LoRA-LierLa can be used with [our extension](https://github.com/kohya-ss/sd-webui-additional-networks) for AUTOMATIC1111's Web UI, or with the built-in LoRA feature of the Web UI.
 
-To use LoRA-C3Liar with Web UI, please use our extension.
+To use LoRA-C3Lier with Web UI, please use our extension.
 
 ## Sample image generation during training
 A prompt file might look like this, for example
@@ -334,6 +343,18 @@ This will store a backup file with your current locally installed pip packages a
 
 * 2023/04/07 (v21.5.10)
     - Fix issue https://github.com/bmaltais/kohya_ss/issues/734
+    - The documentation has been moved to the `docs` folder. If you have links, please change them.
+    - DAdaptAdaGrad, DAdaptAdan, and DAdaptSGD are now supported by DAdaptation. [PR#455](https://github.com/kohya-ss/sd-scripts/pull/455) Thanks to sdbds!
+        - DAdaptation needs to be installed. Also, depending on the optimizer, DAdaptation may need to be updated. Please update with `pip install --upgrade dadaptation`.
+    - Added support for pre-calculation of LoRA weights in image generation scripts. Specify `--network_pre_calc`.
+        - The prompt option `--am` is available. Also, it is disabled when Regional LoRA is used.
+    - Added Adaptive noise scale to each training script. Specify a number with `--adaptive_noise_scale` to enable it.
+        - __Experimental option.
It may be removed or changed in the future.__ + - This is an original implementation that automatically adjusts the value of the noise offset according to the absolute value of the mean of each channel of the latents. It is expected that appropriate noise offsets will be set for bright and dark images, respectively. + - Specify it together with `--noise_offset`. + - The actual value of the noise offset is calculated as `noise_offset + abs(mean(latents, dim=(2,3))) * adaptive_noise_scale`. Since the latent is close to a normal distribution, it may be a good idea to specify a value of about 1/10 to the same as the noise offset. + - Negative values can also be specified, in which case the noise offset will be clipped to 0 or more. + - Other minor fixes. * 2023/04/06 (v21.5.9) - Inplement headless mode to enable easier support under headless services like vast.ai. To make use of it start the gui with the `--headless` argument like: diff --git a/config_README-ja.md b/docs/config_README-ja.md similarity index 100% rename from config_README-ja.md rename to docs/config_README-ja.md diff --git a/fine_tune_README_ja.md b/docs/fine_tune_README_ja.md similarity index 100% rename from fine_tune_README_ja.md rename to docs/fine_tune_README_ja.md diff --git a/gen_img_README-ja.md b/docs/gen_img_README-ja.md similarity index 98% rename from gen_img_README-ja.md rename to docs/gen_img_README-ja.md index b8864026c..cf35f1df7 100644 --- a/gen_img_README-ja.md +++ b/docs/gen_img_README-ja.md @@ -153,7 +153,9 @@ python gen_img_diffusers.py --ckpt <モデル名> --outdir <画像出力先> - `--network_mul`:使用する追加ネットワークの重みを何倍にするかを指定します。デフォルトは`1`です。`--network_mul 0.8`のように指定します。複数のLoRAを使用する場合は`--network_mul 0.4 0.5 0.7`のように指定します。引数の数は`--network_module`で指定した数と同じにしてください。 -- `--network_merge`:使用する追加ネットワークの重みを`--network_mul`に指定した重みであらかじめマージします。プロンプトオプションの`--am`は使用できなくなりますが、LoRA未使用時と同じ程度まで生成が高速化されます。 +- `--network_merge`:使用する追加ネットワークの重みを`--network_mul`に指定した重みであらかじめマージします。`--network_pre_calc` と同時に使用できません。プロンプトオプションの`--am`、およびRegional LoRAは使用できなくなりますが、LoRA未使用時と同じ程度まで生成が高速化されます。 + +- `--network_pre_calc`:使用する追加ネットワークの重みを生成ごとにあらかじめ計算します。プロンプトオプションの`--am`が使用できます。LoRA未使用時と同じ程度まで生成は高速化されますが、生成前に重みを計算する時間が必要で、またメモリ使用量も若干増加します。Regional LoRA使用時は無効になります 。 # 主なオプションの指定例 diff --git a/train_README-ja.md b/docs/train_README-ja.md similarity index 93% rename from train_README-ja.md rename to docs/train_README-ja.md index a155febd9..f27c5c654 100644 --- a/train_README-ja.md +++ b/docs/train_README-ja.md @@ -463,27 +463,6 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b xformersオプションを指定するとxformersのCrossAttentionを用います。xformersをインストールしていない場合やエラーとなる場合(環境にもよりますが `mixed_precision="no"` の場合など)、代わりに `mem_eff_attn` オプションを指定すると省メモリ版CrossAttentionを使用します(xformersよりも速度は遅くなります)。 -- `--save_precision` - - 保存時のデータ精度を指定します。save_precisionオプションにfloat、fp16、bf16のいずれかを指定すると、その形式でモデルを保存します(DreamBooth、fine tuningでDiffusers形式でモデルを保存する場合は無効です)。モデルのサイズを削減したい場合などにお使いください。 - -- `--save_every_n_epochs` / `--save_state` / `--resume` - save_every_n_epochsオプションに数値を指定すると、そのエポックごとに学習途中のモデルを保存します。 - - save_stateオプションを同時に指定すると、optimizer等の状態も含めた学習状態を合わせて保存します(保存したモデルからも学習再開できますが、それに比べると精度の向上、学習時間の短縮が期待できます)。保存先はフォルダになります。 - - 学習状態は保存先フォルダに `-??????-state`(??????はエポック数)という名前のフォルダで出力されます。長時間にわたる学習時にご利用ください。 - - 保存された学習状態から学習を再開するにはresumeオプションを使います。学習状態のフォルダ(`output_dir` ではなくその中のstateのフォルダ)を指定してください。 - - なおAcceleratorの仕様により、エポック数、global stepは保存されておらず、resumeしたときにも1からになりますがご容赦ください。 - -- `--save_model_as` (DreamBooth, fine tuning のみ) - - モデルの保存形式を`ckpt, 
safetensors, diffusers, diffusers_safetensors` から選べます。 - - `--save_model_as=safetensors` のように指定します。Stable Diffusion形式(ckptまたはsafetensors)を読み込み、Diffusers形式で保存する場合、不足する情報はHugging Faceからv1.5またはv2.1の情報を落としてきて補完します。 - - `--clip_skip` `2` を指定すると、Text Encoder (CLIP) の後ろから二番目の層の出力を用います。1またはオプション省略時は最後の層を用います。 @@ -502,6 +481,12 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b clip_skipと同様に、モデルの学習状態と異なる長さで学習するには、ある程度の教師データ枚数、長めの学習時間が必要になると思われます。 +- `--weighted_captions` + + 指定するとAutomatic1111氏のWeb UIと同様の重み付きキャプションが有効になります。「Textual Inversion と XTI」以外の学習に使用できます。キャプションだけでなく DreamBooth 手法の token string でも有効です。 + + 重みづけキャプションの記法はWeb UIとほぼ同じで、(abc)や[abc]、(abc:1.23)などが使用できます。入れ子も可能です。括弧内にカンマを含めるとプロンプトのshuffle/dropoutで括弧の対応付けがおかしくなるため、括弧内にはカンマを含めないでください。 + - `--persistent_data_loader_workers` Windows環境で指定するとエポック間の待ち時間が大幅に短縮されます。 @@ -527,12 +512,28 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b その後ブラウザを開き、http://localhost:6006/ へアクセスすると表示されます。 +- `--log_with` / `--log_tracker_name` + + 学習ログの保存に関するオプションです。`tensorboard` だけでなく `wandb`への保存が可能です。詳細は [PR#428](https://github.com/kohya-ss/sd-scripts/pull/428)をご覧ください。 + - `--noise_offset` こちらの記事の実装になります: https://www.crosslabs.org//blog/diffusion-with-offset-noise 全体的に暗い、明るい画像の生成結果が良くなる可能性があるようです。LoRA学習でも有効なようです。`0.1` 程度の値を指定するとよいようです。 +- `--adaptive_noise_scale` (実験的オプション) + + Noise offsetの値を、latentsの各チャネルの平均値の絶対値に応じて自動調整するオプションです。`--noise_offset` と同時に指定することで有効になります。Noise offsetの値は `noise_offset + abs(mean(latents, dim=(2,3))) * adaptive_noise_scale` で計算されます。latentは正規分布に近いためnoise_offsetの1/10~同程度の値を指定するとよいかもしれません。 + + 負の値も指定でき、その場合はnoise offsetは0以上にclipされます。 + +- `--multires_noise_iterations` / `--multires_noise_discount` + + Multi resolution noise (pyramid noise)の設定です。詳細は [PR#471](https://github.com/kohya-ss/sd-scripts/pull/471) およびこちらのページ [Multi-Resolution Noise for Diffusion Model Training](https://wandb.ai/johnowhitaker/multires_noise/reports/Multi-Resolution-Noise-for-Diffusion-Model-Training--VmlldzozNjYyOTU2) を参照してください。 + + `--multires_noise_iterations` に数値を指定すると有効になります。6~10程度の値が良いようです。`--multires_noise_discount` に0.1~0.3 程度の値(LoRA学習等比較的データセットが小さい場合のPR作者の推奨)、ないしは0.8程度の値(元記事の推奨)を指定してください(デフォルトは 0.3)。 + - `--debug_dataset` このオプションを付けることで学習を行う前に事前にどのような画像データ、キャプションで学習されるかを確認できます。Escキーを押すと終了してコマンドラインに戻ります。`S`キーで次のステップ(バッチ)、`E`キーで次のエポックに進みます。 @@ -545,14 +546,62 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b DreamBoothおよびfine tuningでは、保存されるモデルはこのVAEを組み込んだものになります。 -- `--cache_latents` +- `--cache_latents` / `--cache_latents_to_disk` 使用VRAMを減らすためVAEの出力をメインメモリにキャッシュします。`flip_aug` 以外のaugmentationは使えなくなります。また全体の学習速度が若干速くなります。 + cache_latents_to_diskを指定するとキャッシュをディスクに保存します。スクリプトを終了し、再度起動した場合もキャッシュが有効になります。 + - `--min_snr_gamma` Min-SNR Weighting strategyを指定します。詳細は[こちら](https://github.com/kohya-ss/sd-scripts/pull/308)を参照してください。論文では`5`が推奨されています。 +## モデルの保存に関する設定 + +- `--save_precision` + + 保存時のデータ精度を指定します。save_precisionオプションにfloat、fp16、bf16のいずれかを指定すると、その形式でモデルを保存します(DreamBooth、fine tuningでDiffusers形式でモデルを保存する場合は無効です)。モデルのサイズを削減したい場合などにお使いください。 + +- `--save_every_n_epochs` / `--save_state` / `--resume` + + save_every_n_epochsオプションに数値を指定すると、そのエポックごとに学習途中のモデルを保存します。 + + save_stateオプションを同時に指定すると、optimizer等の状態も含めた学習状態を合わせて保存します(保存したモデルからも学習再開できますが、それに比べると精度の向上、学習時間の短縮が期待できます)。保存先はフォルダになります。 + + 学習状態は保存先フォルダに `-??????-state`(??????はエポック数)という名前のフォルダで出力されます。長時間にわたる学習時にご利用ください。 + + 保存された学習状態から学習を再開するにはresumeオプションを使います。学習状態のフォルダ(`output_dir` ではなくその中のstateのフォルダ)を指定してください。 + + 
なおAcceleratorの仕様により、エポック数、global stepは保存されておらず、resumeしたときにも1からになりますがご容赦ください。 + +- `--save_every_n_steps` + + save_every_n_stepsオプションに数値を指定すると、そのステップごとに学習途中のモデルを保存します。save_every_n_epochsと同時に指定できます。 + +- `--save_model_as` (DreamBooth, fine tuning のみ) + + モデルの保存形式を`ckpt, safetensors, diffusers, diffusers_safetensors` から選べます。 + + `--save_model_as=safetensors` のように指定します。Stable Diffusion形式(ckptまたはsafetensors)を読み込み、Diffusers形式で保存する場合、不足する情報はHugging Faceからv1.5またはv2.1の情報を落としてきて補完します。 + +- `--huggingface_repo_id` 等 + + huggingface_repo_idが指定されているとモデル保存時に同時にHuggingFaceにアップロードします。アクセストークンの取り扱いに注意してください(HuggingFaceのドキュメントを参照してください)。 + + 他の引数をたとえば以下のように指定してください。 + + - `--huggingface_repo_id "your-hf-name/your-model" --huggingface_path_in_repo "path" --huggingface_repo_type model --huggingface_repo_visibility private --huggingface_token hf_YourAccessTokenHere` + + huggingface_repo_visibilityに`public`を指定するとリポジトリが公開されます。省略時または`private`(などpublic以外)を指定すると非公開になります。 + + `--save_state`オプション指定時に`--save_state_to_huggingface`を指定するとstateもアップロードします。 + + `--resume`オプション指定時に`--resume_from_huggingface`を指定するとHuggingFaceからstateをダウンロードして再開します。その時の --resumeオプションは `--resume {repo_id}/{path_in_repo}:{revision}:{repo_type}`になります。 + + 例: `--resume_from_huggingface --resume your-hf-name/your-model/path/test-000002-state:main:model` + + `--async_upload`オプションを指定するとアップロードを非同期で行います。 + ## オプティマイザ関係 - `--optimizer_type` @@ -566,7 +615,10 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b - Lion8bit : 引数は同上 - SGDNesterov : [torch.optim.SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html), nesterov=True - SGDNesterov8bit : 引数は同上 - - DAdaptation : https://github.com/facebookresearch/dadaptation + - DAdaptation(DAdaptAdam) : https://github.com/facebookresearch/dadaptation + - DAdaptAdaGrad : 引数は同上 + - DAdaptAdan : 引数は同上 + - DAdaptSGD : 引数は同上 - AdaFactor : [Transformers AdaFactor](https://huggingface.co/docs/transformers/main_classes/optimizer_schedules) - 任意のオプティマイザ diff --git a/train_README-zh.md b/docs/train_README-zh.md similarity index 100% rename from train_README-zh.md rename to docs/train_README-zh.md diff --git a/train_db_README-ja.md b/docs/train_db_README-ja.md similarity index 100% rename from train_db_README-ja.md rename to docs/train_db_README-ja.md diff --git a/train_db_README-zh.md b/docs/train_db_README-zh.md similarity index 100% rename from train_db_README-zh.md rename to docs/train_db_README-zh.md diff --git a/train_network_README-ja.md b/docs/train_network_README-ja.md similarity index 100% rename from train_network_README-ja.md rename to docs/train_network_README-ja.md diff --git a/train_network_README-zh.md b/docs/train_network_README-zh.md similarity index 100% rename from train_network_README-zh.md rename to docs/train_network_README-zh.md diff --git a/train_ti_README-ja.md b/docs/train_ti_README-ja.md similarity index 100% rename from train_ti_README-ja.md rename to docs/train_ti_README-ja.md diff --git a/dreambooth_gui.py b/dreambooth_gui.py index 0bbe9b3bb..2acc9157a 100644 --- a/dreambooth_gui.py +++ b/dreambooth_gui.py @@ -104,7 +104,7 @@ def save_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -222,7 +222,7 @@ def open_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, 
multires_noise_discount, sample_every_n_steps, @@ -323,7 +323,7 @@ def train_model( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -387,15 +387,15 @@ def train_model( ) lr_warmup = '0' - if float(noise_offset) > 0 and ( - multires_noise_iterations > 0 or multires_noise_discount > 0 - ): - output_message( - msg="noise offset and multires_noise can't be set at the same time. Only use one or the other.", - title='Error', - headless=headless_bool, - ) - return + # if float(noise_offset) > 0 and ( + # multires_noise_iterations > 0 or multires_noise_discount > 0 + # ): + # output_message( + # msg="noise offset and multires_noise can't be set at the same time. Only use one or the other.", + # title='Error', + # headless=headless_bool, + # ) + # return # Get a list of all subfolders in train_data_dir, excluding hidden folders subfolders = [ @@ -477,6 +477,7 @@ def train_model( math.ceil( float(total_steps) / int(train_batch_size) + / int(gradient_accumulation_steps) * int(epoch) * int(reg_factor) ) @@ -582,7 +583,9 @@ def train_model( bucket_reso_steps=bucket_reso_steps, caption_dropout_every_n_epochs=caption_dropout_every_n_epochs, caption_dropout_rate=caption_dropout_rate, + noise_offset_type=noise_offset_type, noise_offset=noise_offset, + adaptive_noise_scale=adaptive_noise_scale, multires_noise_iterations=multires_noise_iterations, multires_noise_discount=multires_noise_discount, additional_parameters=additional_parameters, @@ -805,7 +808,7 @@ def dreambooth_tab( bucket_reso_steps, caption_dropout_every_n_epochs, caption_dropout_rate, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, additional_parameters, @@ -912,7 +915,7 @@ def dreambooth_tab( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, diff --git a/fine_tune.py b/fine_tune.py index 442bd1326..057615013 100644 --- a/fine_tune.py +++ b/fine_tune.py @@ -21,7 +21,7 @@ BlueprintGenerator, ) import library.custom_train_functions as custom_train_functions -from library.custom_train_functions import apply_snr_weight, get_weighted_text_embeddings, pyramid_noise_like +from library.custom_train_functions import apply_snr_weight, get_weighted_text_embeddings, pyramid_noise_like, apply_noise_offset def train(args): @@ -305,8 +305,7 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module): # Sample noise that we'll add to the latents noise = torch.randn_like(latents, device=latents.device) if args.noise_offset: - # https://www.crosslabs.org//blog/diffusion-with-offset-noise - noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device) + noise = apply_noise_offset(latents, noise, args.noise_offset, args.adaptive_noise_scale) elif args.multires_noise_iterations: noise = pyramid_noise_like(noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount) @@ -381,7 +380,7 @@ def fn_recursive_set_mem_eff(module: torch.nn.Module): current_loss = loss.detach().item() # 平均なのでbatch sizeは関係ないはず if args.logging_dir is not None: logs = {"loss": current_loss, "lr": float(lr_scheduler.get_last_lr()[0])} - if args.optimizer_type.lower() == "DAdaptation".lower(): # tracking d*lr value + if 
args.optimizer_type.lower().startswith("DAdapt".lower()): # tracking d*lr value logs["lr/d*lr"] = ( lr_scheduler.optimizers[0].param_groups[0]["d"] * lr_scheduler.optimizers[0].param_groups[0]["lr"] ) diff --git a/finetune_gui.py b/finetune_gui.py index 7c00400eb..525d62a3b 100644 --- a/finetune_gui.py +++ b/finetune_gui.py @@ -103,7 +103,7 @@ def save_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -227,7 +227,7 @@ def open_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -334,7 +334,7 @@ def train_model( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -358,15 +358,15 @@ def train_model( ): return - if float(noise_offset) > 0 and ( - multires_noise_iterations > 0 or multires_noise_discount > 0 - ): - output_message( - msg="noise offset and multires_noise can't be set at the same time. Only use one or the other.", - title='Error', - headless=headless_bool, - ) - return + # if float(noise_offset) > 0 and ( + # multires_noise_iterations > 0 or multires_noise_discount > 0 + # ): + # output_message( + # msg="noise offset and multires_noise can't be set at the same time. Only use one or the other.", + # title='Error', + # headless=headless_bool, + # ) + # return if optimizer == 'Adafactor' and lr_warmup != '0': output_message( @@ -440,7 +440,12 @@ def train_model( # calculate max_train_steps max_train_steps = int( - math.ceil(float(repeats) / int(train_batch_size) * int(epoch)) + math.ceil( + float(repeats) + / int(train_batch_size) + / int(gradient_accumulation_steps) + * int(epoch) + ) ) # Divide by two because flip augmentation create two copied of the source images @@ -532,7 +537,9 @@ def train_model( bucket_reso_steps=bucket_reso_steps, caption_dropout_every_n_epochs=caption_dropout_every_n_epochs, caption_dropout_rate=caption_dropout_rate, + noise_offset_type=noise_offset_type, noise_offset=noise_offset, + adaptive_noise_scale=adaptive_noise_scale, multires_noise_iterations=multires_noise_iterations, multires_noise_discount=multires_noise_discount, additional_parameters=additional_parameters, @@ -771,7 +778,7 @@ def finetune_tab(headless=False): bucket_reso_steps, caption_dropout_every_n_epochs, caption_dropout_rate, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, additional_parameters, @@ -871,7 +878,7 @@ def finetune_tab(headless=False): caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, diff --git a/gen_img_diffusers.py b/gen_img_diffusers.py index 988eae754..60a249726 100644 --- a/gen_img_diffusers.py +++ b/gen_img_diffusers.py @@ -2262,6 +2262,8 @@ def __getattr__(self, item): if args.network_module: networks = [] network_default_muls = [] + network_pre_calc=args.network_pre_calc + for i, network_module in enumerate(args.network_module): print("import network module:", network_module) imported_module = importlib.import_module(network_module) @@ -2298,11 +2300,11 @@ def __getattr__(self, item): if 
network is None: return - mergiable = hasattr(network, "merge_to") - if args.network_merge and not mergiable: + mergeable = network.is_mergeable() + if args.network_merge and not mergeable: print("network is not mergiable. ignore merge option.") - if not args.network_merge or not mergiable: + if not args.network_merge or not mergeable: network.apply_to(text_encoder, unet) info = network.load_state_dict(weights_sd, False) # network.load_weightsを使うようにするとよい print(f"weights are loaded: {info}") @@ -2311,6 +2313,10 @@ def __getattr__(self, item): network.to(memory_format=torch.channels_last) network.to(dtype).to(device) + if network_pre_calc: + print("backup original weights") + network.backup_weights() + networks.append(network) else: network.merge_to(text_encoder, unet, weights_sd, dtype, device) @@ -2815,11 +2821,19 @@ def process_batch(batch: List[BatchData], highres_fix, highres_1st=False): # generate if networks: + # 追加ネットワークの処理 shared = {} for n, m in zip(networks, network_muls if network_muls else network_default_muls): n.set_multiplier(m) if regional_network: n.set_current_generation(batch_size, num_sub_prompts, width, height, shared) + + if not regional_network and network_pre_calc: + for n in networks: + n.restore_weights() + for n in networks: + n.pre_calculation() + print("pre-calculation... done") images = pipe( prompts, @@ -3204,6 +3218,7 @@ def setup_parser() -> argparse.ArgumentParser: ) parser.add_argument("--network_show_meta", action="store_true", help="show metadata of network model / ネットワークモデルのメタデータを表示する") parser.add_argument("--network_merge", action="store_true", help="merge network weights to original model / ネットワークの重みをマージする") + parser.add_argument("--network_pre_calc", action="store_true", help="pre-calculate network for generation / ネットワークのあらかじめ計算して生成する") parser.add_argument( "--textual_inversion_embeddings", type=str, diff --git a/library/common_gui.py b/library/common_gui.py index 71c3ce5df..e9ba737a7 100644 --- a/library/common_gui.py +++ b/library/common_gui.py @@ -785,6 +785,9 @@ def gradio_training( 'AdamW8bit', 'Adafactor', 'DAdaptation', + 'DAdaptAdaGrad', + 'DAdaptAdan', + 'DAdaptSGD', 'Lion', 'Lion8bit', 'SGDNesterov', @@ -818,52 +821,75 @@ def gradio_training( def run_cmd_training(**kwargs): - options = [ - f' --learning_rate="{kwargs.get("learning_rate", "")}"' - if kwargs.get('learning_rate') - else '', - f' --lr_scheduler="{kwargs.get("lr_scheduler", "")}"' - if kwargs.get('lr_scheduler') - else '', - f' --lr_warmup_steps="{kwargs.get("lr_warmup_steps", "")}"' - if kwargs.get('lr_warmup_steps') - else '', - f' --train_batch_size="{kwargs.get("train_batch_size", "")}"' - if kwargs.get('train_batch_size') - else '', - f' --max_train_steps="{kwargs.get("max_train_steps", "")}"' - if kwargs.get('max_train_steps') - else '', - f' --save_every_n_epochs="{int(kwargs.get("save_every_n_epochs", 1))}"' - if int(kwargs.get('save_every_n_epochs')) - else '', - f' --mixed_precision="{kwargs.get("mixed_precision", "")}"' - if kwargs.get('mixed_precision') - else '', - f' --save_precision="{kwargs.get("save_precision", "")}"' - if kwargs.get('save_precision') - else '', - f' --seed="{kwargs.get("seed", "")}"' - if kwargs.get('seed') != '' - else '', - f' --caption_extension="{kwargs.get("caption_extension", "")}"' - if kwargs.get('caption_extension') - else '', - ' --cache_latents' if kwargs.get('cache_latents') else '', - ' --cache_latents_to_disk' - if kwargs.get('cache_latents_to_disk') - else '', - # ' --use_lion_optimizer' if kwargs.get('optimizer') == 'Lion' else 
'', - f' --optimizer_type="{kwargs.get("optimizer", "AdamW")}"', - f' --optimizer_args {kwargs.get("optimizer_args", "")}' - if not kwargs.get('optimizer_args') == '' - else '', - ] - run_cmd = ''.join(options) + run_cmd = '' + + learning_rate = kwargs.get("learning_rate", "") + if learning_rate: + run_cmd += f' --learning_rate="{learning_rate}"' + + lr_scheduler = kwargs.get("lr_scheduler", "") + if lr_scheduler: + run_cmd += f' --lr_scheduler="{lr_scheduler}"' + + lr_warmup_steps = kwargs.get("lr_warmup_steps", "") + if lr_warmup_steps: + if lr_scheduler == 'constant': + print('Can\'t use LR warmup with LR Scheduler constant... ignoring...') + else: + run_cmd += f' --lr_warmup_steps="{lr_warmup_steps}"' + + train_batch_size = kwargs.get("train_batch_size", "") + if train_batch_size: + run_cmd += f' --train_batch_size="{train_batch_size}"' + + max_train_steps = kwargs.get("max_train_steps", "") + if max_train_steps: + run_cmd += f' --max_train_steps="{max_train_steps}"' + + save_every_n_epochs = kwargs.get("save_every_n_epochs") + if save_every_n_epochs: + run_cmd += f' --save_every_n_epochs="{int(save_every_n_epochs)}"' + + mixed_precision = kwargs.get("mixed_precision", "") + if mixed_precision: + run_cmd += f' --mixed_precision="{mixed_precision}"' + + save_precision = kwargs.get("save_precision", "") + if save_precision: + run_cmd += f' --save_precision="{save_precision}"' + + seed = kwargs.get("seed", "") + if seed != '': + run_cmd += f' --seed="{seed}"' + + caption_extension = kwargs.get("caption_extension", "") + if caption_extension: + run_cmd += f' --caption_extension="{caption_extension}"' + + cache_latents = kwargs.get('cache_latents') + if cache_latents: + run_cmd += ' --cache_latents' + + cache_latents_to_disk = kwargs.get('cache_latents_to_disk') + if cache_latents_to_disk: + run_cmd += ' --cache_latents_to_disk' + + optimizer_type = kwargs.get("optimizer", "AdamW") + run_cmd += f' --optimizer_type="{optimizer_type}"' + + optimizer_args = kwargs.get("optimizer_args", "") + if optimizer_args != '': + run_cmd += f' --optimizer_args {optimizer_args}' + return run_cmd def gradio_advanced_training(headless=False): + def noise_offset_type_change(noise_offset_type): + if noise_offset_type == 'Original': + return (gr.Group.update(visible=True), gr.Group.update(visible=False)) + else: + return (gr.Group.update(visible=False), gr.Group.update(visible=True)) with gr.Row(): additional_parameters = gr.Textbox( label='Additional parameters', @@ -939,30 +965,54 @@ def gradio_advanced_training(headless=False): random_crop = gr.Checkbox( label='Random crop instead of center crop', value=False ) + with gr.Row(): - noise_offset = gr.Slider( - label='Noise offset', - value=0, - minimum=0, - maximum=1, - step=0.01, - info='recommended values are 0.05 - 0.15', - ) - multires_noise_iterations = gr.Slider( - label='Multires noise iterations', - value=0, - minimum=0, - maximum=64, - step=1, - info='enable multires noise (recommended values are 6-10)', + noise_offset_type = gr.Dropdown( + label='Noise offset type', + choices=[ + 'Original', + 'Multires', + ], + value='Original', ) - multires_noise_discount = gr.Slider( - label='Multires noise discount', - value=0, - minimum=0, - maximum=1, - step=0.01, - info='recommended values are 0.8. 
For LoRAs with small datasets, 0.1-0.3', + with gr.Row(visible=True) as noise_offset_original: + noise_offset = gr.Slider( + label='Noise offset', + value=0, + minimum=0, + maximum=1, + step=0.01, + info='recommended values are 0.05 - 0.15', + ) + adaptive_noise_scale = gr.Slider( + label='Adaptive noise scale', + value=0, + minimum=-1, + maximum=1, + step=0.001, + info='(Experimental, Optional) Since the latent is close to a normal distribution, it may be a good idea to specify a value around 1/10 the noise offset.', + ) + with gr.Row(visible=False) as noise_offset_multires: + multires_noise_iterations = gr.Slider( + label='Multires noise iterations', + value=0, + minimum=0, + maximum=64, + step=1, + info='enable multires noise (recommended values are 6-10)', + ) + multires_noise_discount = gr.Slider( + label='Multires noise discount', + value=0, + minimum=0, + maximum=1, + step=0.01, + info='recommended values are 0.8. For LoRAs with small datasets, 0.1-0.3', + ) + noise_offset_type.change( + noise_offset_type_change, + inputs=[noise_offset_type], + outputs=[noise_offset_original, noise_offset_multires] ) with gr.Row(): caption_dropout_every_n_epochs = gr.Number( @@ -1031,7 +1081,9 @@ def gradio_advanced_training(headless=False): bucket_reso_steps, caption_dropout_every_n_epochs, caption_dropout_rate, + noise_offset_type, noise_offset, + adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, additional_parameters, @@ -1046,80 +1098,135 @@ def gradio_advanced_training(headless=False): def run_cmd_advanced_training(**kwargs): - options = [ - f' --max_train_epochs="{kwargs.get("max_train_epochs", "")}"' - if kwargs.get('max_train_epochs') - else '', - f' --max_data_loader_n_workers="{kwargs.get("max_data_loader_n_workers", "")}"' - if kwargs.get('max_data_loader_n_workers') - else '', - f' --max_token_length={kwargs.get("max_token_length", "")}' - if int(kwargs.get('max_token_length', 75)) > 75 - else '', - f' --clip_skip={kwargs.get("clip_skip", "")}' - if int(kwargs.get('clip_skip', 1)) > 1 - else '', - f' --resume="{kwargs.get("resume", "")}"' - if kwargs.get('resume') - else '', - f' --keep_tokens="{kwargs.get("keep_tokens", "")}"' - if int(kwargs.get('keep_tokens', 0)) > 0 - else '', - f' --caption_dropout_every_n_epochs="{int(kwargs.get("caption_dropout_every_n_epochs", 0))}"' - if int(kwargs.get('caption_dropout_every_n_epochs', 0)) > 0 - else '', - f' --caption_dropout_rate="{float(kwargs.get("caption_dropout_rate", 0))}"' - if float(kwargs.get('caption_dropout_rate', 0)) > 0 - else '', - f' --vae_batch_size="{kwargs.get("vae_batch_size", 0)}"' - if int(kwargs.get('vae_batch_size', 0)) > 0 - else '', - f' --bucket_reso_steps={int(kwargs.get("bucket_reso_steps", 1))}' - if int(kwargs.get('bucket_reso_steps', 64)) >= 1 - else '', - f' --save_every_n_steps="{int(kwargs.get("save_every_n_steps", 0))}"' - if int(kwargs.get('save_every_n_steps')) > 0 - else '', - f' --save_last_n_steps="{int(kwargs.get("save_last_n_steps", 0))}"' - if int(kwargs.get('save_last_n_steps')) > 0 - else '', - f' --save_last_n_steps_state="{int(kwargs.get("save_last_n_steps_state", 0))}"' - if int(kwargs.get('save_last_n_steps_state')) > 0 - else '', - f' --min_snr_gamma={int(kwargs.get("min_snr_gamma", 0))}' - if int(kwargs.get('min_snr_gamma', 0)) >= 1 - else '', - ' --save_state' if kwargs.get('save_state') else '', - ' --mem_eff_attn' if kwargs.get('mem_eff_attn') else '', - ' --color_aug' if kwargs.get('color_aug') else '', - ' --flip_aug' if kwargs.get('flip_aug') else '', - ' 
--shuffle_caption' if kwargs.get('shuffle_caption') else '', - ' --gradient_checkpointing' - if kwargs.get('gradient_checkpointing') - else '', - ' --full_fp16' if kwargs.get('full_fp16') else '', - ' --xformers' if kwargs.get('xformers') else '', - # ' --use_8bit_adam' if kwargs.get('use_8bit_adam') else '', - ' --persistent_data_loader_workers' - if kwargs.get('persistent_data_loader_workers') - else '', - ' --bucket_no_upscale' if kwargs.get('bucket_no_upscale') else '', - ' --random_crop' if kwargs.get('random_crop') else '', - f' --multires_noise_iterations="{int(kwargs.get("multires_noise_iterations", 0))}"' - if kwargs.get('multires_noise_iterations', 0) > 0 - else '', - f' --multires_noise_discount="{float(kwargs.get("multires_noise_discount", 0.0))}"' - if kwargs.get('multires_noise_discount', 0) > 0 - else '', - f' --noise_offset={float(kwargs.get("noise_offset", 0))}' - if kwargs.get('noise_offset') > 0 - else '', - f' {kwargs.get("additional_parameters", "")}', - ' --log_with wandb' if kwargs.get('use_wandb') else '', - f' --wandb_api_key="{kwargs.get("wandb_api_key", "")}"' - if kwargs.get('wandb_api_key') - else '', - ] - - run_cmd = ''.join(options) + run_cmd = '' + + max_train_epochs = kwargs.get("max_train_epochs", "") + if max_train_epochs: + run_cmd += f' --max_train_epochs={max_train_epochs}' + + max_data_loader_n_workers = kwargs.get("max_data_loader_n_workers", "") + if max_data_loader_n_workers: + run_cmd += f' --max_data_loader_n_workers="{max_data_loader_n_workers}"' + + max_token_length = int(kwargs.get("max_token_length", 75)) + if max_token_length > 75: + run_cmd += f' --max_token_length={max_token_length}' + + clip_skip = int(kwargs.get("clip_skip", 1)) + if clip_skip > 1: + run_cmd += f' --clip_skip={clip_skip}' + + resume = kwargs.get("resume", "") + if resume: + run_cmd += f' --resume="{resume}"' + + keep_tokens = int(kwargs.get("keep_tokens", 0)) + if keep_tokens > 0: + run_cmd += f' --keep_tokens="{keep_tokens}"' + + caption_dropout_every_n_epochs = int(kwargs.get("caption_dropout_every_n_epochs", 0)) + if caption_dropout_every_n_epochs > 0: + run_cmd += f' --caption_dropout_every_n_epochs="{caption_dropout_every_n_epochs}"' + + caption_dropout_rate = float(kwargs.get("caption_dropout_rate", 0)) + if caption_dropout_rate > 0: + run_cmd += f' --caption_dropout_rate="{caption_dropout_rate}"' + + vae_batch_size = int(kwargs.get("vae_batch_size", 0)) + if vae_batch_size > 0: + run_cmd += f' --vae_batch_size="{vae_batch_size}"' + + bucket_reso_steps = int(kwargs.get("bucket_reso_steps", 64)) + run_cmd += f' --bucket_reso_steps={bucket_reso_steps}' + + save_every_n_steps = int(kwargs.get("save_every_n_steps", 0)) + if save_every_n_steps > 0: + run_cmd += f' --save_every_n_steps="{save_every_n_steps}"' + + save_last_n_steps = int(kwargs.get("save_last_n_steps", 0)) + if save_last_n_steps > 0: + run_cmd += f' --save_last_n_steps="{save_last_n_steps}"' + + save_last_n_steps_state = int(kwargs.get("save_last_n_steps_state", 0)) + if save_last_n_steps_state > 0: + run_cmd += f' --save_last_n_steps_state="{save_last_n_steps_state}"' + + min_snr_gamma = int(kwargs.get("min_snr_gamma", 0)) + if min_snr_gamma >= 1: + run_cmd += f' --min_snr_gamma={min_snr_gamma}' + + save_state = kwargs.get('save_state') + if save_state: + run_cmd += ' --save_state' + + mem_eff_attn = kwargs.get('mem_eff_attn') + if mem_eff_attn: + run_cmd += ' --mem_eff_attn' + + color_aug = kwargs.get('color_aug') + if color_aug: + run_cmd += ' --color_aug' + + flip_aug = kwargs.get('flip_aug') + if 
flip_aug: + run_cmd += ' --flip_aug' + + shuffle_caption = kwargs.get('shuffle_caption') + if shuffle_caption: + run_cmd += ' --shuffle_caption' + + gradient_checkpointing = kwargs.get('gradient_checkpointing') + if gradient_checkpointing: + run_cmd += ' --gradient_checkpointing' + + full_fp16 = kwargs.get('full_fp16') + if full_fp16: + run_cmd += ' --full_fp16' + + xformers = kwargs.get('xformers') + if xformers: + run_cmd += ' --xformers' + + persistent_data_loader_workers = kwargs.get('persistent_data_loader_workers') + if persistent_data_loader_workers: + run_cmd += ' --persistent_data_loader_workers' + + bucket_no_upscale = kwargs.get('bucket_no_upscale') + if bucket_no_upscale: + run_cmd += ' --bucket_no_upscale' + + random_crop = kwargs.get('random_crop') + if random_crop: + run_cmd += ' --random_crop' + + noise_offset_type = kwargs.get('noise_offset_type', 'Original') + if noise_offset_type == 'Original': + noise_offset = float(kwargs.get("noise_offset", 0)) + if noise_offset > 0: + run_cmd += f' --noise_offset={noise_offset}' + + adaptive_noise_scale = float(kwargs.get("adaptive_noise_scale", 0)) + if adaptive_noise_scale != 0 and noise_offset > 0: + run_cmd += f' --adaptive_noise_scale={adaptive_noise_scale}' + else: + multires_noise_iterations = int(kwargs.get("multires_noise_iterations", 0)) + if multires_noise_iterations > 0: + run_cmd += f' --multires_noise_iterations="{multires_noise_iterations}"' + + multires_noise_discount = float(kwargs.get("multires_noise_discount", 0)) + if multires_noise_discount > 0: + run_cmd += f' --multires_noise_discount="{multires_noise_discount}"' + + additional_parameters = kwargs.get("additional_parameters", "") + if additional_parameters: + run_cmd += f' {additional_parameters}' + + use_wandb = kwargs.get('use_wandb') + if use_wandb: + run_cmd += ' --log_with wandb' + + wandb_api_key = kwargs.get("wandb_api_key", "") + if wandb_api_key: + run_cmd += f' --wandb_api_key="{wandb_api_key}"' + return run_cmd diff --git a/library/custom_train_functions.py b/library/custom_train_functions.py index aa268ae30..eb5a91d45 100644 --- a/library/custom_train_functions.py +++ b/library/custom_train_functions.py @@ -348,10 +348,28 @@ def get_weighted_text_embeddings( # https://wandb.ai/johnowhitaker/multires_noise/reports/Multi-Resolution-Noise-for-Diffusion-Model-Training--VmlldzozNjYyOTU2 def pyramid_noise_like(noise, device, iterations=6, discount=0.3): b, c, w, h = noise.shape - u = torch.nn.Upsample(size=(w, h), mode='bilinear').to(device) + u = torch.nn.Upsample(size=(w, h), mode="bilinear").to(device) for i in range(iterations): - r = random.random()*2+2 # Rather than always going 2x, - w, h = max(1, int(w/(r**i))), max(1, int(h/(r**i))) + r = random.random() * 2 + 2 # Rather than always going 2x, + w, h = max(1, int(w / (r**i))), max(1, int(h / (r**i))) noise += u(torch.randn(b, c, w, h).to(device)) * discount**i - if w==1 or h==1: break # Lowest resolution is 1x1 - return noise/noise.std() # Scaled back to roughly unit variance + if w == 1 or h == 1: + break # Lowest resolution is 1x1 + return noise / noise.std() # Scaled back to roughly unit variance + + +# https://www.crosslabs.org//blog/diffusion-with-offset-noise +def apply_noise_offset(latents, noise, noise_offset, adaptive_noise_scale): + if noise_offset is None: + return noise + if adaptive_noise_scale is not None: + # latent shape: (batch_size, channels, height, width) + # abs mean value for each channel + latent_mean = torch.abs(latents.mean(dim=(2, 3), keepdim=True)) + + # multiply 
adaptive noise scale to the mean value and add it to the noise offset + noise_offset = noise_offset + adaptive_noise_scale * latent_mean + noise_offset = torch.clamp(noise_offset, 0.0, None) # in case of adaptive noise scale is negative + + noise = noise + noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device) + return noise diff --git a/library/train_util.py b/library/train_util.py index ad139c06b..9a4218082 100644 --- a/library/train_util.py +++ b/library/train_util.py @@ -1885,7 +1885,7 @@ def add_optimizer_arguments(parser: argparse.ArgumentParser): "--optimizer_type", type=str, default="", - help="Optimizer to use / オプティマイザの種類: AdamW (default), AdamW8bit, Lion, Lion8bit,SGDNesterov, SGDNesterov8bit, DAdaptation, AdaFactor", + help="Optimizer to use / オプティマイザの種類: AdamW (default), AdamW8bit, Lion8bit, Lion, SGDNesterov, SGDNesterov8bit, DAdaptation(DAdaptAdam), DAdaptAdaGrad, DAdaptAdan, DAdaptSGD, AdaFactor", ) # backward compatibility @@ -2133,6 +2133,12 @@ def add_training_arguments(parser: argparse.ArgumentParser, support_dreambooth: default=0.3, help="set discount value for multires noise (has no effect without --multires_noise_iterations) / Multires noiseのdiscount値を設定する(--multires_noise_iterations指定時のみ有効)", ) + parser.add_argument( + "--adaptive_noise_scale", + type=float, + default=None, + help="add `latent mean absolute value * this value` to noise_offset (disabled if None, default) / latentの平均値の絶対値 * この値をnoise_offsetに加算する(Noneの場合は無効、デフォルト)", + ) parser.add_argument( "--lowram", action="store_true", @@ -2210,6 +2216,11 @@ def verify_training_args(args: argparse.Namespace): "noise_offset and multires_noise_iterations cannot be enabled at the same time / noise_offsetとmultires_noise_iterationsを同時に有効にすることはできません" ) + if args.adaptive_noise_scale is not None and args.noise_offset is None: + raise ValueError( + "adaptive_noise_scale requires noise_offset / adaptive_noise_scaleを使用するにはnoise_offsetが必要です" + ) + def add_dataset_arguments( parser: argparse.ArgumentParser, support_dreambooth: bool, support_caption: bool, support_caption_dropout: bool @@ -2467,7 +2478,7 @@ def task(): def get_optimizer(args, trainable_params): - # "Optimizer to use: AdamW, AdamW8bit, Lion, Lion8bit, SGDNesterov, SGDNesterov8bit, DAdaptation, Adafactor" + # "Optimizer to use: AdamW, AdamW8bit, Lion, SGDNesterov, SGDNesterov8bit, Lion8bit, DAdaptation, DAdaptation(DAdaptAdam), DAdaptAdaGrad, DAdaptAdan, DAdaptSGD, Adafactor" optimizer_type = args.optimizer_type if args.use_8bit_adam: @@ -2570,13 +2581,15 @@ def get_optimizer(args, trainable_params): optimizer_class = torch.optim.SGD optimizer = optimizer_class(trainable_params, lr=lr, nesterov=True, **optimizer_kwargs) - elif optimizer_type == "DAdaptation".lower(): + elif optimizer_type.startswith("DAdapt".lower()): + # DAdaptation family + # check dadaptation is installed try: import dadaptation except ImportError: raise ImportError("No dadaptation / dadaptation がインストールされていないようです") - print(f"use D-Adaptation Adam optimizer | {optimizer_kwargs}") + # check lr and lr_count, and print warning actual_lr = lr lr_count = 1 if type(trainable_params) == list and type(trainable_params[0]) == dict: @@ -2596,7 +2609,22 @@ def get_optimizer(args, trainable_params): f"when multiple learning rates are specified with dadaptation (e.g. 
for Text Encoder and U-Net), only the first one will take effect / D-Adaptationで複数の学習率を指定した場合(Text EncoderとU-Netなど)、最初の学習率のみが有効になります: lr={actual_lr}" ) - optimizer_class = dadaptation.DAdaptAdam + # set optimizer + if optimizer_type == "DAdaptation".lower() or optimizer_type == "DAdaptAdam".lower(): + optimizer_class = dadaptation.DAdaptAdam + print(f"use D-Adaptation Adam optimizer | {optimizer_kwargs}") + elif optimizer_type == "DAdaptAdaGrad".lower(): + optimizer_class = dadaptation.DAdaptAdaGrad + print(f"use D-Adaptation AdaGrad optimizer | {optimizer_kwargs}") + elif optimizer_type == "DAdaptAdan".lower(): + optimizer_class = dadaptation.DAdaptAdan + print(f"use D-Adaptation Adan optimizer | {optimizer_kwargs}") + elif optimizer_type == "DAdaptSGD".lower(): + optimizer_class = dadaptation.DAdaptSGD + print(f"use D-Adaptation SGD optimizer | {optimizer_kwargs}") + else: + raise ValueError(f"Unknown optimizer type: {optimizer_type}") + optimizer = optimizer_class(trainable_params, lr=lr, **optimizer_kwargs) elif optimizer_type == "Adafactor".lower(): @@ -3327,7 +3355,7 @@ def sample_images( os.makedirs(save_dir, exist_ok=True) rng_state = torch.get_rng_state() - cuda_rng_state = torch.cuda.get_rng_state() + cuda_rng_state = torch.cuda.get_rng_state() if torch.cuda.is_available() else None with torch.no_grad(): with accelerator.autocast(): @@ -3434,7 +3462,8 @@ def sample_images( torch.cuda.empty_cache() torch.set_rng_state(rng_state) - torch.cuda.set_rng_state(cuda_rng_state) + if cuda_rng_state is not None: + torch.cuda.set_rng_state(cuda_rng_state) vae.to(org_vae_device) diff --git a/lora_gui.py b/lora_gui.py index f0eb3f923..bd5660ff2 100644 --- a/lora_gui.py +++ b/lora_gui.py @@ -119,7 +119,7 @@ def save_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, LoRA_type, @@ -256,7 +256,7 @@ def open_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, LoRA_type, @@ -385,7 +385,7 @@ def train_model( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, LoRA_type, @@ -466,15 +466,15 @@ def train_model( ) return - if float(noise_offset) > 0 and ( - multires_noise_iterations > 0 or multires_noise_discount > 0 - ): - output_message( - msg="noise offset and multires_noise can't be set at the same time. Only use one or the other.", - title='Error', - headless=headless_bool, - ) - return + # if float(noise_offset) > 0 and ( + # multires_noise_iterations > 0 or multires_noise_discount > 0 + # ): + # output_message( + # msg="noise offset and multires_noise can't be set at the same time. 
Only use one or the other.", + # title='Error', + # headless=headless_bool, + # ) + # return if not os.path.exists(output_dir): os.makedirs(output_dir) @@ -563,6 +563,7 @@ def train_model( math.ceil( float(total_steps) / int(train_batch_size) + / int(gradient_accumulation_steps) * int(epoch) * int(reg_factor) ) @@ -768,7 +769,9 @@ def train_model( bucket_reso_steps=bucket_reso_steps, caption_dropout_every_n_epochs=caption_dropout_every_n_epochs, caption_dropout_rate=caption_dropout_rate, + noise_offset_type=noise_offset_type, noise_offset=noise_offset, + adaptive_noise_scale=adaptive_noise_scale, multires_noise_iterations=multires_noise_iterations, multires_noise_discount=multires_noise_discount, additional_parameters=additional_parameters, @@ -1201,7 +1204,7 @@ def update_LoRA_settings(LoRA_type): bucket_reso_steps, caption_dropout_every_n_epochs, caption_dropout_rate, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, additional_parameters, @@ -1328,7 +1331,7 @@ def update_LoRA_settings(LoRA_type): caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, LoRA_type, diff --git a/networks/lora.py b/networks/lora.py index 1a3935368..298b0d5e1 100644 --- a/networks/lora.py +++ b/networks/lora.py @@ -68,6 +68,39 @@ def apply_to(self): self.org_module.forward = self.forward del self.org_module + def forward(self, x): + return self.org_forward(x) + self.lora_up(self.lora_down(x)) * self.multiplier * self.scale + + +class LoRAInfModule(LoRAModule): + def __init__(self, lora_name, org_module: torch.nn.Module, multiplier=1.0, lora_dim=4, alpha=1): + super().__init__(lora_name, org_module, multiplier, lora_dim, alpha) + + self.org_module_ref = [org_module] # 後から参照できるように + self.enabled = True + + # check regional or not by lora_name + self.text_encoder = False + if lora_name.startswith("lora_te_"): + self.regional = False + self.use_sub_prompt = True + self.text_encoder = True + elif "attn2_to_k" in lora_name or "attn2_to_v" in lora_name: + self.regional = False + self.use_sub_prompt = True + elif "time_emb" in lora_name: + self.regional = False + self.use_sub_prompt = False + else: + self.regional = True + self.use_sub_prompt = False + + self.network: LoRANetwork = None + + def set_network(self, network): + self.network = network + + # freezeしてマージする def merge_to(self, sd, dtype, device): # get up/down weight up_weight = sd["lora_up.weight"].to(torch.float).to(device) @@ -99,44 +132,45 @@ def merge_to(self, sd, dtype, device): org_sd["weight"] = weight.to(dtype) self.org_module.load_state_dict(org_sd) - def set_region(self, region): - self.region = region - self.region_mask = None - - def forward(self, x): - return self.org_forward(x) + self.lora_up(self.lora_down(x)) * self.multiplier * self.scale - + # 復元できるマージのため、このモジュールのweightを返す + def get_weight(self, multiplier=None): + if multiplier is None: + multiplier = self.multiplier -class LoRAInfModule(LoRAModule): - def __init__(self, lora_name, org_module: torch.nn.Module, multiplier=1.0, lora_dim=4, alpha=1): - super().__init__(lora_name, org_module, multiplier, lora_dim, alpha) + # get up/down weight from module + up_weight = self.lora_up.weight.to(torch.float) + down_weight = self.lora_down.weight.to(torch.float) - # check regional or not by lora_name - self.text_encoder = False - if lora_name.startswith("lora_te_"): - self.regional = False - self.use_sub_prompt = 
True - self.text_encoder = True - elif "attn2_to_k" in lora_name or "attn2_to_v" in lora_name: - self.regional = False - self.use_sub_prompt = True - elif "time_emb" in lora_name: - self.regional = False - self.use_sub_prompt = False + # pre-calculated weight + if len(down_weight.size()) == 2: + # linear + weight = self.multiplier * (up_weight @ down_weight) * self.scale + elif down_weight.size()[2:4] == (1, 1): + # conv2d 1x1 + weight = ( + self.multiplier + * (up_weight.squeeze(3).squeeze(2) @ down_weight.squeeze(3).squeeze(2)).unsqueeze(2).unsqueeze(3) + * self.scale + ) else: - self.regional = True - self.use_sub_prompt = False + # conv2d 3x3 + conved = torch.nn.functional.conv2d(down_weight.permute(1, 0, 2, 3), up_weight).permute(1, 0, 2, 3) + weight = self.multiplier * conved * self.scale - self.network: LoRANetwork = None + return weight - def set_network(self, network): - self.network = network + def set_region(self, region): + self.region = region + self.region_mask = None def default_forward(self, x): # print("default_forward", self.lora_name, x.size()) return self.org_forward(x) + self.lora_up(self.lora_down(x)) * self.multiplier * self.scale def forward(self, x): + if not self.enabled: + return self.org_forward(x) + if self.network is None or self.network.sub_prompt_index is None: return self.default_forward(x) if not self.regional and not self.use_sub_prompt: @@ -770,6 +804,10 @@ def apply_to(self, text_encoder, unet, apply_text_encoder=True, apply_unet=True) lora.apply_to() self.add_module(lora.lora_name, lora) + # マージできるかどうかを返す + def is_mergeable(self): + return True + # TODO refactor to common function with apply_to def merge_to(self, text_encoder, unet, weights_sd, dtype, device): apply_text_encoder = apply_unet = False @@ -956,3 +994,40 @@ def resize_add(mh, mw): w = (w + 1) // 2 self.mask_dic = mask_dic + + def backup_weights(self): + # 重みのバックアップを行う + loras: List[LoRAInfModule] = self.text_encoder_loras + self.unet_loras + for lora in loras: + org_module = lora.org_module_ref[0] + if not hasattr(org_module, "_lora_org_weight"): + sd = org_module.state_dict() + org_module._lora_org_weight = sd["weight"].detach().clone() + org_module._lora_restored = True + + def restore_weights(self): + # 重みのリストアを行う + loras: List[LoRAInfModule] = self.text_encoder_loras + self.unet_loras + for lora in loras: + org_module = lora.org_module_ref[0] + if not org_module._lora_restored: + sd = org_module.state_dict() + sd["weight"] = org_module._lora_org_weight + org_module.load_state_dict(sd) + org_module._lora_restored = True + + def pre_calculation(self): + # 事前計算を行う + loras: List[LoRAInfModule] = self.text_encoder_loras + self.unet_loras + for lora in loras: + org_module = lora.org_module_ref[0] + sd = org_module.state_dict() + + org_weight = sd["weight"] + lora_weight = lora.get_weight().to(org_weight.device, dtype=org_weight.dtype) + sd["weight"] = org_weight + lora_weight + assert sd["weight"].shape == org_weight.shape + org_module.load_state_dict(sd) + + org_module._lora_restored = False + lora.enabled = False diff --git a/setup.bat b/setup.bat index d118b043b..e1846a8b6 100644 --- a/setup.bat +++ b/setup.bat @@ -44,13 +44,15 @@ if %choice%==1 ( ) else ( pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118 pip install --use-pep517 --upgrade -r requirements.txt - pip install --upgrade xformers==0.0.17 + pip install --upgrade xformers==0.0.19 rem pip install -U -I --no-deps 
https://files.pythonhosted.org/packages/d6/f7/02662286419a2652c899e2b3d1913c47723fc164b4ac06a85f769c291013/xformers-0.0.17rc482-cp310-cp310-win_amd64.whl pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl ) -copy /y .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\ -copy /y .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py -copy /y .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py +python.exe .\tools\update_bitsandbytes.py + +@REM copy /y .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\ +@REM copy /y .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py +@REM copy /y .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py accelerate config diff --git a/textual_inversion_gui.py b/textual_inversion_gui.py index d0c8f3761..9bb068d83 100644 --- a/textual_inversion_gui.py +++ b/textual_inversion_gui.py @@ -110,7 +110,7 @@ def save_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -233,7 +233,7 @@ def open_configuration( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -339,7 +339,7 @@ def train_model( caption_dropout_rate, optimizer, optimizer_args, - noise_offset, + noise_offset_type,noise_offset,adaptive_noise_scale, multires_noise_iterations, multires_noise_discount, sample_every_n_steps, @@ -405,15 +405,15 @@ def train_model( ): return - if float(noise_offset) > 0 and ( - multires_noise_iterations > 0 or multires_noise_discount > 0 - ): - output_message( - msg="noise offset and multires_noise can't be set at the same time. Only use one or the other.", - title='Error', - headless=headless_bool, - ) - return + # if float(noise_offset) > 0 and ( + # multires_noise_iterations > 0 or multires_noise_discount > 0 + # ): + # output_message( + # msg="noise offset and multires_noise can't be set at the same time. 
Only use one or the other.",
+                #         title='Error',
+                #         headless=headless_bool,
+                #     )
+                #     return

     if optimizer == 'Adafactor' and lr_warmup != '0':
         output_message(
@@ -475,6 +475,7 @@ def train_model(
         math.ceil(
             float(total_steps)
             / int(train_batch_size)
+            / int(gradient_accumulation_steps)
             * int(epoch)
             * int(reg_factor)
         )
@@ -579,7 +580,9 @@ def train_model(
         bucket_reso_steps=bucket_reso_steps,
         caption_dropout_every_n_epochs=caption_dropout_every_n_epochs,
         caption_dropout_rate=caption_dropout_rate,
+        noise_offset_type=noise_offset_type,
         noise_offset=noise_offset,
+        adaptive_noise_scale=adaptive_noise_scale,
         multires_noise_iterations=multires_noise_iterations,
         multires_noise_discount=multires_noise_discount,
         additional_parameters=additional_parameters,
@@ -860,7 +863,7 @@ def ti_tab(
         bucket_reso_steps,
         caption_dropout_every_n_epochs,
         caption_dropout_rate,
-        noise_offset,
+        noise_offset_type, noise_offset, adaptive_noise_scale,
         multires_noise_iterations,
         multires_noise_discount,
         additional_parameters,
@@ -973,7 +976,7 @@ def ti_tab(
         caption_dropout_rate,
         optimizer,
         optimizer_args,
-        noise_offset,
+        noise_offset_type, noise_offset, adaptive_noise_scale,
         multires_noise_iterations,
         multires_noise_discount,
         sample_every_n_steps,
diff --git a/tools/update_bitsandbytes.py b/tools/update_bitsandbytes.py
new file mode 100644
index 000000000..ee8b2ae60
--- /dev/null
+++ b/tools/update_bitsandbytes.py
@@ -0,0 +1,49 @@
+import os
+import sysconfig
+import filecmp
+import shutil
+
+def sync_bits_and_bytes_files():
+    """
+    Check for "different" bitsandbytes files and copy only if necessary.
+    This function is specific to Windows OS.
+    """
+
+    # Only execute on Windows
+    if os.name != "nt":
+        print("This function is only applicable to Windows OS.")
+        return
+
+    try:
+        # Define source and destination directories
+        source_dir = os.path.join(os.getcwd(), "bitsandbytes_windows")
+
+        dest_dir_base = os.path.join(sysconfig.get_paths()["purelib"], "bitsandbytes")
+
+        # Clear file comparison cache
+        filecmp.clear_cache()
+
+        # Iterate over each file in source directory
+        for file in os.listdir(source_dir):
+            source_file_path = os.path.join(source_dir, file)
+
+            # Decide the destination directory based on file name
+            if file in ("main.py", "paths.py"):
+                dest_dir = os.path.join(dest_dir_base, "cuda_setup")
+            else:
+                dest_dir = dest_dir_base
+
+            # Copy file from source to destination, maintaining original file's metadata
+            print(f'Copy {source_file_path} to {dest_dir}')
+            shutil.copy2(source_file_path, dest_dir)
+
+    except FileNotFoundError as fnf_error:
+        print(f"File not found error: {fnf_error}")
+    except PermissionError as perm_error:
+        print(f"Permission error: {perm_error}")
+    except Exception as e:
+        print(f"An unexpected error occurred: {e}")
+
+
+if __name__ == "__main__":
+    sync_bits_and_bytes_files()
\ No newline at end of file
diff --git a/train_db.py b/train_db.py
index 90ee1bb18..55e2ababe 100644
--- a/train_db.py
+++ b/train_db.py
@@ -23,7 +23,7 @@
     BlueprintGenerator,
 )
 import library.custom_train_functions as custom_train_functions
-from library.custom_train_functions import apply_snr_weight, get_weighted_text_embeddings, pyramid_noise_like
+from library.custom_train_functions import apply_snr_weight, get_weighted_text_embeddings, pyramid_noise_like, apply_noise_offset


 def train(args):
@@ -271,8 +271,7 @@ def train(args):
                 # Sample noise that we'll add to the latents
                 noise = torch.randn_like(latents, device=latents.device)
                 if args.noise_offset:
-                    # https://www.crosslabs.org//blog/diffusion-with-offset-noise
-                    noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device)
+                    noise = apply_noise_offset(latents, noise, args.noise_offset, args.adaptive_noise_scale)
                 elif args.multires_noise_iterations:
                     noise = pyramid_noise_like(noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount)

@@ -367,7 +366,7 @@ def train(args):
                 current_loss = loss.detach().item()
                 if args.logging_dir is not None:
                     logs = {"loss": current_loss, "lr": float(lr_scheduler.get_last_lr()[0])}
-                    if args.optimizer_type.lower() == "DAdaptation".lower():  # tracking d*lr value
+                    if args.optimizer_type.lower().startswith("DAdapt".lower()):  # tracking d*lr value
                         logs["lr/d*lr"] = (
                             lr_scheduler.optimizers[0].param_groups[0]["d"] * lr_scheduler.optimizers[0].param_groups[0]["lr"]
                         )
diff --git a/train_network.py b/train_network.py
index 4c4cc2816..43f70225d 100644
--- a/train_network.py
+++ b/train_network.py
@@ -25,7 +25,7 @@
 )
 import library.huggingface_util as huggingface_util
 import library.custom_train_functions as custom_train_functions
-from library.custom_train_functions import apply_snr_weight, get_weighted_text_embeddings, pyramid_noise_like
+from library.custom_train_functions import apply_snr_weight, get_weighted_text_embeddings, pyramid_noise_like, apply_noise_offset


 # TODO 他のスクリプトと共通化する
@@ -43,7 +43,7 @@ def generate_step_logs(args: argparse.Namespace, current_loss, avr_loss, lr_sche
         logs["lr/textencoder"] = float(lrs[0])
         logs["lr/unet"] = float(lrs[-1])  # may be same to textencoder

-        if args.optimizer_type.lower() == "DAdaptation".lower():  # tracking d*lr value of unet.
+        if args.optimizer_type.lower().startswith("DAdapt".lower()):  # tracking d*lr value of unet.
             logs["lr/d*lr"] = lr_scheduler.optimizers[-1].param_groups[0]["d"] * lr_scheduler.optimizers[-1].param_groups[0]["lr"]
     else:
         idx = 0
@@ -53,7 +53,7 @@ def generate_step_logs(args: argparse.Namespace, current_loss, avr_loss, lr_sche

         for i in range(idx, len(lrs)):
             logs[f"lr/group{i}"] = float(lrs[i])
-            if args.optimizer_type.lower() == "DAdaptation".lower():
+            if args.optimizer_type.lower().startswith("DAdapt".lower()):
                 logs[f"lr/d*lr/group{i}"] = (
                     lr_scheduler.optimizers[-1].param_groups[i]["d"] * lr_scheduler.optimizers[-1].param_groups[i]["lr"]
                 )
@@ -277,7 +277,7 @@ def train(args):
     else:
         unet.eval()
         text_encoder.eval()
-
+
     network.prepare_grad_etc(text_encoder, unet)

     if not cache_latents:
@@ -585,11 +585,11 @@ def remove_model(old_ckpt_name):
                 else:
                     input_ids = batch["input_ids"].to(accelerator.device)
                     encoder_hidden_states = train_util.get_hidden_states(args, input_ids, tokenizer, text_encoder, weight_dtype)
+
                 # Sample noise that we'll add to the latents
                 noise = torch.randn_like(latents, device=latents.device)
                 if args.noise_offset:
-                    # https://www.crosslabs.org//blog/diffusion-with-offset-noise
-                    noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device)
+                    noise = apply_noise_offset(latents, noise, args.noise_offset, args.adaptive_noise_scale)
                 elif args.multires_noise_iterations:
                     noise = pyramid_noise_like(noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount)

@@ -713,7 +713,7 @@ def remove_model(old_ckpt_name):

         if is_main_process:
             ckpt_name = train_util.get_last_ckpt_name(args, "." + args.save_model_as)
             save_model(ckpt_name, network, global_step, num_train_epochs, force_sync_upload=True)
-
+
     print("model saved.")
diff --git a/train_textual_inversion.py b/train_textual_inversion.py
index 301aae7ae..8da204777 100644
--- a/train_textual_inversion.py
+++ b/train_textual_inversion.py
@@ -20,7 +20,7 @@
     BlueprintGenerator,
 )
 import library.custom_train_functions as custom_train_functions
-from library.custom_train_functions import apply_snr_weight, pyramid_noise_like
+from library.custom_train_functions import apply_snr_weight, pyramid_noise_like, apply_noise_offset

 imagenet_templates_small = [
     "a photo of a {}",
@@ -387,8 +387,7 @@ def remove_model(old_ckpt_name):
                 # Sample noise that we'll add to the latents
                 noise = torch.randn_like(latents, device=latents.device)
                 if args.noise_offset:
-                    # https://www.crosslabs.org//blog/diffusion-with-offset-noise
-                    noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device)
+                    noise = apply_noise_offset(latents, noise, args.noise_offset, args.adaptive_noise_scale)
                 elif args.multires_noise_iterations:
                     noise = pyramid_noise_like(noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount)

@@ -465,7 +464,7 @@ def remove_model(old_ckpt_name):
                 current_loss = loss.detach().item()
                 if args.logging_dir is not None:
                     logs = {"loss": current_loss, "lr": float(lr_scheduler.get_last_lr()[0])}
-                    if args.optimizer_type.lower() == "DAdaptation".lower():  # tracking d*lr value
+                    if args.optimizer_type.lower().startswith("DAdapt".lower()):  # tracking d*lr value
                         logs["lr/d*lr"] = (
                             lr_scheduler.optimizers[0].param_groups[0]["d"] * lr_scheduler.optimizers[0].param_groups[0]["lr"]
                         )
diff --git a/train_textual_inversion_XTI.py b/train_textual_inversion_XTI.py
index 2aa6cd7fe..358748029 100644
--- a/train_textual_inversion_XTI.py
+++ b/train_textual_inversion_XTI.py
@@ -20,7 +20,7 @@
     BlueprintGenerator,
 )
 import library.custom_train_functions as custom_train_functions
-from library.custom_train_functions import apply_snr_weight, pyramid_noise_like
+from library.custom_train_functions import apply_snr_weight, pyramid_noise_like, apply_noise_offset
 from XTI_hijack import unet_forward_XTI, downblock_forward_XTI, upblock_forward_XTI

 imagenet_templates_small = [
@@ -426,8 +426,7 @@ def remove_model(old_ckpt_name):
                 # Sample noise that we'll add to the latents
                 noise = torch.randn_like(latents, device=latents.device)
                 if args.noise_offset:
-                    # https://www.crosslabs.org//blog/diffusion-with-offset-noise
-                    noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1), device=latents.device)
+                    noise = apply_noise_offset(latents, noise, args.noise_offset, args.adaptive_noise_scale)
                 elif args.multires_noise_iterations:
                     noise = pyramid_noise_like(noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount)

@@ -504,7 +503,7 @@ def remove_model(old_ckpt_name):
                 current_loss = loss.detach().item()
                 if args.logging_dir is not None:
                     logs = {"loss": current_loss, "lr": float(lr_scheduler.get_last_lr()[0])}
-                    if args.optimizer_type.lower() == "DAdaptation".lower():  # tracking d*lr value
+                    if args.optimizer_type.lower().startswith("DAdapt".lower()):  # tracking d*lr value
                         logs["lr/d*lr"] = (
                             lr_scheduler.optimizers[0].param_groups[0]["d"] * lr_scheduler.optimizers[0].param_groups[0]["lr"]
                         )
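
All four training scripts in this patch (`train_db.py`, `train_network.py`, `train_textual_inversion.py`, `train_textual_inversion_XTI.py`) replace their inline offset-noise code with one shared call, `apply_noise_offset(latents, noise, args.noise_offset, args.adaptive_noise_scale)`, imported from `library.custom_train_functions`; likewise, the widened `startswith("DAdapt")` check lets the `d*lr` logging cover the new DAdaptation optimizer variants rather than only the original `DAdaptation` type. The body of `apply_noise_offset` is not part of this patch, so the following is a minimal sketch of what it plausibly does, inferred from the call sites above and from the offset formula `noise_offset + abs(mean(latents, dim=(2,3))) * adaptive_noise_scale` (with negative results clipped to 0) given in the v21.5.10 release notes; treat it as illustrative, not the exact library code:

```python
import torch

def apply_noise_offset(latents, noise, noise_offset, adaptive_noise_scale):
    # latents / noise: (batch, channels, height, width)
    if adaptive_noise_scale is not None:
        # Scale the offset by the absolute per-sample, per-channel mean of the
        # latents, so brighter or darker images receive a larger offset.
        latent_mean = torch.abs(latents.mean(dim=(2, 3), keepdim=True))  # (B, C, 1, 1)
        noise_offset = noise_offset + adaptive_noise_scale * latent_mean
        # A negative adaptive_noise_scale can push the offset below zero,
        # so clip it to 0 or more.
        noise_offset = torch.clamp(noise_offset, min=0.0)

    # The usual offset-noise trick (https://www.crosslabs.org//blog/diffusion-with-offset-noise),
    # now with an offset that may be a per-sample tensor broadcast over H and W.
    noise = noise + noise_offset * torch.randn(
        (latents.shape[0], latents.shape[1], 1, 1), device=latents.device
    )
    return noise
```

Centralizing the computation in a single helper is what keeps the four call sites identical and makes `--adaptive_noise_scale` behave the same way in every script.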
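
One usage note on the new `tools/update_bitsandbytes.py`: it is intended to be run directly, e.g. `python tools\update_bitsandbytes.py`, from the repository root, since it locates the `bitsandbytes_windows` source folder via `os.getcwd()`. It is a no-op on non-Windows systems (`os.name != "nt"`) and copies the patched files into the installed `bitsandbytes` package in site-packages, routing `main.py` and `paths.py` into the `cuda_setup` subfolder.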