Release v22.6.0 · bmaltais/kohya_ss

2024/01/27 (v22.6.0)

Merge sd-scripts v0.8.3 code update
- Fixed a bug where training crashed when --fp8_base was specified together with --save_state. PR #1079 Thanks to feffy380!
  - safetensors has been updated. Please see the Upgrade section and update the library.
- Fixed a bug where training crashed when network_multiplier was specified with multi-GPU training. PR #1084 Thanks to fireicewolf!
- Fixed a bug where training crashed when training ControlNet-LLLite.

Merge sd-scripts v0.8.2 code update
- [Experimental] The --fp8_base option is added to the training scripts for LoRA etc. The base model (U-Net, and Text Encoder when training modules for Text Encoder) can be trained with fp8. PR #1057 Thanks to KohakuBlueleaf!
  - Please specify --fp8_base in train_network.py or sdxl_train_network.py.
  - PyTorch 2.1 or later is required.
  - If you use xformers with PyTorch 2.1, please see the xformers repository and install the version that matches your CUDA version.
  - The sample image generation during training consumes a lot of memory. It is recommended to turn it off.
- [Experimental] The network multiplier can be specified for each dataset in the training scripts for LoRA etc.
  - This is an experimental option and may be removed or changed in the future.
  - For example, if you train with state A as 1.0 and state B as -1.0, you may be able to generate by switching between state A and B depending on the LoRA application rate.
  - Also, if you prepare five states and train them as 0.2, 0.4, 0.6, 0.8, and 1.0, you may be able to generate by switching the states smoothly depending on the application rate.
  - Please specify network_multiplier in [[datasets]] in the .toml file.
- Some options are added to networks/extract_lora_from_models.py to reduce the memory usage.
  - The --load_precision option specifies the precision used when loading the model. If the model is saved in fp16, you can reduce memory usage without losing precision by specifying --load_precision fp16.
  - The --load_original_model_to option specifies the device on which to load the original model, and the --load_tuned_model_to option specifies the device on which to load the derived model. The default is cpu for both options, but you can specify cuda etc. You can reduce memory usage by loading one of them to the GPU. This option is available only for SDXL.
- The gradient synchronization in LoRA training with multi-GPU is improved. PR #1064 Thanks to KohakuBlueleaf!
- The code for Intel IPEX support is improved. PR #1060 Thanks to akx!
- Fixed a bug in multi-GPU Textual Inversion training.
- .toml example for network multiplier:
```
[general]
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = 1.0

# ... subset settings ...

[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = -1.0

# ... subset settings ...
```
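
As a rough illustration of why training one state at 1.0 and another at -1.0 may let the application rate switch between them, here is a minimal Python sketch. It assumes the common LoRA formulation in which a module's output is the base output plus the LoRA contribution scaled by the multiplier. The names `lora_output`, `base`, and `delta` and all values are hypothetical and are not taken from sd-scripts.

```python
from typing import List


def lora_output(base: List[float], delta: List[float], multiplier: float) -> List[float]:
    """Return base + multiplier * delta, element-wise (assumed LoRA formulation)."""
    return [b + multiplier * d for b, d in zip(base, delta)]


# Hypothetical values, chosen only for illustration.
base = [1.0, 2.0, 3.0]      # output of the frozen base module
delta = [0.5, -0.25, 1.0]   # already-scaled LoRA contribution for the same input

state_a = lora_output(base, delta, 1.0)    # multiplier used for "state A"
state_b = lora_output(base, delta, -1.0)   # multiplier used for "state B"
neutral = lora_output(base, delta, 0.0)    # multiplier 0 leaves the base output unchanged

print(state_a, state_b, neutral)
```

Under this assumption, the two multipliers push the output in opposite directions around the base model, and intermediate values (for example 0.2, 0.4, 0.6, 0.8) land between the extremes.
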

Merge sd-scripts v0.8.1 code update
- Fixed a bug where VRAM usage without Text Encoder training was larger than before in the training scripts for LoRA etc. (train_network.py, sdxl_train_network.py).
  - Text Encoders were not moved to CPU.
- Fixed typos. Thanks to akx! PR #1053

What's Changed

- Update Chinese Documentation by @boombbo in #1896
- Change cudann to cuDNN by @EugeoSynthesisThirtyTwo in #1902
- v22.6.0 by @bmaltais in #1907

New Contributors

- @EugeoSynthesisThirtyTwo made their first contribution in #1902

Full Changelog: v22.5.0...v22.6.0
