I can't get training to start #687

Closed · uuct95 opened this issue Apr 29, 2023 · 2 comments

Comments


uuct95 commented Apr 29, 2023

```
System Information:
System: Windows, Release: 10, Version: 10.0.22621, Machine: AMD64, Processor: Intel64 Family 6 Model 191 Stepping 2, GenuineIntel

Python Information:
Version: 3.10.9, Implementation: CPython, Compiler: MSC v.1934 64 bit (AMD64)

Virtual Environment Information:
Path: D:\AI\kohya_ss\venv

GPU Information:
Name: NVIDIA GeForce RTX 3070, VRAM: 8192 MiB

Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Folder 100_Yuechan: 16 images found
Folder 100_Yuechan: 1600 steps
max_train_steps = 800
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="D:/AI/Yuechan_Lora/image" --resolution=768,768 --output_dir="D:/AI/Yuechan_Lora/model" --logging_dir="D:/AI/Yuechan_Lora/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="Yuechan" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="800" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --mem_eff_attn --xformers --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory D:\AI\Yuechan_Lora\image\100_Yuechan contains 16 image files
1600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 2
resolution: (768, 768)
enable_bucket: False

[Subset 0 of Dataset 0]
image_dir: "D:\AI\Yuechan_Lora\image\100_Yuechan"
image_count: 16
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Yuechan
caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 5335.42it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models
safety_checker\model.safetensors not found
Fetching 19 files: 100%|███████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
D:\AI\kohya_ss\venv\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
Replace CrossAttention.forward to use FlashAttention (not xformers)
[Dataset 0]
caching latents.
0%| | 0/16 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "D:\AI\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "D:\AI\kohya_ss\train_network.py", line 175, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "D:\AI\kohya_ss\library\train_util.py", line 1391, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "D:\AI\kohya_ss\library\train_util.py", line 805, in cache_latents
    latents = vae.encode(img_tensors).latent_dist.sample().to("cpu")
  File "D:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\vae.py", line 566, in encode
    h = self.encoder(x)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\vae.py", line 130, in forward
    sample = self.conv_in(sample)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\AI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=D:/AI/Yuechan_Lora/image', '--resolution=768,768', '--output_dir=D:/AI/Yuechan_Lora/model', '--logging_dir=D:/AI/Yuechan_Lora/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Yuechan', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=800', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```
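
The failure happens while caching latents, at the `vae.encode` call in `train_util.py`. This cuDNN error often masks an out-of-memory condition: cuDNN reports "no valid algorithm" when it cannot allocate workspace, which is plausible for 768×768 images on an 8 GiB card. A minimal sketch to check whether the encode step alone fits in VRAM (an illustration only, not part of kohya_ss; it assumes `torch` and `diffusers` are importable from the venv and mirrors the shapes of the failing call):

```python
# Illustration only (not part of kohya_ss): rerun the VAE encode that failed
# above and report peak VRAM. Assumes torch and diffusers are installed.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

# Same shape as the failing call: a batch of 2 RGB images at 768x768.
imgs = torch.randn(2, 3, 768, 768, dtype=torch.float16, device="cuda")
with torch.no_grad():
    latents = vae.encode(imgs).latent_dist.sample()

print("encode succeeded:", tuple(latents.shape))
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```

If this sketch hits the same error, the usual knobs are lowering the resolution, the train batch size, or `--vae_batch_size` (the traceback shows caching uses `args.vae_batch_size`).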

uuct95 closed this as completed Apr 30, 2023
@XSilverHostX

Hello, I have had the same problem for days. Were you able to solve it?

uuct95 (Author) commented May 24, 2023

Reduce the number of training steps by lowering the repeat count encoded in the image folder name (for example, rename the 100_ prefix to 10_, so 100_Yuechan becomes 10_Yuechan), and increase the Windows virtual memory (page file) size (for example, from the default 8 GB to 16 GB).
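
For reference, the step counts in the log follow directly from images × repeats ÷ batch size, so lowering the folder's repeat prefix shrinks the run proportionally. A back-of-the-envelope sketch, assuming that simple formula (it matches the numbers printed in the log above):

```python
# Step math implied by the log: 16 images x 100 repeats = 1600 "train images
# with repeating", and 1600 / batch size 2 = 800 max_train_steps.
images, repeats, batch_size = 16, 100, 2
print(images * repeats // batch_size)  # 800, matching the log

# With the suggested 10_ folder prefix (repeats = 10):
print(images * 10 // batch_size)       # 80, a 10x shorter run
```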

bmaltais pushed a commit that referenced this issue Jul 29, 2023
support ckpt without position id in sd v1 #687