I can't get training to start #687

Closed · uuct95 opened this issue Apr 29, 2023 · 2 comments

Comments


uuct95 commented Apr 29, 2023

```
System Information:
System: Windows, Release: 10, Version: 10.0.22621, Machine: AMD64, Processor: Intel64 Family 6 Model 191 Stepping 2, GenuineIntel

Python Information:
Version: 3.10.9, Implementation: CPython, Compiler: MSC v.1934 64 bit (AMD64)

Virtual Environment Information:
Path: D:\AI\kohya_ss\venv

GPU Information:
Name: NVIDIA GeForce RTX 3070, VRAM: 8192 MiB

Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Folder 100_Yuechan: 16 images found
Folder 100_Yuechan: 1600 steps
max_train_steps = 800
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="D:/AI/Yuechan_Lora/image" --resolution=768,768 --output_dir="D:/AI/Yuechan_Lora/model" --logging_dir="D:/AI/Yuechan_Lora/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="Yuechan" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="800" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --mem_eff_attn --xformers --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory D:\AI\Yuechan_Lora\image\100_Yuechan contains 16 image files
1600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 2
resolution: (768, 768)
enable_bucket: False

[Subset 0 of Dataset 0]
image_dir: "D:\AI\Yuechan_Lora\image\100_Yuechan"
image_count: 16
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Yuechan
caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 5335.42it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models
safety_checker\model.safetensors not found
Fetching 19 files: 100%|███████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
D:\AI\kohya_ss\venv\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
Replace CrossAttention.forward to use FlashAttention (not xformers)
[Dataset 0]
caching latents.
0%| | 0/16 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "D:\AI\kohya_ss\train_network.py", line 773, in <module>
    train(args)
  File "D:\AI\kohya_ss\train_network.py", line 175, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "D:\AI\kohya_ss\library\train_util.py", line 1391, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "D:\AI\kohya_ss\library\train_util.py", line 805, in cache_latents
    latents = vae.encode(img_tensors).latent_dist.sample().to("cpu")
  File "D:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\vae.py", line 566, in encode
    h = self.encoder(x)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI\kohya_ss\venv\lib\site-packages\diffusers\models\vae.py", line 130, in forward
    sample = self.conv_in(sample)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\AI\kohya_ss\venv\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\AI\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\AI\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=D:/AI/Yuechan_Lora/image', '--resolution=768,768', '--output_dir=D:/AI/Yuechan_Lora/model', '--logging_dir=D:/AI/Yuechan_Lora/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Yuechan', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=800', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```
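
The failure happens while caching latents, at the `vae.encode` call in `train_util.py`. This cuDNN error often masks an out-of-memory condition: cuDNN reports "no valid algorithm" when it cannot allocate workspace, which is plausible for 768×768 images on an 8 GiB card. A minimal sketch to check whether the encode step alone fits in VRAM (an illustration only, not part of kohya_ss; it assumes `torch` and `diffusers` are importable from the venv and mirrors the shapes of the failing call):

```python
# Illustration only (not part of kohya_ss): rerun the VAE encode that failed
# above and report peak VRAM. Assumes torch and diffusers are installed.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

# Same shape as the failing call: a batch of 2 RGB images at 768x768.
imgs = torch.randn(2, 3, 768, 768, dtype=torch.float16, device="cuda")
with torch.no_grad():
    latents = vae.encode(imgs).latent_dist.sample()

print("encode succeeded:", tuple(latents.shape))
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```

If this sketch hits the same error, the usual knobs are lowering the resolution, the train batch size, or `--vae_batch_size` (the traceback shows caching uses `args.vae_batch_size`).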

uuct95 closed this as completed Apr 30, 2023
@XSilverHostX

Hello, I have had the same problem for days. Were you able to solve it?

uuct95 (Author) commented May 24, 2023

Reduce the number of training steps by lowering the repeat count encoded in the image folder name (for example, rename the 100_ prefix to 10_, so 100_Yuechan becomes 10_Yuechan), and increase the Windows virtual memory (page file) size (for example, from the default 8 GB to 16 GB).
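
For reference, the step counts in the log follow directly from images × repeats ÷ batch size, so lowering the folder's repeat prefix shrinks the run proportionally. A back-of-the-envelope sketch, assuming that simple formula (it matches the numbers printed in the log above):

```python
# Step math implied by the log: 16 images x 100 repeats = 1600 "train images
# with repeating", and 1600 / batch size 2 = 800 max_train_steps.
images, repeats, batch_size = 16, 100, 2
print(images * repeats // batch_size)  # 800, matching the log

# With the suggested 10_ folder prefix (repeats = 10):
print(images * 10 // batch_size)       # 80, a 10x shorter run
```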

bmaltais pushed a commit that referenced this issue Jul 29, 2023
support ckpt without position id in sd v1 #687