"non-zero exit status 1" issue #243

Closed

jiuxiaojian opened this issue Feb 27, 2023 · 8 comments

Comments

@jiuxiaojian

I have unchecked "Use 8bit adam" and set the Optimizer to "AdamW", but it still returned this error. Is there anything else I can do, besides these settings, to use DreamBooth LoRA without the error appearing?

loading text encoder: <All keys matched successfully>
Replace CrossAttention.forward to use xformers
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:03<00:00,  2.43it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
Traceback (most recent call last):
  File "D:\AI Drawing\Lora\kohya_ss\train_network.py", line 507, in <module>
    train(args)
  File "D:\AI Drawing\Lora\kohya_ss\train_network.py", line 176, in train
    unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 876, in prepare
    result = tuple(
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 877, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 741, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 912, in prepare_model
    model = model.to(self.device)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 1749, in to
    return super().to(*args, **kwargs)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.42 GiB already allocated; 0 bytes free; 3.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "C:\Users\25424\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\25424\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\AI Drawing\Lora\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\AI Drawing\\Lora\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI Drawing/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp16Fix.safetensors', '--train_data_dir=D:/AI Drawing/Lora/Lora_database/shiya/image', '--resolution=512,512', '--output_dir=D:/AI Drawing/Lora/Lora_database/shiya/model', '--logging_dir=D:/AI Drawing/Lora/Lora_database/shiya/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=90', '--train_batch_size=1', '--max_train_steps=900', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
@Thund3rPat

You are simply running out of memory. You can try the AdamW8bit optimizer, but with 4 GiB of total capacity it will be very difficult.
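
For reference, a rough sketch of what that could look like on Windows before launching (the max_split_size_mb hint comes from the CUDA error message above; --gradient_checkpointing is an extra memory-saving option, assuming the installed version supports it):

rem Sketch only, not a guaranteed fix on a 4 GiB card:
rem reduce allocator fragmentation and switch to the 8-bit optimizer.
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
accelerate launch train_network.py ^
  --optimizer_type=AdamW8bit --gradient_checkpointing ^
  ... (remaining arguments exactly as in the command logged above)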

@Elconite

I have the same/similar error after it was working fine for a couple weeks. Not sure if an upgrade borked it...

Load CSS...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading config...
Loading config...
Folder 100_Katherine22723: 2000 steps
max_train_steps = 1000
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="F:\AIStable12423\stable-diffusion-webui\models\Stable-diffusion\illuminati_diffusion_v1.0.safetensors" --train_data_dir="F:/LORA/KatherineLora/img" --resolution=512,512 --output_dir="F:/LORA/KatherineLora/model" --logging_dir="F:/LORA/KatherineLora/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="Katherine22723" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="1000" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare train images.
found directory 100_Katherine22723 contains 20 image files
2000 train images with repeating.
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 2319.15it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
Traceback (most recent call last):
File "F:\LORA\kohya_ss\train_network.py", line 507, in
train(args)
File "F:\LORA\kohya_ss\train_network.py", line 96, in train
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)
File "F:\LORA\kohya_ss\library\train_util.py", line 1860, in load_target_model
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(args.v2, name_or_path)
File "F:\LORA\kohya_ss\library\model_util.py", line 880, in load_models_from_stable_diffusion_checkpoint
info = unet.load_state_dict(converted_unet_checkpoint)
File "F:\LORA\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for down_blocks.0.attentions.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for up_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.2.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for up_blocks.2.attentions.2.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for up_blocks.3.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.2.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for up_blocks.3.attentions.2.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
Traceback (most recent call last):
File "C:\Users\ElconOne\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\ElconOne\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "F:\LORA\kohya_ss\venv\Scripts\accelerate.exe_main
.py", line 7, in
File "F:\LORA\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "F:\LORA\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "F:\LORA\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\LORA\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--pretrained_model_name_or_path=F:\AIStable12423\stable-diffusion-webui\models\Stable-diffusion\illuminati_diffusion_v1.0.safetensors', '--train_data_dir=F:/LORA/KatherineLora/img', '--resolution=512,512', '--output_dir=F:/LORA/KatherineLora/model', '--logging_dir=F:/LORA/KatherineLora/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Katherine22723', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1000', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.

@DieserBobby

DieserBobby commented Feb 28, 2023

I have a similar-looking error - at least it ends with the same "returned non-zero exit status".
I started from the same settings in kohya_ss/gui.bat that had worked fine last week.
In between I had updated Python/CUDA/xformers etc. because xformers wasn't working;
now xformers is working in Stable Diffusion and everything seemed to be alright...
but training a new LoRA is not working.
I was reading a lot here...
I tried the following ideas to work it out,
but unfortunately I couldn't solve the "returned non-zero exit status":

1. using Lion instead of Adam
2. using Adam 8bit instead of Adam
3. substituting all 3 occurrences of train_util.py
4. changing resolution
5. updating with upgrade.ps1

Any new or better ideas for me to get it running?

GPU: I am using an Nvidia 3060 with 12 GB.

Here is the end of my error output:

Traceback (most recent call last):
File "C:\Users\bobby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\bobby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "F:\Stable-Diffusion\kohya\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "F:\Stable-Diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "F:\Stable-Diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "F:\Stable-Diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\Stable-Diffusion\kohya\kohya_ss\venv\Scripts\python.exe', 'train_db.py', '--pretrained_model_name_or_path=F:/Stable-Diffusion/stable-diffusion-webui/models/Stable-diffusion/liberty_main.safetensors', '--train_data_dir=F:/Stable-Diffusion/kohya/Bilder_Training/lora/image', '--resolution=512,512', '--output_dir=F:/Stable-Diffusion/stable-diffusion-webui/models/Lora', '--logging_dir=F:/Stable-Diffusion/kohya/Bilder_Training/lora/log', '--save_model_as=safetensors', '--output_name=dazpose', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=2300', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.

Can someone recognize the problem?
What could I try to solve it?
Thank you for your time.

@syanatan1997

I think the "non-zero exit status" error is a common error caused by many different things.
You should double-check your options before you run "Train model".
There is a hint in your command prompt output.

In my case, the "Don't upscale bucket resolution" (--bucket_no_upscale) option was the cause of the error.

@DieserBobby

DieserBobby commented Feb 28, 2023

I got it running again (after checking very many combinations):
In my case, "Memory efficient attention" should be on (a few days earlier there had been no need for it),
AND "Use 8bit adam" in the advanced section should NOT be checked.

@Koronos

Koronos commented Mar 5, 2023

The problem is that the UI enables --use_8bit_adam and --optimizer_type=AdamW8bit

image

Indeed, the UI itself has a hint that tells you this.

image

I tried modifying the json directly to disable the checkbox, and it works... well, not entirely: now I have a memory problem, but no longer exit status 1.
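
The conflict is visible in the commands logged above: both --use_8bit_adam and --optimizer_type=AdamW8bit are passed at the same time. A sketch of the de-duplicated flags (exact behaviour depends on the script version):

rem Sketch: specify the optimizer only once, via --optimizer_type.
accelerate launch train_network.py ^
  --optimizer_type=AdamW8bit ^
  ... (all other arguments unchanged, with the extra --use_8bit_adam removed)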

@DieserBobby

Have you tried checking "Memory efficient attention" as well? In my case it helped and the result works fine.

Cauldrath pushed a commit to Cauldrath/kohya_ss that referenced this issue Apr 5, 2023
Enable ability to resize lora dim based off sv ratios
@Pecatum1

In my case, it was solved by checking "Memory efficient attention" while unchecking "Enable buckets".
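
As a command-line sketch of that combination (assuming "Memory efficient attention" maps to --mem_eff_attn and "Enable buckets" maps to --enable_bucket; flag names may differ between versions):

rem Sketch: memory efficient attention on, bucketing off.
accelerate launch train_network.py ^
  --mem_eff_attn ^
  ... (all other arguments as before, with --enable_bucket and --bucket_no_upscale removed)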
