"raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)" happened when starting training #433

Teriss · 2023-03-23T10:20:27Z

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA SETUP: Loading binary D:\kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 1500
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 1500
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1500
steps: 0%| | 0/1500 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
File "D:\kohya\kohya_ss\python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\kohya\kohya_ss\python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "D:\kohya\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:/Users/PC/Desktop/test/v1-5-pruned-emaonly.safetensors', '--train_data_dir=C:/Users/PC/Desktop/test/input', '--resolution=512,512', '--output_dir=C:/Users/PC/Desktop/test/output', '--logging_dir=C:/Users/PC/Desktop/test/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=150', '--train_batch_size=1', '--max_train_steps=1500', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 3221225477.
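
A side note on the exit status above, offered as a reading aid rather than a diagnosis: 3221225477 is easier to recognize in hexadecimal, where it is the Windows status code 0xC0000005 (STATUS_ACCESS_VIOLATION). That suggests the child process crashed in native code instead of raising a Python exception. The helper name `to_ntstatus_hex` and the one-entry lookup table are made up for this illustration.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Format a Windows process exit code as an unsigned 32-bit hex value."""
    # Masking also handles tools that report the same code as a negative number.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Only the code seen in this thread is listed; this is not a complete table.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

exit_code = 3221225477  # value reported by accelerate in the log above
print(to_ntstatus_hex(exit_code), KNOWN_STATUS.get(exit_code, "unknown"))
```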

Teriss · 2023-03-23T10:41:35Z

When I restarted and trained again, a new error occurred.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA SETUP: Loading binary D:\kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 1500
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 1500
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1500
steps: 0%| | 0/1500 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
Traceback (most recent call last):
File "D:\kohya\kohya_ss\train_network.py", line 659, in
File "", line 1, in
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
train(args)
File "D:\kohya\kohya_ss\train_network.py", line 488, in train
exitcode = _main(fd, parent_sentinel)
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\spawn.py", line 126, in _main
for step, batch in enumerate(train_dataloader):
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\data_loader.py", line 372, in iter
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
dataloader_iter = super().iter()
File "D:\kohya\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 444, in iter
return self._get_iterator()
File "D:\kohya\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\kohya\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in init
w.start()
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "D:\kohya\kohya_ss\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
MemoryError
steps: 0%| | 0/1500 [00:45<?, ?it/s]
Traceback (most recent call last):
File "D:\kohya\kohya_ss\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\kohya\kohya_ss\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "D:\kohya\kohya_ss\venv\scripts\accelerate.exe_main.py", line 7, in
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya\kohya_ss\venv\scripts\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:/Users/PC/Desktop/test/v1-5-pruned-emaonly.safetensors', '--train_data_dir=D:/kohya/test/input', '--resolution=512,512', '--output_dir=D:/kohya/test/output', '--logging_dir=D:/kohya/test/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=150', '--train_batch_size=1', '--max_train_steps=1500', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=AdamW8bit', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

bmaltais · 2023-03-23T13:39:32Z

Try training with AdamW instead of AdamW8bit. I think your card can't use the bitsandbytes module required for AdamW8bit.

Teriss · 2023-03-24T03:27:06Z

Thanks for the suggestion, but it didn't work.
I tried restarting the computer and running it again, and it worked. But after the computer was on standby overnight, the error came back, sometimes as a new one: "Memory allocation failure" or "Out Of Memory". My card is an RTX 3080 with 12GB of memory. It seems to use only 6GB, yet it reports OOM.

bmaltais · 2023-03-24T08:45:35Z

So it is something related to Windows and possibly the Windows drivers... Those are hard to fix.

Teriss · 2023-03-24T09:36:27Z

Hi, I found that the error occurred when loading the data. So I tried lowering the "num_workers" parameter of "torch.utils.data.DataLoader", and it worked. When I set it to 0, training was the fastest. I think Python's multiprocessing may not be very efficient on Windows.
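
One possible explanation for the MemoryError in the traceback above, sketched under assumptions rather than confirmed: on Windows, worker processes are started with the "spawn" method, so each DataLoader worker receives a pickled copy of the dataset (the `reduction.dump(process_obj, to_child)` frame). If the dataset keeps cached latents in RAM, every extra worker repeats that serialization, and with "num_workers" set to 0 nothing is pickled at all. `FakeCachedDataset`, `spawn_payload_bytes`, and the sizes below are hypothetical stand-ins, not kohya_ss code.

```python
import pickle


class FakeCachedDataset:
    """Hypothetical stand-in for a dataset that keeps cached latents in RAM."""

    def __init__(self, num_items: int, bytes_per_item: int) -> None:
        # Each entry mimics one cached latent as a block of raw bytes.
        self.latents = [bytes(bytes_per_item) for _ in range(num_items)]

    def __len__(self) -> int:
        return len(self.latents)


def spawn_payload_bytes(dataset: FakeCachedDataset, num_workers: int) -> int:
    """Rough size of what "spawn" must serialize: one pickled copy per worker."""
    one_copy = len(pickle.dumps(dataset, protocol=pickle.HIGHEST_PROTOCOL))
    return one_copy * num_workers


# Illustrative sizes only: 1500 items of 8 KiB each.
dataset = FakeCachedDataset(num_items=1500, bytes_per_item=8192)
for workers in (0, 1, 8):
    megabytes = spawn_payload_bytes(dataset, workers) / 1e6
    print(f"num_workers={workers}: about {megabytes:.1f} MB pickled")
```

The estimate ignores the time spent starting each interpreter, which may also contribute to the slowdown seen with more workers.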

bmaltais · 2023-03-24T12:02:04Z

Thank you for the update. I will update the default value in the GUI to set it to 0 to avoid similar issues for other users!

Big-ANGELO · 2023-03-25T10:20:01Z

Could you explain in detail how you solved this problem? Thanks!

Teriss · 2023-03-27T02:18:21Z

You can fix it now by updating to the latest version. The author has added this setting to the GUI.

bmaltais mentioned this issue Mar 25, 2023

v21.3.4 #450

Merged

HBspud mentioned this issue Mar 26, 2023

[WinError 1455] The paging file is too small for this operation to complete #449

Closed

Teriss closed this as completed Mar 27, 2023

bmaltais pushed a commit that referenced this issue Apr 24, 2023

Merge pull request #433 from sALTaccount/main

ed15f68

fix no logging command line arg
