RuntimeError: CUDA out of memory. Tried to allocate 146.00 MiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.32 GiB reserved in total by PyTorch) #623

Closed
Cynaxia opened this issue Apr 15, 2023 · 2 comments

Comments

Cynaxia commented Apr 15, 2023

Folder 100_Cynaxia : 1500 steps
max_train_steps = 1500
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --pretrained_model_name_or_path="E:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/WaifuDiffusion.ckpt" --train_data_dir="E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/image" --resolution=512,512 --output_dir="E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model" --logging_dir="E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model" --save_model_as=safetensors --output_name="Cynaxialive2d" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="1500" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --mem_eff_attn --gradient_checkpointing --xformers --bucket_no_upscale
prepare tokenizer
prepare images.
found directory E:\LORA Training\Cynaxia Live2D w Captions\Cynaxia Live2D LoRA\image\100_Cynaxia contains 15 image files
1500 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: False

[Subset 0 of Dataset 0]
image_dir: "E:\LORA Training\Cynaxia Live2D w Captions\Cynaxia Live2D LoRA\image\100_Cynaxia"
image_count: 15
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Cynaxia
caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 2499.19it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net:
loading vae:
loading text encoder:
Replace CrossAttention.forward to use FlashAttention (not xformers)
[Dataset 0]
caching latents.
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:03<00:00, 3.79it/s]
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 1500
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 1500
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 1500
steps: 0%| | 0/1500 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
File "E:\Kohya\kohya_ss\train_db.py", line 435, in
train(args)
File "E:\Kohya\kohya_ss\train_db.py", line 315, in train
accelerator.backward(loss)
File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1314, in backward
self.scaler.scale(loss).backward(**kwargs)
File "E:\Kohya\kohya_ss\venv\lib\site-packages\torch_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "E:\Kohya\kohya_ss\venv\lib\site-packages\torch\autograd_init_.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 146.00 MiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps: 0%| | 0/1500 [00:05<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Cynax\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Cynax\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "E:\Kohya\kohya_ss\venv\Scripts\accelerate.exe_main
.py", line 7, in
File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "E:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\Kohya\kohya_ss\venv\Scripts\python.exe', 'train_db.py', '--pretrained_model_name_or_path=E:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/WaifuDiffusion.ckpt', '--train_data_dir=E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/image', '--resolution=512,512', '--output_dir=E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model', '--logging_dir=E:/LORA Training/Cynaxia Live2D w Captions/Cynaxia Live2D LoRA/model', '--save_model_as=safetensors', '--output_name=Cynaxialive2d', '--max_data_loader_n_workers=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=1500', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--gradient_checkpointing', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

I'm struggling to fix this issue. The first time, I managed to run 768x768 training; it got to roughly 15% before I closed the CMD window because I didn't have time to wait for it to finish.
When I launched exactly the same run later, the CUDA out-of-memory error appeared and training wouldn't even start.
I tried a lower resolution, 512x512, and it's the same thing: it won't even start.
I've read posts on Stack Overflow suggesting running:
import torch
torch.cuda.empty_cache()
but I'm a newbie and don't really know how or where to do that.
Any suggestions/help?
Thanks in advance!
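
For reference, a minimal sketch of what those Stack Overflow posts are suggesting, run from a Python prompt inside the kohya_ss venv (the 128 MiB value and the use of os.environ are illustrative assumptions, not settings from this thread). Note that torch.cuda.empty_cache() only releases memory cached by the process it runs in, so it cannot reclaim VRAM still held by a stale python.exe left over from an interrupted run; that process has to exit first.

# Sketch only: run inside the kohya_ss venv's Python interpreter (assumption).
import os

# The error message's own hint: cap allocator block splitting to reduce
# fragmentation. Must be set before CUDA is first initialized; 128 is just
# an example value, not a recommendation from this thread.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # frees cached blocks held by *this* process only
    print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB, "
          f"reserved: {torch.cuda.memory_reserved() / 2**20:.0f} MiB")

To apply max_split_size_mb to the training run itself, the same variable can be set in the CMD window before the accelerate launch command, e.g. set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128.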

@electricbee

Looks like you're trying to train a LoRA but accidentally started the Dreambooth trainer instead.
I have done the same thing an embarrassing number of times.


Cynaxia commented Apr 15, 2023

Looks like you're trying to train a LoRA but accidentally started the Dreambooth trainer instead. I have done the same thing an embarrassing number of times.

Yeah, apparently so; the late-night rush to fix the problem did its thing (:
Made sure to run the LoRA trainer this time and it worked on the first try, thanks for your help!

Cynaxia closed this as completed on Apr 15, 2023
bmaltais pushed a commit that referenced this issue Aug 4, 2023
…windows (#623)

* ADD libbitsandbytes.dll for 0.38.1

* Delete libbitsandbytes_cuda116.dll

* Delete cextension.py

* add main.py

* Update requirements.txt for bitsandbytes 0.38.1

* Update README.md for bitsandbytes-windows

* Update README-ja.md  for bitsandbytes 0.38.1

* Update main.py for return cuda118

* Update train_util.py for lion8bit

* Update train_README-ja.md for lion8bit

* Update train_util.py for add DAdaptAdan and DAdaptSGD

* Update train_util.py for DAdaptadam

* Update train_network.py for dadapt

* Update train_README-ja.md for DAdapt

* Update train_util.py for DAdapt

* Update train_network.py for DAdaptAdaGrad

* Update train_db.py for DAdapt

* Update fine_tune.py for DAdapt

* Update train_textual_inversion.py for DAdapt

* Update train_textual_inversion_XTI.py for DAdapt

* Revert "Merge branch 'qinglong' into main"

This reverts commit b65c023083d6d1e8a30eb42eddd603d1aac97650, reversing
changes made to f6fda20caf5e773d56bcfb5c4575c650bb85362b.

* Revert "Update requirements.txt for bitsandbytes 0.38.1"

This reverts commit 83abc60dfaddb26845f54228425b98dd67997528.

* Revert "Delete cextension.py"

This reverts commit 3ba4dfe046874393f2a022a4cbef3628ada35391.

* Revert "Update README.md for bitsandbytes-windows"

This reverts commit 4642c52086b5e9791233007e2fdfd97f832cd897.

* Revert "Update README-ja.md  for bitsandbytes 0.38.1"

This reverts commit fa6d7485ac067ebc49e6f381afdb8dd2f12caa8f.

* Update train_util.py for DAdaptLion

* Update train_README-zh.md for dadaptlion

* Update train_README-ja.md for DAdaptLion

* add DAdatpt V3

* Alignment

* Update train_util.py for experimental

* Update train_util.py V3

* Update train_util.py

* Update requirements.txt

* Update train_README-zh.md

* Update train_README-ja.md

* Update train_util.py fix

* Update train_util.py

* support Prodigy

* add lower

* Update main.py

* support PagedAdamW8bit/PagedLion8bit

* Update requirements.txt

* update for PageAdamW8bit and PagedLion8bit

* Revert

* revert main

* Update train_util.py

* update for bitsandbytes 0.39.1

* Update requirements.txt

* vram leak fix

---------

Co-authored-by: Pam <[email protected]>