Not enough gpu memory even though there is? #1278

Closed
AvramDarius opened this issue Dec 8, 2023 · 46 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@AvramDarius

I have a 6700 XT; it has more than enough VRAM, yet I'm still getting this error, and I already applied that fix where you allocate 8GB instead of 1.

[Fooocus Model Management] Moving model(s) has taken 59.70 seconds
0%| | 0/30 [00:07<?, ?it/s]
Traceback (most recent call last):
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 803, in worker
handler(task)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 735, in handler
imgs = pipeline.process_diffusion(
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
sampled_latent = core.ksampler(
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\core.py", line 315, in ksampler
samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\sample.py", line 100, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 711, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\sample_hijack.py", line 151, in sample_hacked
samples = sampler.sample(model_wrap, sigmas, extra_args, callback_wrap, noise, latent_image, denoise_mask, disable_pbar)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 556, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 701, in sample_dpmpp_2m_sde_gpu
return sample_dpmpp_2m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, solver_type=solver_type)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 613, in sample_dpmpp_2m_sde
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 329, in patched_KSamplerX0Inpaint_forward
out = self.inner_model(x, sigma,
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in call_impl
return forward_call(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 267, in forward
return self.apply_model(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 264, in apply_model
out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 252, in sampling_function
cond, uncond = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\samplers.py", line 230, in calc_cond_uncond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_base.py", line 68, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 459, in patched_unet_forward
h = forward_timestep_embed(module, h, emb, context, transformer_options, output_shape)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\openaimodel.py", line 37, in forward_timestep_embed
x = layer(x, context, transformer_options)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 560, in forward
x = block(x, context=context[i], transformer_options=transformer_options)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 390, in forward
return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\util.py", line 123, in checkpoint
return func(*inputs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 455, in _forward
n = self.attn1(n, context=context_attn1, value=value_attn1)
File "E:\AI\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 366, in forward
out = optimized_attention(q, k, v, self.heads)
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 177, in attention_sub_quad
hidden_states = efficient_dot_product_attention(
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 244, in efficient_dot_product_attention
res = torch.cat([
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 245, in
compute_query_chunk_attn(
File "E:\AI\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 160, in _get_attention_scores_no_kv_chunking
attn_probs = attn_scores.softmax(dim=-1)
RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available!
Total time: 77.58 seconds

@stainz2004

Same issue

@AlexeyJersey

AlexeyJersey commented Dec 8, 2023

Same here, got an RX 5700 XT 8GB

@xjbar

xjbar commented Dec 8, 2023

AMD cards are running into a memory loop issue. I think the devs are aware; whether they're going to do anything about it, I'm not sure.

@acvcleitao

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3
Worked for me.

@TDola

TDola commented Dec 8, 2023

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

It's been broken for 2 months? That's disappointing.

Something somewhere almost seems hard-coded to that memory amount; I get the same error. You can't even tell it to use CPU mode, and if you try --lowvram, it thinks you want NVIDIA again.

@AlexeyJersey

AlexeyJersey commented Dec 8, 2023

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

RX 5700 XT 8GB

RuntimeError: Could not allocate tensor with 26214400 bytes. There is not enough GPU video memory available!
Total time: 67.72 seconds

@acvcleitao

acvcleitao commented Dec 9, 2023

There have been some posts related to this issue. Some versions have this corrected, some don't. Rolling back to the commit I mentioned solved it for me.
Supposedly this was solved in 2.1.695 and then again in 2.1.703, both released around the 18th of October. I'm having the same issues with exactly the same setup as the user in #700 (RTX 2060 6GB). I don't know what to do.

Also, something that's different between your error and mine is that my image generates almost fully; I only get the error after the image generation completes. It doesn't save it because it tries to move the model and then crashes. You can't even get past the generation process, right? It just straight up crashes.
So my issue is probably due to CUDA and yours due to memory allocation at the start of the image generation.

@tobiasklnn

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

Sorry for the stupid question, but how do I roll back Fooocus?

@heltonteixeira

Same issue here

@PierreLepagnol

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

Sorry for the stupid question, but how do I roll back Fooocus?

You can use these commands:

git checkout 9660daff94b4d0f282567b96b3d387817818a4b3
python entry_with_update.py 
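If you later want to return to the latest version, something like this should work (assuming the default branch is still named main):

git checkout main
python entry_with_update.py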

@grendahl06

When I try this branch on a clean install, I get:

File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\cuda_init_.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

@lllyasviel (Owner)

lllyasviel commented Dec 10, 2023

The latest version is tested with a 2060.
If it crashes, check that you have enough system swap and the latest NVIDIA driver.
If it still does not work, paste the full logs.
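(For reference, and not part of the official guide: one quick way to check the current page file size on Windows is

wmic pagefile list /format:list

which prints the allocated size and current usage of the swap file.)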

@grendahl06

grendahl06 commented Dec 10, 2023

Torch not compiled with CUDA enabled

Thank you for the response. I think most of the people in this thread have Radeon GPUs.

If it helps to have the full message, this is what I am seeing:

Traceback (most recent call last):
File "threading.py", line 1016, in bootstrap_inner
File "threading.py", line 953, in run
File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 18, in worker
import modules.default_pipeline as pipeline
File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 258, in
refresh_everything(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 253, in refresh_everything
prepare_text_encoder(async_call=True)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 217, in prepare_text_encoder
fcbh.model_management.load_models_gpu([final_clip.patcher, final_expansion.patcher])
File "D:\Fooocus_win64\Fooocus\modules\patch.py", line 479, in patched_load_models_gpu
y = fcbh.model_management.load_models_gpu_origin(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 402, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 294, in model_load
accelerate.dispatch_model(self.real_model, device_map=device_map, main_device=self.device)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\big_modeling.py", line 371, in dispatch_model
attach_align_device_hook_on_blocks(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 536, in attach_align_device_hook_on_blocks
attach_align_device_hook_on_blocks(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 536, in attach_align_device_hook_on_blocks
attach_align_device_hook_on_blocks(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 506, in attach_align_device_hook_on_blocks
add_hook_to_module(module, hook)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 155, in add_hook_to_module
module = hook.init_hook(module)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\hooks.py", line 253, in init_hook
set_module_tensor_to_device(module, name, self.execution_device)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\accelerate\utils\modeling.py", line 292, in set_module_tensor_to_device
new_value = old_value.to(device)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\cuda_init
.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

@lllyasviel (Owner)

"Torch not compiled with CUDA enabled"
means user mistake and users do not follow official installation guide.

@PierreLepagnol

I do have an NVIDIA GeForce RTX 2060 with 6GB. Do you think I can run the model?

I'm missing 20 MB... is there a way to quantize the UNet weights?
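(For what it's worth, a usage dump later in this thread lists flags like --unet-in-fp8-e4m3fn and --unet-in-fp8-e5m2; assuming the build in use actually has them, an fp8 UNet could be tried as a rough form of weight quantization:

python entry_with_update.py --unet-in-fp8-e4m3fn

No guarantee it frees exactly the missing 20 MB, but fp8 storage roughly halves fp16 UNet weight memory.)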

@lllyasviel (Owner)

The latest version is tested with a 2060.
If it crashes, check that you have enough system swap and the latest NVIDIA driver.
If it still does not work, paste the full logs.

@grendahl06

Reverted all of my local changes and re-ran the setup. I am back to my original error message, if this helps:
Radeon 6650 XT 8GB, 32GB RAM, AMD 7950

Traceback (most recent call last):
File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 803, in worker
handler(task)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 735, in handler
imgs = pipeline.process_diffusion(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
sampled_latent = core.ksampler(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\core.py", line 315, in ksampler
samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\sample.py", line 93, in sample
real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\sample.py", line 86, in prepare_sampling
fcbh.model_management.load_models_gpu([model] + models, model.memory_required(noise_shape) + inference_memory)
File "D:\Fooocus_win64\Fooocus\modules\patch.py", line 494, in patched_load_models_gpu
y = fcbh.model_management.load_models_gpu_origin(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 410, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 293, in model_load
raise e
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 289, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_patcher.py", line 191, in patch_model
temp_weight = fcbh.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
File "D:\Fooocus_win64\Fooocus\backend\headless\fcbh\model_management.py", line 532, in cast_to_device
return tensor.to(device, copy=copy).to(dtype)
RuntimeError: Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
Total time: 24.52 seconds

@PierreLepagnol

(fooocus) [pierre@archlinux Fooocus]$ python entry_with_update.py --gpu-only --bf16-unet --bf16-vae --use-pytorch-cross-attention
Update failed.
'refs/heads/HEAD'
Update succeeded.
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Fooocus version: 2.1.703
Running on local URL: http://127.0.0.1:7860

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB

To create a public link, set share=True in launch().
Opening in existing browser session.
Total VRAM 5927 MB, total RAM 15861 MB
Set vram state to: HIGH_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
[Fooocus] Disabling smart memory
model_type EPS
adm 2560
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
loaded straight to GPU
Requested to load SDXLRefiner
Loading 1 new model
Refiner model loaded: /home/pierre/Documents/Fooocus/models/checkpoints/sd_xl_refiner_1.0_0.9vae.safetensors
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/pierre/Documents/Fooocus/modules/async_worker.py", line 18, in worker
import modules.default_pipeline as pipeline
File "/home/pierre/Documents/Fooocus/modules/default_pipeline.py", line 258, in
refresh_everything(
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/pierre/Documents/Fooocus/modules/default_pipeline.py", line 233, in refresh_everything
refresh_base_model(base_model_name)
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/pierre/Documents/Fooocus/modules/default_pipeline.py", line 96, in refresh_base_model
xl_base = core.load_model(filename)
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/pierre/miniconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/pierre/Documents/Fooocus/modules/core.py", line 69, in load_model
unet, clip, vae, clip_vision = load_checkpoint_guess_config(ckpt_filename, embedding_directory=embeddings_path)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/sd.py", line 427, in load_checkpoint_guess_config
model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/supported_models.py", line 156, in get_model
out = model_base.SDXL(self, model_type=self.model_type(state_dict, prefix), device=device)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/model_base.py", line 189, in init
super().init(model_config, model_type, device=device)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/model_base.py", line 24, in init
self.diffusion_model = UNetModel(**unet_config, device=device)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/diffusionmodules/openaimodel.py", line 446, in init
layers.append(SpatialTransformer(
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 507, in init
[BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 507, in
[BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim[d],
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 353, in init
self.ff = FeedForward(dim, dropout=dropout, glu=gated_ff, dtype=dtype, device=device, operations=operations)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 73, in init
) if not glu else GEGLU(dim, inner_dim, dtype=dtype, device=device, operations=operations)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ldm/modules/attention.py", line 58, in init
self.proj = operations.Linear(dim_in, dim_out * 2, dtype=dtype, device=device)
File "/home/pierre/Documents/Fooocus/backend/headless/fcbh/ops.py", line 11, in init
self.weight = torch.nn.Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 5.79 GiB of which 19.81 MiB is free. Including non-PyTorch memory, this process has 5.76 GiB memory in use. Of the allocated memory 5.59 GiB is allocated by PyTorch, and 80.92 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
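As the error text itself suggests, allocator fragmentation can sometimes be worked around by setting PYTORCH_CUDA_ALLOC_CONF before launching. A minimal sketch on Linux (the 512 is an arbitrary choice, and on a 6GB card this may only delay the OOM):

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
python entry_with_update.py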

@lllyasviel (Owner)

If any AMD card says "Could not allocate tensor with X bytes. There is not enough GPU video memory available!",
then unfortunately that AMD card is not enough to run SDXL.

Current AMD support is still experimental and does not work as well as NVIDIA.

However, we promise the best support across all software. That is, if you are able to use the same AMD device in automatic1111, ComfyUI, Invoke, SD.Next, etc. to run SDXL successfully, please let us know and we will support it.

However, if all software fails to run SDXL on your device, then we have no method to make it work.

Also, the Linux version of Fooocus uses ROCm and may have better support for AMD.

@grendahl06

If any AMD card says "Could not allocate tensor with X bytes. There is not enough GPU video memory available!",
then unfortunately that AMD card is not enough to run SDXL.

Current AMD support is still experimental and does not work as well as NVIDIA.

However, we promise the best support across all software. That is, if you are able to use the same AMD device in automatic1111, ComfyUI, Invoke, SD.Next, etc. to run SDXL successfully, please let us know and we will support it.

However, if all software fails to run SDXL on your device, then we have no method to make it work.

Also, the Linux version of Fooocus uses ROCm and may have better support for AMD.

Thank you for the answer. I will try to create a Linux VM later today or tomorrow. Do you recommend any specific flavor of Linux?

Great work. I'm looking forward to being able to use it successfully.

@lllyasviel (Owner)

(fooocus) [pierre@archlinux Fooocus]$ python entry_with_update.py --gpu-only --bf16-unet --bf16-vae --use-pytorch-cross-attention
[... full log quoted above ...]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacty of 5.79 GiB of which 19.81 MiB is free. [...]

This log indicates that you were misled by some bad tutorials and have already broken your environment with wrong command flags.
Please trust only the official installation guide and try a fresh install.

@itoch

itoch commented Dec 11, 2023

Sorry, but can someone tell me how to uninstall Fooocus from Windows? Is it enough to just delete the files I downloaded?

@TDola

TDola commented Dec 11, 2023

I sure hope this gets fixed for AMD. With NVIDIA leaving the consumer graphics card space, we are left with AMD and Intel.

@grendahl06

I sure hope this gets fixed for AMD. With NVIDIA leaving the consumer graphics card space, we are left with AMD and Intel.

Same. In the meantime, I've used the --cpu switch, which takes roughly 50 s/it with an AMD 7950; at the default 30 steps, that ends up being ~25 minutes to find out how the code interprets my prompt.

@TDola

TDola commented Dec 12, 2023

I went down the rabbit hole on this. SDXL claims they do not support AMD cards on Windows, and the Linux version is reportedly very broken, requiring you to use specific versions based on your graphics card.
Automatic1111 claims it does work on AMD; I have not tried it. ComfyUI also makes the same claim, but the instructions are equally vague and there are reports that it's broken too. So I think we just have to wait for SDXL to fix their bugs, and it seems they have little incentive to do so; after all, they can afford an NVIDIA card.
All hope is not lost, however: just yesterday AMD released a DirectML update. It didn't fix this issue either, but it shows work is being done.

@Stefan-Mayer

Sorry, but can someone tell me how to uninstall Fooocus from Windows? Is it enough to just delete the files I downloaded?

I would like to know that too. A nice clean uninstall :)

@lllyasviel (Owner)

Hi, deleting the folder is a clean uninstall if you followed the official installation guide.

@grendahl06

I think I've missed a step in the setup, where the Windows AMD section says "see previous" in a recursive reference...

At any rate, the Linux AMD section says to run these commands:
pip uninstall torch torchvision torchaudio torchtext functorch xformers
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

When I enter the first, it shows I have the CUDA (cu) builds, which is clearly part of the problem.

When I run the second, it says there is no version matching "none":
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

what do I need to add to this second command to get these dependencies?

Thank you

@lllyasviel (Owner)

Hi,

for Windows AMD, please follow the section "Windows(AMD GPUs)"

for Linux AMD, please follow the section "Linux (AMD GPUs)"

if you see "Could not find a version that satisfies the requirement torch", then you are using linux guide for windows, and that will not work.

@grendahl06

grendahl06 commented Dec 12, 2023

Hi,

for Windows AMD, please follow the section "Windows(AMD GPUs)"

for Linux AMD, please follow the section "Linux (AMD GPUs)"

if you see "Could not find a version that satisfies the requirement torch", then you are using linux guide for windows, and that will not work.

Thank you for the fast answer. When I run the commands listed:
.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --always-no-vram
pause

it still tells me it is looking for CUDA.

In my reading of the Windows AMD install, it says to download and then run the commands copied above. If I do not need to download these extra WHL files, can you tell me what step I've missed in order for the code not to look for CUDA dependencies?

The --always-no-vram switch appears to be new but always triggers the CUDA message.

Thank you

@lllyasviel (Owner)

No, you do not need to run any commands.

The instruction in "Windows (AMD GPUs)" is to edit "run.bat" and then double-click "run.bat".

Please follow the guide exactly, and if it does not work, paste the full log.

@grendahl06

When run with the --cpu switch (which was working last night to bypass the fact that AMD GPUs do not work), I now get this message:

D:\Fooocus_win64>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset anime --cpu
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py', '--directml', '--preset', 'anime', '--cpu']
usage: entry_with_update.py [-h] [--listen [IP]] [--port PORT] [--disable-header-check [ORIGIN]]
[--web-upload-size WEB_UPLOAD_SIZE] [--external-working-path PATH [PATH ...]]
[--output-path OUTPUT_PATH] [--temp-path TEMP_PATH] [--cache-path CACHE_PATH]
[--in-browser] [--disable-in-browser] [--gpu-device-id DEVICE_ID]
[--async-cuda-allocation | --disable-async-cuda-allocation] [--disable-attention-upcast]
[--all-in-fp32 | --all-in-fp16]
[--unet-in-bf16 | --unet-in-fp16 | --unet-in-fp8-e4m3fn | --unet-in-fp8-e5m2]
[--vae-in-fp16 | --vae-in-fp32 | --vae-in-bf16]
[--clip-in-fp8-e4m3fn | --clip-in-fp8-e5m2 | --clip-in-fp16 | --clip-in-fp32]
[--directml [DIRECTML_DEVICE]] [--disable-ipex-hijack]
[--preview-option [none,auto,fast,taesd]]
[--attention-split | --attention-quad | --attention-pytorch] [--disable-xformers]
[--always-gpu | --always-high-vram | --always-normal-vram | --always-low-vram | --always-no-vram | --always-cpu]
[--always-offload-from-vram] [--disable-server-log] [--debug-mode]
[--is-windows-embedded-python] [--disable-server-info] [--share] [--preset PRESET]
[--language LANGUAGE] [--disable-offload-from-vram] [--theme THEME] [--disable-image-log]
[--disable-analytics]
entry_with_update.py: error: unrecognized arguments: --cpu
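(Judging purely from the usage text above, the CLI flags have changed; --always-cpu is in the accepted options, so presumably the old --cpu switch is now spelled:

D:\Fooocus_win64>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset anime --always-cpu

I'm inferring that only from the printed option list.)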

@grendahl06

grendahl06 commented Dec 12, 2023

When run without the switch, I see a memory exception on an 8GB Radeon 6650 XT:

Traceback (most recent call last):
File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 803, in worker
handler(task)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\async_worker.py", line 735, in handler
imgs = pipeline.process_diffusion(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\default_pipeline.py", line 430, in process_diffusion
sampled_latent = core.ksampler(
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\modules\core.py", line 315, in ksampler
samples = ldm_patched.modules.sample.sample(model,
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\sample.py", line 93, in sample
real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\sample.py", line 86, in prepare_sampling
ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
File "D:\Fooocus_win64\Fooocus\modules\patch.py", line 469, in patched_load_models_gpu
y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\model_management.py", line 410, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\model_management.py", line 293, in model_load
raise e
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\model_management.py", line 289, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\model_patcher.py", line 191, in patch_model
temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
File "D:\Fooocus_win64\Fooocus\ldm_patched\modules\model_management.py", line 583, in cast_to_device
return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
RuntimeError: Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
Total time: 39.92 seconds

Task Manager shows:
[screenshot: GPU memory usage]

@lllyasviel (Owner)

Hi,

if any AMD card says "Could not allocate tensor with X bytes. There is not enough GPU video memory available!",
then unfortunately that AMD card is not enough to run SDXL.

Current AMD support is still experimental and does not work as well as NVIDIA.

However, we promise the best support across all software. That is, if you are able to use the same AMD device in automatic1111, ComfyUI, Invoke, SD.Next, etc. to run SDXL successfully, please let us know and we will support it.

However, if all software fails to run SDXL on your device, then we have no method to make it work.

Also, the Linux version of Fooocus uses ROCm and may have better support for AMD.

@grendahl06

I am a complete newbie at this. After you mentioned testing other SDXL projects: this one currently runs on my machine:
https://github.com/vladmandic/automatic/wiki/Installation

It appears to be a project that split off from automatic1111, and it is able to use my GPU without crashing.

Is that a helpful data point?

@lllyasviel (Owner)

Sure. What configuration are you using with it? Which SDXL checkpoint?

@grendahl06

The default settings were working. It used something called v1-5-pruned-emaonly. Where should I check to find the checkpoint? (I had assumed it would be the latest.)

When I turned on additional things (none of which I understand), it got 80% into rendering but did give an out-of-memory error.

@lllyasviel (Owner)

lllyasviel commented Dec 12, 2023

v1-5-pruned-emaonly is not SDXL; it is an SD1.5 model. SDXL is 3x larger and requires 3x more resources to get better results.

However, if SD1.5 works on your device, you may just stick to that UI and use it.

@DolphinIQ

@lllyasviel I have Windows 11 and a 12GB AMD GPU producing the same error about not being able to allocate enough memory:
RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available! Total time: 77.84 seconds

However, I was able to run AUTOMATIC1111 SD webui with SDXL (6+ GB) models using these settings inside the bat file:
set COMMANDLINE_ARGS=--opt-sub-quad-attention --lowvram --disable-nan-check --no-half

  • The first three flags were a must for non-SDXL models, making sure my GPU usage oscillates between 90-95% instead of hitting a full 100% and trying to get even more (as happens in Fooocus, eventually running out of memory)
  • The last argument, --no-half, was essential for running SDXL models: otherwise, despite seemingly working correctly (I could see the preview being generated), the end result was always a fully black image (note that even without this flag it would not crash or run out of GPU memory; only the end result was compromised)

It's important to note that the SDXL models I successfully used in sd-webui were the same models Fooocus uses, copied and pasted! My machine can run the same models, just in a different project. Meaning <16GB VRAM AMD GPUs CAN handle SDXL; it seems to be a matter of some tricky memory settings. (A rough flag mapping is sketched below.)
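Going by the entry_with_update.py usage text earlier in this thread, a rough and untested mapping of those sd-webui flags onto Fooocus's own options would be something like:

.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --attention-quad --always-low-vram --all-in-fp32

where --attention-quad stands in for --opt-sub-quad-attention, --always-low-vram for --lowvram, and --all-in-fp32 for --no-half; whether Fooocus's memory manager then behaves the same way is exactly the open question here.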

I am a programmer and could try to provide some help, although this is not at all my area of expertise. I'd love to be able to run Fooocus, as I've heard very good things about it: it produces great results and is much simpler to use. Love your project ❤️

@mashb1t added the bug (Something isn't working) and help wanted (Extra attention is needed) labels on Dec 29, 2023
@GZGavinZhao

GZGavinZhao commented Jan 4, 2024

ROCm on Windows (either via DirectML or via the ROCm Windows SDK) generally performs worse than ROCm on Linux. I understand this may not be an option for everyone, but if you can, when using AMD hardware you will almost always have a much better experience running Fooocus (or almost any GPU-accelerated AI workload) on Linux than on Windows.

@Laurent-VueJS

Try this version of the app -> https://github.com/lllyasviel/Fooocus/tree/9660daff94b4d0f282567b96b3d387817818a4b3 Worked for me.

The link is broken; it does not exist anymore. (Actually the link itself is OK, but the Windows download link there is broken.) :-(

@TDola

TDola commented Jan 15, 2024

It works on my 6700 XT now; this issue was resolved for me with the latest version.

@DolphinIQ

Yes, it seems this has been fixed in the newest version. I can use Fooocus on Windows with an AMD GPU now!

@Stefan-Mayer

Yes, it seems this has been fixed in the newest version. I can use Fooocus on Windows with an AMD GPU now!

What AMD GPU are you using?

@DolphinIQ

Yes, it seems this has been fixed in the newest version. I can use Fooocus on Windows with an AMD GPU now!

What AMD GPU are you using?

AMD Radeon RX 6700 XT

@mashb1t added the question (Further information is requested) label and removed the help wanted (Extra attention is needed) label on Feb 22, 2024
@mashb1t (Collaborator)

mashb1t commented Feb 22, 2024

It works on my 6700 XT now; this issue was resolved for me with the latest version.

@CasualVult is this issue still present for you using the latest version of Fooocus or can it be closed?

@mashb1t closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 9, 2024