
RuntimeError: Device type privateuseone is not supported for torch.Generator() api. #763

Closed
correquemecago opened this issue Oct 22, 2023 · 25 comments


@correquemecago

First, sorry for my English. I love your work, and I have used it a lot in Colab, but I can't get it to run on my machine with an AMD 6700 XT graphics card. It seems it doesn't recognize the card correctly: for example, it reports VRAM: 1 GB instead of the 12 GB the card has. Can you think of any solution?

To create a public link, set share=True in launch().
Using directml with device:
Total VRAM 1024 MB, total RAM 16310 MB
Set vram state to: NORMAL_VRAM
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Fooocus] Disabling smart memory
model_type EPS
adm 2560
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Refiner model loaded: G:\Fooocus\Fooocus\models\checkpoints\sd_xl_refiner_1.0_0.9vae.safetensors
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: G:\Fooocus\Fooocus\models\checkpoints\sd_xl_base_1.0_0.9vae.safetensors
LoRAs loaded: [('sd_xl_offset_example-lora_1.0.safetensors', 0.5), ('None', 0.5), ('None', 0.5), ('None', 0.5), ('None', 0.5)]
Fooocus Expansion engine loaded for privateuseone:0, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 6.30 seconds
App started successful. Use the app with http://127.0.0.1:7860/ or 127.0.0.1:7860
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 7.0
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 20
[Fooocus] Initializing ...
[Fooocus] Loading models ...
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
G:\Fooocus\python_embeded\lib\site-packages\transformers\generation\utils.py:723: UserWarning: The operator 'aten::repeat_interleave.Tensor' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
[Prompt Expansion] New suffix: extremely detailed, fantastic details full face, mouth, trending on artstation, pixiv, cgsociety, hyperdetailed Unreal Engine 4k 8k ultra HD, WLOP
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] New suffix: intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, in the style of cam sykes, wayne barlowe, igor kieryluk
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.26 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 4.14 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Traceback (most recent call last):
File "G:\Fooocus\Fooocus\modules\async_worker.py", line 585, in worker
handler(task)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\async_worker.py", line 518, in handler
imgs = pipeline.process_diffusion(
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\default_pipeline.py", line 347, in process_diffusion
modules.patch.globalBrownianTreeNoiseSampler = BrownianTreeNoiseSampler(
File "G:\Fooocus\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 119, in init
self.tree = BatchedBrownianTree(x, t0, t1, seed, cpu=cpu)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in init
self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
File "G:\Fooocus\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in
self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
File "G:\Fooocus\python_embeded\lib\site-packages\torchsde_brownian\derived.py", line 155, in init
self._interval = brownian_interval.BrownianInterval(t0=t0,
File "G:\Fooocus\python_embeded\lib\site-packages\torchsde_brownian\brownian_interval.py", line 540, in init
W = self._randn(initial_W_seed) * math.sqrt(t1 - t0)
File "G:\Fooocus\python_embeded\lib\site-packages\torchsde_brownian\brownian_interval.py", line 234, in _randn
return _randn(size, self._top._dtype, self._top._device, seed)
File "G:\Fooocus\python_embeded\lib\site-packages\torchsde_brownian\brownian_interval.py", line 32, in _randn
generator = torch.Generator(device).manual_seed(int(seed))
RuntimeError: Device type privateuseone is not supported for torch.Generator() api.
Total time: 67.99 seconds
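
Note on the error: DirectML's privateuseone device cannot back a torch.Generator, which is exactly where the traceback above ends (brownian_interval.py, line 32). The fix referenced below as #624 amounts to creating the generator on the CPU and moving the sampled noise to the target device afterwards. A minimal sketch of that workaround, applied to the _randn helper shown in the traceback (an approximation based on the traceback, not the literal diff from #624):

import torch

def _randn(size, dtype, device, seed):
    # Build the generator on the CPU, which every backend supports,
    # then move the sampled noise to the requested device.
    generator = torch.Generator("cpu").manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device="cpu", generator=generator).to(device)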

@correquemecago (Author)

The fix in #624 solved that problem, but now I have another:

To create a public link, set share=True in launch().
Using directml with device:
Total VRAM 1024 MB, total RAM 16310 MB
Set vram state to: NORMAL_VRAM
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Fooocus] Disabling smart memory
model_type EPS
adm 2560
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Refiner model loaded: G:\Fooocus\Fooocus\models\checkpoints\sd_xl_refiner_1.0_0.9vae.safetensors
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: G:\Fooocus\Fooocus\models\checkpoints\sd_xl_base_1.0_0.9vae.safetensors
LoRAs loaded: [('sd_xl_offset_example-lora_1.0.safetensors', 0.5), ('None', 0.5), ('None', 0.5), ('None', 0.5), ('None', 0.5)]
Fooocus Expansion engine loaded for privateuseone:0, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 6.33 seconds
App started successful. Use the app with http://127.0.0.1:7860/ or 127.0.0.1:7860
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 7.0
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 20
[Fooocus] Initializing ...
[Fooocus] Loading models ...
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
G:\Fooocus\python_embeded\lib\site-packages\transformers\generation\utils.py:723: UserWarning: The operator 'aten::repeat_interleave.Tensor' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
[Prompt Expansion] New suffix: extremely detailed digital painting, in the style of fenghua zhong and ruan jia and jeremy lipking and peter mohrbacher, mystical colors, rim light, beautiful lighting, 8 k, stunning scene, raytracing, octane, trending on artstation
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] New suffix: intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by Artgerm and Greg Rutkowski and Alphonse Mucha, UHD
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.20 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 4.61 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 39.21 seconds
[Sampler] Fooocus sampler is activated.
0%| | 0/30 [00:08<?, ?it/s]
Traceback (most recent call last):
File "G:\Fooocus\Fooocus\modules\async_worker.py", line 585, in worker
handler(task)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\async_worker.py", line 518, in handler
imgs = pipeline.process_diffusion(
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\default_pipeline.py", line 354, in process_diffusion
sampled_latent = core.ksampler(
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\core.py", line 263, in ksampler
samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "G:\Fooocus\Fooocus\backend\headless\fcbh\sample.py", line 97, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\samplers.py", line 781, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler(), sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\sample_hijack.py", line 128, in sample_hacked
samples = sampler.sample(model_wrap, sigmas, extra_args, callback_wrap, noise, latent_image, denoise_mask, disable_pbar)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\samplers.py", line 638, in sample
samples = getattr(k_diffusion_sampling, "sample_{}".format(sampler_name))(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **extra_options)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\patch.py", line 316, in sample_dpmpp_fooocus_2m_sde_inpaint_seamless
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\samplers.py", line 326, in forward
out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in call_impl
return forward_call(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\patch.py", line 198, in patched_discrete_eps_ddpm_denoiser_forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\k_diffusion\external.py", line 155, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\samplers.py", line 314, in apply_model
out = sampling_function(self.inner_model.apply_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\samplers.py", line 292, in sampling_function
cond, uncond = calc_cond_uncond_batch(model_function, cond, uncond, x, timestep, max_total_area, model_options)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\samplers.py", line 266, in calc_cond_uncond_batch
output = model_options['model_function_wrapper'](model_function, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
File "G:\Fooocus\Fooocus\modules\patch.py", line 206, in patched_model_function_wrapper
return func(x, t, **c)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\model_base.py", line 64, in apply_model
return self.diffusion_model(xc, t, context=context, y=c_adm, control=control, transformer_options=transformer_options).float()
File "G:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Fooocus\Fooocus\modules\patch.py", line 452, in patched_unet_forward
h = forward_timestep_embed(module, h, emb, context, transformer_options, output_shape)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\openaimodel.py", line 56, in forward_timestep_embed
x = layer(x, context, transformer_options)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 536, in forward
x = block(x, context=context[i], transformer_options=transformer_options)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 366, in forward
return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\util.py", line 123, in checkpoint
return func(*inputs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 431, in _forward
n = self.attn1(n, context=context_attn1, value=value_attn1)
File "G:\Fooocus\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 342, in forward
out = optimized_attention(q, k, v, self.heads)
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 172, in attention_sub_quad
hidden_states = efficient_dot_product_attention(
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 243, in efficient_dot_product_attention
res = torch.cat([
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 244, in
compute_query_chunk_attn(
File "G:\Fooocus\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 150, in _get_attention_scores_no_kv_chunking
attn_scores = torch.baddbmm(
RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available!
Total time: 137.16 seconds

@disiztheend

The fix in #624 solved that problem, but now I have another:

Same GPU, same problem:

RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available!
Total time: 28.65 seconds

Any solution available?
Do you have a date for the AMD update?
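
One mitigation worth trying for the allocation error is the flag the startup log itself suggests ("if you have memory or speed issues try using: --use-split-cross-attention"). A sketch of the launch line with that flag added, assuming the standard portable layout used elsewhere in this thread; it may lower peak VRAM use during attention, but it is not a guaranteed fix for the DirectML allocator:

.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --use-split-cross-attention
pause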

@disiztheend

disiztheend commented Nov 2, 2023

Quick update
I have followed the instructions for the Linux install (under Solus OS), and now I get a new error message when I launch the generation: Segmentation fault (core dumped)

@demoshane

Hit the same. 7900 XTX GPU. Ran it with an altered bat file:
.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
pause

@zxcvxzcv-johndoe

Hit the same. 7900 XTX GPU. Ran it with an altered bat file:
.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
pause

I am having the same issue with a 6800 XT, latest drivers, and Win11.

Sadly, when you google the error you find this thread and also microsoft/DirectML#374; someone reported it to Microsoft back in January. But it's probably good to also let them know the issue still exists.

@piotrazsko

piotrazsko commented Nov 20, 2023

I am having the same problem with an RX 580. I tried on Windows and Linux.

@HyunJae5463

Same issue.

@WiRight

WiRight commented Nov 21, 2023

Same here.

@Roninos

Roninos commented Nov 23, 2023

RX 5700 does not work.

@zxcvxzcv-johndoe

Guys, I think you need to report this to Microsoft, since they are the ones who must fix it: microsoft/DirectML#374

@Roninos

Roninos commented Nov 23, 2023

Here's a working fix #624 (comment)

@WiRight

WiRight commented Nov 23, 2023

Here's a working fix #624 (comment)

RX 6750 XT NITRO+ still not working.

@disiztheend

disiztheend commented Nov 23, 2023

My computer is Win 11 Pro, AMD Ryzen 9 5900X, 64 GB RAM, and an AMD 6700 XT 12 GB.

Without the fix from #624 I get the following error:
RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

With the fix, the error changes to:
RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available!

I tried replacing the device with "cpu", as suggested somewhere, but I still get the last error.
I think I will continue using Colab (is running it locally really faster?).

full log:

C:\Fooocus_win64_2-1-754>.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
Found existing installation: torch 2.0.0
Uninstalling torch-2.0.0:
Successfully uninstalled torch-2.0.0
Found existing installation: torchvision 0.15.1
Uninstalling torchvision-0.15.1:
Successfully uninstalled torchvision-0.15.1
WARNING: Skipping torchaudio as it is not installed.
WARNING: Skipping torchtext as it is not installed.
WARNING: Skipping functorch as it is not installed.
WARNING: Skipping xformers as it is not installed.
C:\Fooocus_win64_2-1-754>.\python_embeded\python.exe -m pip install torch-directml
Requirement already satisfied: torch-directml in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (0.2.0.dev230426)
Collecting torch==2.0.0 (from torch-directml)
Downloading torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 172.3/172.3 MB 40.9 MB/s eta 0:00:00
Collecting torchvision==0.15.1 (from torch-directml)
Downloading torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 74.1 MB/s eta 0:00:00
Requirement already satisfied: filelock in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.12.2)
Requirement already satisfied: typing-extensions in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (4.7.1)
Requirement already satisfied: sympy in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (1.12)
Requirement already satisfied: networkx in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.1)
Requirement already satisfied: jinja2 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.1.2)
Requirement already satisfied: numpy in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (1.23.5)
Requirement already satisfied: requests in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (2.31.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (9.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from jinja2->torch==2.0.0->torch-directml) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (2.0.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (2023.5.7)
Requirement already satisfied: mpmath>=0.19 in c:\fooocus_win64_2-1-754\python_embeded\lib\site-packages (from sympy->torch==2.0.0->torch-directml) (1.3.0)
DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifier. Discussion can be found at pypa/pip#12063
Installing collected packages: torch, torchvision
WARNING: The scripts convert-caffe2-to-onnx.exe, convert-onnx-to-caffe2.exe and torchrun.exe are installed in 'C:\Fooocus_win64_2-1-754\python_embeded\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed torch-2.0.0 torchvision-0.15.1
C:\Fooocus_win64_2-1-754>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py', '--directml']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.823
Running on local URL: http://127.0.0.1:7865
To create a public link, set share=True in launch().
Using directml with device:
Total VRAM 1024 MB, total RAM 65439 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: C:\Fooocus_win64_2-1-754\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\Fooocus_win64_2-1-754\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [C:\Fooocus_win64_2-1-754\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [C:\Fooocus_win64_2-1-754\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 2908861330974640467
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] test, cinematic, highly detailed, intricate detail, dramatic light, gorgeous, colorful, polished, symmetry, full color, very inspirational, amazing, fine, perfect, artistic, surreal, beautiful,, balanced, deep colors, inspired, complex, glowing, magical, illuminated, mystical, rich, professional, best, winning, futuristic, great, hopeful, unique, cool
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] test, extremely detailed, fantastic composition, cinematic atmosphere, dynamic dramatic, precise perfect, aesthetic, very inspirational, color set, inspired, vibrant colors, rational, epic, stunning, inspiring, highly educated, clear, focused, passionate, unique, beautiful, attractive, confident, futuristic, new, best, creative, positive, cute, friendly, amazing, elegant, surreal
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 7.51 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 13.07 seconds
0%| | 0/30 [00:05<?, ?it/s]
Traceback (most recent call last):
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\async_worker.py", line 803, in worker
handler(task)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\async_worker.py", line 735, in handler
imgs = pipeline.process_diffusion(
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
sampled_latent = core.ksampler(
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\core.py", line 315, in ksampler
samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\sample.py", line 100, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\samplers.py", line 711, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\sample_hijack.py", line 151, in sample_hacked
samples = sampler.sample(model_wrap, sigmas, extra_args, callback_wrap, noise, latent_image, denoise_mask, disable_pbar)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\samplers.py", line 556, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 701, in sample_dpmpp_2m_sde_gpu
return sample_dpmpp_2m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, solver_type=solver_type)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 613, in sample_dpmpp_2m_sde
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\patch.py", line 331, in patched_KSamplerX0Inpaint_forward
out = self.inner_model(x, sigma,
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in call_impl
return forward_call(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\samplers.py", line 267, in forward
return self.apply_model(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\samplers.py", line 264, in apply_model
out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\samplers.py", line 252, in sampling_function
cond, uncond = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\samplers.py", line 230, in calc_cond_uncond_batch
output = model.apply_model(input_x, timestep
, **c).chunk(batch_chunks)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\model_base.py", line 68, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\modules\patch.py", line 461, in patched_unet_forward
h = forward_timestep_embed(module, h, emb, context, transformer_options, output_shape)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\openaimodel.py", line 56, in forward_timestep_embed
x = layer(x, context, transformer_options)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 560, in forward
x = block(x, context=context[i], transformer_options=transformer_options)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 390, in forward
return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\diffusionmodules\util.py", line 123, in checkpoint
return func(*inputs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 455, in _forward
n = self.attn1(n, context=context_attn1, value=value_attn1)
File "C:\Fooocus_win64_2-1-754\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 366, in forward
out = optimized_attention(q, k, v, self.heads)
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\attention.py", line 177, in attention_sub_quad
hidden_states = efficient_dot_product_attention(
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 244, in efficient_dot_product_attention
res = torch.cat([
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 245, in
compute_query_chunk_attn(
File "C:\Fooocus_win64_2-1-754\Fooocus\backend\headless\fcbh\ldm\modules\sub_quadratic_attention.py", line 160, in _get_attention_scores_no_kv_chunking
attn_probs = attn_scores.softmax(dim=-1)
RuntimeError: Could not allocate tensor with 165150720 bytes. There is not enough GPU video memory available!
Total time: 26.49 seconds

@ahaMfM

ahaMfM commented Nov 29, 2023

Same issue.

@Menober

Menober commented Nov 29, 2023

Yup, same issue; the brownian_interval.py fix helped, but the memory-allocation problem still occurs.

@Analog-Valhalla

Here's a working fix #624 (comment)

This worked for my 7800XTX. Thank you!

@napstar-420

Same issue, not enough GPU memory available. I have an AMD 6600 XT.

@Taleno97

Taleno97 commented Dec 2, 2023

Same issue here; after the fix for the device type, I'm getting the VRAM error as well.

@IamHappyDei

Same

@Trancep0rt

Trancep0rt commented Dec 5, 2023

Same problem on a 6800 XT 16 GB, Win10.

Edit: it seems to use only 1 GB of VRAM (there is no other GPU in my system, not even an iGPU):
[screenshot]

@Menober

Menober commented Dec 5, 2023

Same problem on a 6800 XT 16 GB, Win10.

Edit: it seems to use only 1 GB of VRAM (there is no other GPU in my system, not even an iGPU): [screenshot]

But the error says it failed to allocate only about 160 MB of VRAM, so this is a wider problem.
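
For what it's worth, the "Total VRAM 1024 MB" line appears to be a hard-coded fallback: DirectML exposes no VRAM-size query to PyTorch, so the backend cannot read the card's real 16 GB. The adapter itself is detected fine, which you can confirm with the torch-directml package (a small check, assuming torch-directml is installed as in the bat file above):

import torch_directml

# Adapters can be enumerated and named, but there is no API to query
# their VRAM, hence the 1 GB placeholder in the Fooocus startup log.
print(torch_directml.device_count())   # number of DirectML adapters
print(torch_directml.device_name(0))   # adapter name, e.g. "AMD Radeon RX 6800 XT"
dml = torch_directml.device()          # the default privateuseone device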

@antikalk

antikalk commented Dec 5, 2023

Yup, same issue; the brownian_interval.py fix helped, but the memory-allocation problem still occurs.

Same for me: 6700 XT.

@lukechar

lukechar commented Dec 5, 2023

Same here on Windows 11; 6650 XT.

After applying the brownian_interval.py fix above:

RuntimeError: Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!

@uran124

uran124 commented Dec 6, 2023

First, sorry for my English. I love your work, and I have used it a lot in Colab, but I can't get it to run on my machine with an AMD 6700 XT graphics card. It seems it doesn't recognize the card correctly: for example, it reports VRAM: 1 GB instead of the 12 GB the card has. Can you think of any solution?

[quoted log trimmed; it is the same log as in the original post above, ending in: RuntimeError: Device type privateuseone is not supported for torch.Generator() api. Total time: 67.99 seconds]

I have the same problem. How did you fix it?

@mashb1t (Collaborator)

mashb1t commented Dec 31, 2023

Fixed recently; this works with the latest version of Fooocus. Please update.
Duplicated by #1111.
