
The function torch.std_mean causing a fallback to CPU on DirectML #1321

Closed

rickfeldschau opened this issue Dec 10, 2023 · 2 comments

Labels: bug (Something isn't working), help wanted (Extra attention is needed)

@rickfeldschau

#313 and #520 added a call to torch.std_mean(), but aten::std_mean.correction is not implemented in DirectML, so the call triggers a fallback to the CPU.

The pseudo call stack is:

  1. s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True) in function adaptive_anisotropic_filter (anisotropic.py)
  2. positive_eps_degraded = anisotropic.adaptive_anisotropic_filter(x=positive_eps, g=positive_x0) in function patched_sampler_cfg_function (patch.py)
  3. unet.model_options['sampler_cfg_function'] = patched_sampler_cfg_function in function load_model (core.py)

If I comment out the newer code and revert the call in core.py to its pre-#520 state, as in the following snippet, the model successfully moves to the GPU, with a speedup.

def load_model(ckpt_filename):
    # unet, clip, vae, clip_vision = load_checkpoint_guess_config(ckpt_filename, embedding_directory=path_embeddings)
    unet, clip, vae, clip_vision = load_checkpoint_guess_config(ckpt_filename)
    # unet.model_options['sampler_cfg_function'] = patched_sampler_cfg_function
    return StableDiffusionModel(unet=unet, clip=clip, vae=vae, clip_vision=clip_vision, filename=ckpt_filename)
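Rather than reverting entirely and losing the anisotropic filter, a possible alternative would be to decompose the call in anisotropic.py so it avoids the fused aten::std_mean.correction op and uses only basic reductions (mean, sub, square, sum), which seem more likely to be implemented by the DML backend. A minimal sketch — the helper name std_mean_decomposed is hypothetical, and I have not verified it on a DirectML device:

```python
import torch

def std_mean_decomposed(g, dim, keepdim=True, correction=1):
    """Hypothetical drop-in for torch.std_mean(g, dim=dim, keepdim=True)
    built from basic reductions only, avoiding the fused
    aten::std_mean.correction op that DirectML lacks.
    Only the keepdim=True path used by anisotropic.py is handled here."""
    m = g.mean(dim=dim, keepdim=True)
    # Number of elements reduced over, for the Bessel correction.
    n = 1
    for d in dim:
        n *= g.shape[d]
    # Variance from elementwise ops, then std via sqrt.
    var = (g - m).square().sum(dim=dim, keepdim=True) / (n - correction)
    return var.sqrt(), m

# Sanity check on CPU against the fused op, with the dims used in anisotropic.py:
g = torch.randn(2, 3, 8, 8)
s_ref, m_ref = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
s, m = std_mean_decomposed(g, dim=(1, 2, 3), keepdim=True)
assert torch.allclose(s, s_ref, atol=1e-5)
assert torch.allclose(m, m_ref, atol=1e-5)
```

In adaptive_anisotropic_filter the line would then read s, m = std_mean_decomposed(g, dim=(1, 2, 3), keepdim=True). Whether this actually stays on the DML device depends on which ops torch-directml implements, so it would need to be confirmed by checking that the UserWarning no longer appears.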

Content of my run.bat:

.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --listen
pause

Console output:

C:\Programs\Fooocus>run.bat

C:\Programs\Fooocus>.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
Found existing installation: torch 2.0.0
Uninstalling torch-2.0.0:
  Successfully uninstalled torch-2.0.0
Found existing installation: torchvision 0.15.1
Uninstalling torchvision-0.15.1:
  Successfully uninstalled torchvision-0.15.1
WARNING: Skipping torchaudio as it is not installed.
WARNING: Skipping torchtext as it is not installed.
WARNING: Skipping functorch as it is not installed.
WARNING: Skipping xformers as it is not installed.

C:\Programs\Fooocus>.\python_embeded\python.exe -m pip install torch-directml
Requirement already satisfied: torch-directml in c:\programs\fooocus\python_embeded\lib\site-packages (0.2.0.dev230426)
Collecting torch==2.0.0 (from torch-directml)
  Using cached torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB)
Collecting torchvision==0.15.1 (from torch-directml)
  Using cached torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB)
Requirement already satisfied: filelock in c:\programs\fooocus\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.12.2)
Requirement already satisfied: typing-extensions in c:\programs\fooocus\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (4.7.1)
Requirement already satisfied: sympy in c:\programs\fooocus\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (1.12)
Requirement already satisfied: networkx in c:\programs\fooocus\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.1)
Requirement already satisfied: jinja2 in c:\programs\fooocus\python_embeded\lib\site-packages (from torch==2.0.0->torch-directml) (3.1.2)
Requirement already satisfied: numpy in c:\programs\fooocus\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (1.23.5)
Requirement already satisfied: requests in c:\programs\fooocus\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (2.31.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\programs\fooocus\python_embeded\lib\site-packages (from torchvision==0.15.1->torch-directml) (9.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\programs\fooocus\python_embeded\lib\site-packages (from jinja2->torch==2.0.0->torch-directml) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\programs\fooocus\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in c:\programs\fooocus\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\programs\fooocus\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (2.0.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\programs\fooocus\python_embeded\lib\site-packages (from requests->torchvision==0.15.1->torch-directml) (2023.5.7)
Requirement already satisfied: mpmath>=0.19 in c:\programs\fooocus\python_embeded\lib\site-packages (from sympy->torch==2.0.0->torch-directml) (1.3.0)
DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Installing collected packages: torch, torchvision
Successfully installed torch-2.0.0 torchvision-0.15.1

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: C:\Programs\Fooocus\python_embeded\python.exe -m pip install --upgrade pip

C:\Programs\Fooocus>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --listen
Fast-forward merge
Update succeeded.
[System ARGV] ['Fooocus\\entry_with_update.py', '--directml', '--listen']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.824
Running on local URL:  http://0.0.0.0:7865
Using directml with device:
Total VRAM 1024 MB, total RAM 31905 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816

To create a public link, set `share=True` in `launch()`.
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: C:\Programs\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\Programs\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [C:\Programs\Fooocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [C:\Programs\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://localhost:7865/ or 0.0.0.0:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3639967960497481578
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] marcus aurelius, elegant, highly detailed, sublime, sharp focus, vibrant colors, radiant, magical, full strong crisp, romantic, intricate, epic, cinematic, stunning, attractive, enhanced, loving, caring, generous, handsome, coherent, passionate, amazing, flowing, symmetry, complex, glowing, color, cool, awesome, very inspirational, beautiful
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] marcus aurelius, elegant, highly detailed, divine holy, full flowing light, spiritual, intricate, pristine, sharp focus,,, extremely aesthetic, cinematic, fine composition, colorful, epic, joyful, marvelous, atmosphere, dynamic dramatic futuristic, background, ambient, glowing, vivid color, beautiful, symmetry, perfect, coherent, magical, surreal, complex
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 9.21 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 3.78 seconds
  0%| | 0/30 [00:00<?, ?it/s]C:\Programs\Fooocus\Fooocus\modules\anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a\_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
  s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
C:\Programs\Fooocus\python_embeded\lib\site-packages\torchsde\_brownian\brownian_interval.py:594: UserWarning: Should have tb<=t1 but got tb=14.614643096923828 and t1=14.614643.
  warnings.warn(f"Should have {tb_name}<=t1 but got {tb_name}={tb} and t1={self._end}.")
100%| | 30/30 [01:59<00:00,  3.98s/it]
Image generated with private log at: C:\Programs\Fooocus\Fooocus\outputs\2023-12-10\log.html
@Thelionsfan

Did anyone figure this out?

@mashb1t mashb1t added bug Something isn't working help wanted Extra attention is needed labels Dec 29, 2023
@mashb1t (Collaborator) commented Jan 2, 2024

I don't think so. @rickfeldschau, feel free to reopen if you can provide new information (whether solved or still not working).

@mashb1t mashb1t closed this as not planned Won't fix, can't repro, duplicate, stale Jan 2, 2024