
Initial IPEX support for Intel Arc GPU #14171

Merged: 4 commits into dev, Dec 2, 2023
Conversation

@Nuullll (Contributor) commented Dec 2, 2023

Description

This is the initial PR of IPEX Windows support for Intel Arc GPU.
Related feature request: #6417

  • Introduces a new option --use-ipex to use xpu as the torch device.
  • Introduces a new module xpu_specific for IPEX XPU-specific hijacks (see the rough sketch below).
  • Users can simply add --use-ipex to COMMANDLINE_ARGS to use the IPEX backend.
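A rough sketch of the idea behind the xpu_specific module and device selection (illustrative only; the names here are not necessarily the ones used in this PR):

# Illustrative sketch: importing IPEX registers the "xpu" backend with torch,
# and the rest of webui can then be pointed at an "xpu" device string.
import torch

try:
    import intel_extension_for_pytorch as ipex  # noqa: F401
    has_xpu = torch.xpu.is_available()
except ImportError:
    has_xpu = False


def get_xpu_device_string():
    # Device string used when --use-ipex is active (hypothetical helper name).
    return "xpu" if has_xpu else "cpu"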

With this PR, an Intel Arc A770 16GB can now generate one 512x512 image (SDP cross-attention optimization, fp16, DPM++ 2M Karras, 20 steps) in 3~4 seconds (~6 it/s).

Notes: I have only verified basic txt2img functionality at the moment. Based on my experience with SD.Next, we will need more hijacks for IPEX to unlock more functionality, but I'd like to keep this change minimal and address more IPEX issues in follow-up PRs.

Screenshots/videos:

QQ2023122-16130.mp4

Checklist:

python -m pytest -vv --verify-base-url test
============================================================================================ test session starts =============================================================================================
platform win32 -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0 -- D:\stable-diffusion-webui\venv\Scripts\python.exe
cachedir: .pytest_cache
baseurl: http://127.0.0.1:7860
rootdir: D:\stable-diffusion-webui
configfile: pyproject.toml
plugins: anyio-3.7.1, base-url-2.0.0, cov-4.1.0
collected 29 items

test/test_extras.py::test_simple_upscaling_performed PASSED [ 3%]
test/test_extras.py::test_png_info_performed PASSED [ 6%]
test/test_extras.py::test_interrogate_performed PASSED [ 10%]
test/test_img2img.py::test_img2img_simple_performed PASSED [ 13%]
test/test_img2img.py::test_inpainting_masked_performed PASSED [ 17%]
test/test_img2img.py::test_inpainting_with_inverted_masked_performed PASSED [ 20%]
test/test_img2img.py::test_img2img_sd_upscale_performed PASSED [ 24%]
test/test_txt2img.py::test_txt2img_simple_performed PASSED [ 27%]
test/test_txt2img.py::test_txt2img_with_negative_prompt_performed PASSED [ 31%]
test/test_txt2img.py::test_txt2img_with_complex_prompt_performed PASSED [ 34%]
test/test_txt2img.py::test_txt2img_not_square_image_performed PASSED [ 37%]
test/test_txt2img.py::test_txt2img_with_hrfix_performed PASSED [ 41%]
test/test_txt2img.py::test_txt2img_with_tiling_performed PASSED [ 44%]
test/test_txt2img.py::test_txt2img_with_restore_faces_performed PASSED [ 48%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[PLMS] PASSED [ 51%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[DDIM] PASSED [ 55%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[UniPC] PASSED [ 58%]
test/test_txt2img.py::test_txt2img_multiple_batches_performed PASSED [ 62%]
test/test_txt2img.py::test_txt2img_batch_performed PASSED [ 65%]
test/test_utils.py::test_options_write PASSED [ 68%]
test/test_utils.py::test_get_api_url[sdapi/v1/cmd-flags] PASSED [ 72%]
test/test_utils.py::test_get_api_url[sdapi/v1/samplers] PASSED [ 75%]
test/test_utils.py::test_get_api_url[sdapi/v1/upscalers] PASSED [ 79%]
test/test_utils.py::test_get_api_url[sdapi/v1/sd-models] PASSED [ 82%]
test/test_utils.py::test_get_api_url[sdapi/v1/hypernetworks] PASSED [ 86%]
test/test_utils.py::test_get_api_url[sdapi/v1/face-restorers] PASSED [ 89%]
test/test_utils.py::test_get_api_url[sdapi/v1/realesrgan-models] PASSED [ 93%]
test/test_utils.py::test_get_api_url[sdapi/v1/prompt-styles] PASSED [ 96%]
test/test_utils.py::test_get_api_url[sdapi/v1/embeddings] PASSED [100%]

============================================================================================= 29 passed in 8.39s =============================================================================================

@AUTOMATIC1111 (Owner) commented Dec 2, 2023

I'd like to not have the webui-ipex-user.bat file, and I think this is easily achievable (rough sketch below):

  • set TORCH_COMMAND in python in launcher if it's empty and if --use-ipex is set
  • the long comment goes there too
  • make --use-ipex automatically imply --skip-torch-cuda-test
  • user has to add --use-ipex to his commandline params and that's it
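A rough sketch of that launcher logic (hypothetical, not the code that landed; the package pins and index URL are placeholders for Intel's official IPEX install instructions):

# Sketch of the suggested launcher behaviour: default TORCH_COMMAND to an
# IPEX-enabled install when --use-ipex is set and the user hasn't provided one,
# and make the flag imply --skip-torch-cuda-test.
import os

def apply_ipex_defaults(args):
    if not args.use_ipex:
        return None
    args.skip_torch_cuda_test = True  # no CUDA device is expected on an XPU setup
    # The long comment explaining the pinned versions would live here.
    return os.environ.get(
        'TORCH_COMMAND',
        "pip install torch==<pinned-version> intel-extension-for-pytorch==<pinned-version> "
        "--extra-index-url <intel-ipex-wheel-index>",
    )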

Also I assume I wouldn't be able to use it with just an AMD CPU, right?

@Nuullll (Contributor, Author) commented Dec 2, 2023

I'd like to not have the webui-ipex-user.bat file, and I think this is easily achievable:

  • set TORCH_COMMAND in python in launcher if it's empty and if --use-ipex is set
  • the long comment goes there too
  • make --use-ipex automatically imply --skip-torch-cuda-test
  • user has to add --use-ipex to his commandline params and that's it

Thanks for the quick feedback! Will update soon.

Also I assume I wouldn't be able to use it with just an AMD CPU, right?

Right. At the moment IPEX XPU only works for Intel Arc dGPU. It doesn't even work for Intel iGPU (UHD or Iris Xe Graphics).
AMD CPU + Intel Arc GPU is fine, but one may experience more compatibility issues than the Intel CPU + Intel Arc GPU combination.

@AUTOMATIC1111 merged commit af5f073 into AUTOMATIC1111:dev on Dec 2, 2023
3 checks passed
@w-e-w mentioned this pull request Dec 4, 2023
@gmbhneo commented Dec 4, 2023

Got an issue when using this: my CPU is being used to render, not my GPU (Intel Arc SE 16GB).

@tusharbhutt

Got an issue when using this: my CPU is being used to render, not my GPU (Intel Arc SE 16GB).

Same on the ARC770. I am using --use-ipex in the command line, but only the CPU is used. Not sure if it's because the ReActor plugin is always "preheating" a device and it only sees the CPU. The onboard iGPU770 is disabled too, so it's not causing any interference.

@qiacheng commented Dec 4, 2023

@gmbhneo @tusharbhutt Are you using the dev branch? Is the iGPU enabled? Also make sure the Python version for the webui env is Python 3.10 on Windows.

If the iGPU is enabled, add

--use-ipex --device-id 1

to COMMANDLINE_ARGS in webui-user.bat (the snippet below shows how to check which index belongs to which GPU).

I just tried the dev branch on Windows and it worked. To monitor GPU utilization on Windows, open Task Manager and change one of the GPU metrics to Compute, then monitor utilization.
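If you're unsure which index the Arc card gets when the iGPU is also visible, a quick check like this (run with the webui venv's Python; assumes intel_extension_for_pytorch is installed) can help:

# List the XPU devices IPEX can see; the index printed next to the Arc card is
# the value to pass via --device-id.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401

for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))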

@@ -352,6 +372,8 @@ def prepare_environment():
run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
startup_timer.record("install torch")

if args.use_ipex:
    args.skip_torch_cuda_test = True
@qiacheng commented Dec 4, 2023

It would be good to include a torch version check: if users have other torch packages installed in the env, then run pip install to install the required ipex, torch, and torchvision packages.


if args.use_ipex:
    if is_installed("torch"):
        import torch
        if torch.__version__ != "2.0.0a0+git9ebda2" or not is_installed("intel_extension_for_pytorch"):
            run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
            startup_timer.record("install torch")

@Nuullll (Contributor, Author):

Or we could check_run_python("import torch; import intel_extension_for_pytorch; assert torch.xpu.is_available()") to perform a sanity test, so that we don't assume a specific torch version -- Intel may release newer versions, and users could build from source with a custom version.
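For illustration, that sanity-test approach could be wired into prepare_environment roughly like this (a sketch reusing the names from the diff above, not the final code):

# Sketch only: reinstall torch/IPEX when the XPU sanity check fails,
# rather than pinning an exact torch version.
if args.use_ipex:
    args.skip_torch_cuda_test = True
    if not check_run_python("import torch; import intel_extension_for_pytorch; assert torch.xpu.is_available()"):
        run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
        startup_timer.record("install torch")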

@Nuullll (Contributor, Author) commented Dec 5, 2023

@gmbhneo @tusharbhutt A few tips (a quick verification snippet follows the list):

  • Make sure you are on the dev branch
  • Use python 3.10 for windows
  • Start with a fresh venv (by removing your current venv folder or set a new VENV_DIR in webui-user.bat)
    • A less safe alternative is to specify --use-ipex --reinstall-torch with your existing venv
  • Disable suspicious extensions
  • Append --use-ipex to COMMANDLINE_ARGS in webui-user.bat
  • Have your iGPU disabled (UHD, Iris) in Device Manager or BIOS, or it may cause unexpected issues.
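After following these tips, a quick way to confirm the venv actually picked up the IPEX build (run with the venv's Python; exact versions will vary):

# Verify that the IPEX-enabled torch is installed and that it sees the Arc GPU.
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("ipex:", ipex.__version__)
print("xpu available:", torch.xpu.is_available())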

@w-e-w (Collaborator) commented Dec 6, 2023

possible issue #14224

@tusharbhutt

@gmbhneo @tusharbhutt A few tips:

* Make sure you are on the `dev` branch

* Use python 3.10 for windows

* Start with a fresh venv (by removing your current `venv` folder or set a new `VENV_DIR` in `webui-user.bat`)
  
  * A less safe alternative is to specify `--use-ipex --reinstall-torch` with your existing `venv`

* Disable suspicious extensions

* Append `--use-ipex` to `COMMANDLINE_ARGS` in `webui-user.bat`

* Have your iGPU disabled (UHD, Iris) in Device Manager or BIOS, or it may cause unexpected issues.

I'll give it a try in a bit; I yanked out the A770 and put the 3060 back in. However, I had previously disabled the iGPU and had "--use-ipex" in the args. I'll try the fresh venv folder next. This is on Python 3.10 in Windows 10 using the dev branch.

@HyunJae5463

Getting
RuntimeError: Native API failed. Native API returns: -997 (Command failed to enqueue/execute) -997 (Command failed to enqueue/execute)
when I try to generate something

@Nuullll (Contributor, Author) commented Dec 7, 2023

Getting RuntimeError: Native API failed. Native API returns: -997 (Command failed to enqueue/execute) -997 (Command failed to enqueue/execute) when I try to generate something

File an issue ticket with more detail, please.

@w-e-w mentioned this pull request Dec 16, 2023
@zakusworo

I'm getting an AttributeError: 'str' object has no attribute 'type' error when using an Intel GPU (--use-ipex) and activating ToMe (Token Merging) in optimizations.

Screenshot_20231223_235122

@uxdesignerhector

@gmbhneo @tusharbhutt A few tips:

* Make sure you are on the `dev` branch

* Use python 3.10 for windows

* Start with a fresh venv (by removing your current `venv` folder or set a new `VENV_DIR` in `webui-user.bat`)
  
  * A less safe alternative is to specify `--use-ipex --reinstall-torch` with your existing `venv`

* Disable suspicious extensions

* Append `--use-ipex` to `COMMANDLINE_ARGS` in `webui-user.bat`

* Have your iGPU disabled (UHD, Iris) in Device Manager or BIOS, or it may cause unexpected issues.

I can confirm it is working! I needed to disable my iGPU (UHD, Iris) in Device Manager and delete my old venv folder, and after that launch Stable Diffusion WebUI with the extra launch argument --use-ipex.

@thejacer

--use-ipex has reduced my render time from ~1 minute to ~30 seconds for 512x512, 20 steps. Is there any other command line arg that might explain the difference in performance from what you've seen? I have an A770 16GB.

@qiacheng commented Jan 24, 2024

--use-ipex has reduced my render time from ~1 minute to ~30 seconds for 512x512, 20 steps. Is there any other command line arg that might explain the difference in performance from what you've seen? I have an A770 16GB.

Try adding --opt-sdp-attention. On the master branch the default is the InvokeAI cross-attention optimization, which causes perf issues; the dev branch has this fixed.
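For context, --opt-sdp-attention routes cross-attention through PyTorch's built-in scaled_dot_product_attention kernel. A minimal standalone check of that kernel might look like this (shapes are arbitrary; assumes the installed IPEX build implements the fused SDP path on XPU, otherwise it falls back to CPU):

# Illustration only: exercise torch.nn.functional.scaled_dot_product_attention,
# which is what --opt-sdp-attention selects, on the XPU device if available.
import torch
import torch.nn.functional as F

try:
    import intel_extension_for_pytorch  # noqa: F401  # registers the xpu backend
    device, dtype = ("xpu", torch.float16) if torch.xpu.is_available() else ("cpu", torch.float32)
except ImportError:
    device, dtype = "cpu", torch.float32

q = k = v = torch.randn(2, 8, 64, 40, device=device, dtype=dtype)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape, out.device)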

@thejacer

--opt-sdp-attention reduced the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.

@qiacheng

--opt-sdp-attention reduced the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.

Do you have the iGPU enabled? If so, please disable it. Perf on an A770 for 512x512, 20 steps should be about 3 seconds.

@Nuullll (Contributor, Author) commented Jan 25, 2024

--opt-sdp-attention reduced the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.

ReBAR is the bottleneck for sure. ReBAR-ON is ~5x faster than ReBAR-OFF for IPEX.

@tusharbhutt

@uxdesignerhector Apologies for the late reply. I did get it working in my old machine (the one without ReBAR) as per this thread:

#14338

However, it was about 2x slower than my 3060, and I haven't bothered to put the ARC in my new machine in about six weeks simply because I am swamped with work. I'll give it a go once I can wrestle it away from my son. If it works and is materially close to the 3060, at least I'll have 16GB of VRAM instead of 12. Then he can have the 3060 and I'll keep the ARC.

@thejacer commented Feb 10, 2024 via email

@guillaume-rce

Hello, what about using Intel AI Boost NPUs? Is this planned?
