
Initial IPEX support for Intel Arc GPU #14171

Merged: 4 commits into dev, Dec 2, 2023
Conversation

@Nuullll (Contributor) commented Dec 2, 2023

Description

This is the initial PR of IPEX Windows support for Intel Arc GPU.
Related feature request: #6417

  • Introduces a new option --use-ipex to use xpu as the torch device.
  • Introduces a new module xpu_specific for IPEX XPU-specific hijacks (see the rough sketch below).
  • Users can simply add --use-ipex to COMMANDLINE_ARGS to use the IPEX backend.
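A rough sketch of the idea behind the xpu_specific module and device selection (illustrative only; the names here are not necessarily the ones used in this PR):

# Illustrative sketch: importing IPEX registers the "xpu" backend with torch,
# and the rest of webui can then be pointed at an "xpu" device string.
import torch

try:
    import intel_extension_for_pytorch as ipex  # noqa: F401
    has_xpu = torch.xpu.is_available()
except ImportError:
    has_xpu = False


def get_xpu_device_string():
    # Device string used when --use-ipex is active (hypothetical helper name).
    return "xpu" if has_xpu else "cpu"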

With this PR, an Intel Arc A770 16GB can now generate one 512x512 image (SDP cross-attention optimization, fp16, DPM++ 2M Karras, 20 steps) in 3~4 seconds (~6 it/s).

Notes: I have only verified basic txt2img functionality at the moment. Based on my experience with SD.Next, we will need more hijacks for IPEX to unlock more functionality, but I'd like to keep this change minimal and address more IPEX issues in follow-up PRs.

Screenshots/videos:

QQ2023122-16130.mp4

Checklist:

python -m pytest -vv --verify-base-url test
============================================================================================ test session starts =============================================================================================
platform win32 -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0 -- D:\stable-diffusion-webui\venv\Scripts\python.exe
cachedir: .pytest_cache
baseurl: http://127.0.0.1:7860
rootdir: D:\stable-diffusion-webui
configfile: pyproject.toml
plugins: anyio-3.7.1, base-url-2.0.0, cov-4.1.0
collected 29 items

test/test_extras.py::test_simple_upscaling_performed PASSED [ 3%]
test/test_extras.py::test_png_info_performed PASSED [ 6%]
test/test_extras.py::test_interrogate_performed PASSED [ 10%]
test/test_img2img.py::test_img2img_simple_performed PASSED [ 13%]
test/test_img2img.py::test_inpainting_masked_performed PASSED [ 17%]
test/test_img2img.py::test_inpainting_with_inverted_masked_performed PASSED [ 20%]
test/test_img2img.py::test_img2img_sd_upscale_performed PASSED [ 24%]
test/test_txt2img.py::test_txt2img_simple_performed PASSED [ 27%]
test/test_txt2img.py::test_txt2img_with_negative_prompt_performed PASSED [ 31%]
test/test_txt2img.py::test_txt2img_with_complex_prompt_performed PASSED [ 34%]
test/test_txt2img.py::test_txt2img_not_square_image_performed PASSED [ 37%]
test/test_txt2img.py::test_txt2img_with_hrfix_performed PASSED [ 41%]
test/test_txt2img.py::test_txt2img_with_tiling_performed PASSED [ 44%]
test/test_txt2img.py::test_txt2img_with_restore_faces_performed PASSED [ 48%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[PLMS] PASSED [ 51%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[DDIM] PASSED [ 55%]
test/test_txt2img.py::test_txt2img_with_vanilla_sampler_performed[UniPC] PASSED [ 58%]
test/test_txt2img.py::test_txt2img_multiple_batches_performed PASSED [ 62%]
test/test_txt2img.py::test_txt2img_batch_performed PASSED [ 65%]
test/test_utils.py::test_options_write PASSED [ 68%]
test/test_utils.py::test_get_api_url[sdapi/v1/cmd-flags] PASSED [ 72%]
test/test_utils.py::test_get_api_url[sdapi/v1/samplers] PASSED [ 75%]
test/test_utils.py::test_get_api_url[sdapi/v1/upscalers] PASSED [ 79%]
test/test_utils.py::test_get_api_url[sdapi/v1/sd-models] PASSED [ 82%]
test/test_utils.py::test_get_api_url[sdapi/v1/hypernetworks] PASSED [ 86%]
test/test_utils.py::test_get_api_url[sdapi/v1/face-restorers] PASSED [ 89%]
test/test_utils.py::test_get_api_url[sdapi/v1/realesrgan-models] PASSED [ 93%]
test/test_utils.py::test_get_api_url[sdapi/v1/prompt-styles] PASSED [ 96%]
test/test_utils.py::test_get_api_url[sdapi/v1/embeddings] PASSED [100%]

============================================================================================= 29 passed in 8.39s =============================================================================================

@AUTOMATIC1111 (Owner) commented Dec 2, 2023

I'd like to not have the webui-ipex-user.bat file, and I think this is easily achievable (rough sketch below):

  • set TORCH_COMMAND in python in launcher if it's empty and if --use-ipex is set
  • the long comment goes there too
  • make --use-ipex automatically imply --skip-torch-cuda-test
  • user has to add --use-ipex to his commandline params and that's it
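A rough sketch of that launcher logic (hypothetical, not the code that landed; the package pins and index URL are placeholders for Intel's official IPEX install instructions):

# Sketch of the suggested launcher behaviour: default TORCH_COMMAND to an
# IPEX-enabled install when --use-ipex is set and the user hasn't provided one,
# and make the flag imply --skip-torch-cuda-test.
import os

def apply_ipex_defaults(args):
    if not args.use_ipex:
        return None
    args.skip_torch_cuda_test = True  # no CUDA device is expected on an XPU setup
    # The long comment explaining the pinned versions would live here.
    return os.environ.get(
        'TORCH_COMMAND',
        "pip install torch==<pinned-version> intel-extension-for-pytorch==<pinned-version> "
        "--extra-index-url <intel-ipex-wheel-index>",
    )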

Also I assume I wouldn't be able to use it with just an AMD CPU, right?

@Nuullll (Contributor, Author) commented Dec 2, 2023

I'd like to not have the webui-ipex-user.bat file, and I think this is easily achievable:

  • set TORCH_COMMAND in python in launcher if it's empty and if --use-ipex is set
  • the long comment goes there too
  • make --use-ipex automatically imply --skip-torch-cuda-test
  • user has to add --use-ipex to his commandline params and that's it

Thanks for the quick feedback! Will update soon.

Also I assume I wouldn't be able to use it with just an AMD CPU, right?

Right. At the moment IPEX XPU only works for Intel Arc dGPU. It doesn't even work for Intel iGPU (UHD or Iris Xe Graphics).
AMD CPU + Intel Arc GPU is fine, but one may experience more compatibility issues than the Intel CPU + Intel Arc GPU combination.

@AUTOMATIC1111 merged commit af5f073 into AUTOMATIC1111:dev on Dec 2, 2023
3 checks passed
@w-e-w mentioned this pull request Dec 4, 2023
@gmbhneo commented Dec 4, 2023

Got an issue when using this: my CPU is being used to render, not my GPU (Intel Arc SE 16GB).

@tusharbhutt

Got an issue when using this: my CPU is being used to render, not my GPU (Intel Arc SE 16GB).

Same on the ARC770. I am using --use-ipex in the command line, but only the CPU is used. Not sure if it's because the ReActor plugin is always "preheating" a device and it only sees the CPU. The onboard iGPU770 is disabled too, so it's not causing any interference.

@qiacheng commented Dec 4, 2023

@gmbhneo @tusharbhutt Are you using the dev branch? Is the iGPU enabled? Also make sure the Python version for the webui env is Python 3.10 on Windows.

If the iGPU is enabled, add

--use-ipex --device-id 1

to COMMANDLINE_ARGS in webui-user.bat (the snippet below shows how to check which index belongs to which GPU).

I just tried the dev branch on Windows and it worked. To monitor GPU utilization on Windows, open Task Manager and change one of the GPU metrics to Compute, then monitor utilization.
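If you're unsure which index the Arc card gets when the iGPU is also visible, a quick check like this (run with the webui venv's Python; assumes intel_extension_for_pytorch is installed) can help:

# List the XPU devices IPEX can see; the index printed next to the Arc card is
# the value to pass via --device-id.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401

for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))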

@@ -352,6 +372,8 @@ def prepare_environment():
run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
startup_timer.record("install torch")

if args.use_ipex:
    args.skip_torch_cuda_test = True
@qiacheng commented Dec 4, 2023

It would be good to include a torch version check: if users have other torch packages installed in the env, then run pip install to install the required ipex, torch, and torchvision packages.


if args.use_ipex:
    if is_installed("torch"):
        import torch
        if torch.__version__ != "2.0.0a0+git9ebda2" or not is_installed("intel_extension_for_pytorch"):
            run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
            startup_timer.record("install torch")

@Nuullll (Contributor, Author):

Or we could check_run_python("import torch; import intel_extension_for_pytorch; assert torch.xpu.is_available()") to perform a sanity test, so that we don't assume a specific torch version -- Intel may release newer versions, and users could build from source with a custom version.
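For illustration, that sanity-test approach could be wired into prepare_environment roughly like this (a sketch reusing the names from the diff above, not the final code):

# Sketch only: reinstall torch/IPEX when the XPU sanity check fails,
# rather than pinning an exact torch version.
if args.use_ipex:
    args.skip_torch_cuda_test = True
    if not check_run_python("import torch; import intel_extension_for_pytorch; assert torch.xpu.is_available()"):
        run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch", live=True)
        startup_timer.record("install torch")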

@Nuullll (Contributor, Author) commented Dec 5, 2023

@gmbhneo @tusharbhutt A few tips (a quick verification snippet follows the list):

  • Make sure you are on the dev branch
  • Use python 3.10 for windows
  • Start with a fresh venv (by removing your current venv folder or set a new VENV_DIR in webui-user.bat)
    • A less safe alternative is to specify --use-ipex --reinstall-torch with your existing venv
  • Disable suspicious extensions
  • Append --use-ipex to COMMANDLINE_ARGS in webui-user.bat
  • Have your iGPU disabled (UHD, Iris) in Device Manager or BIOS, or it may cause unexpected issues.
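After following these tips, a quick way to confirm the venv actually picked up the IPEX build (run with the venv's Python; exact versions will vary):

# Verify that the IPEX-enabled torch is installed and that it sees the Arc GPU.
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("ipex:", ipex.__version__)
print("xpu available:", torch.xpu.is_available())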

@w-e-w (Collaborator) commented Dec 6, 2023

possible issue #14224

@tusharbhutt

@gmbhneo @tusharbhutt A few tips:

* Make sure you are on the `dev` branch

* Use python 3.10 for windows

* Start with a fresh venv (by removing your current `venv` folder or set a new `VENV_DIR` in `webui-user.bat`)
  
  * A less safe alternative is to specify `--use-ipex --reinstall-torch` with your existing `venv`

* Disable suspicious extensions

* Append `--use-ipex` to `COMMANDLINE_ARGS` in `webui-user.bat`

* Have your iGPU disabled (UHD, Iris) in Device Manager or BIOS, or it may cause unexpected issues.

I'll give it a try in a bit; I yanked out the A770 and put the 3060 back in. However, I had previously disabled the iGPU and had "--use-ipex" in the args. I'll try the fresh venv folder next. This is on Python 3.10 in Windows 10 using the dev branch.

@HyunJae5463

Getting
RuntimeError: Native API failed. Native API returns: -997 (Command failed to enqueue/execute) -997 (Command failed to enqueue/execute)
when I try to generate something

@Nuullll (Contributor, Author) commented Dec 7, 2023

Getting RuntimeError: Native API failed. Native API returns: -997 (Command failed to enqueue/execute) -997 (Command failed to enqueue/execute) when I try to generate something

File an issue ticket with more detail, please.

@w-e-w mentioned this pull request Dec 16, 2023
@zakusworo

I'm getting an AttributeError: 'str' object has no attribute 'type' error when using an Intel GPU (--use-ipex) and activating ToMe (Token Merging) in optimizations.

Screenshot_20231223_235122

@uxdesignerhector

@gmbhneo @tusharbhutt A few tips:

* Make sure you are on the `dev` branch

* Use python 3.10 for windows

* Start with a fresh venv (by removing your current `venv` folder or set a new `VENV_DIR` in `webui-user.bat`)
  
  * A less safe alternative is to specify `--use-ipex --reinstall-torch` with your existing `venv`

* Disable suspicious extensions

* Append `--use-ipex` to `COMMANDLINE_ARGS` in `webui-user.bat`

* Have your iGPU disabled (UHD, Iris) in Device Manager or BIOS, or it may cause unexpected issues.

I can confirm it is working! I needed to disable my iGPU (UHD, Iris) in Device Manager and delete my old venv folder, and after that launch Stable Diffusion WebUI with the extra launch argument --use-ipex.

@thejacer

--use-ipex has reduced my render time from ~1 minute to ~30 seconds for 512x512, 20 steps. Is there any other command line arg that might explain the difference in performance from what you've seen? I have an A770 16GB.

@qiacheng commented Jan 24, 2024

--use-ipex has reduced my render time from ~1 minute to ~30 seconds for 512x512, 20 steps. Is there any other command line arg that might explain the difference in performance from what you've seen? I have an A770 16GB.

Try adding --opt-sdp-attention. On the master branch the default is the InvokeAI cross-attention optimization, which causes perf issues; the dev branch has this fixed.
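For context, --opt-sdp-attention routes cross-attention through PyTorch's built-in scaled_dot_product_attention kernel. A minimal standalone check of that kernel might look like this (shapes are arbitrary; assumes the installed IPEX build implements the fused SDP path on XPU, otherwise it falls back to CPU):

# Illustration only: exercise torch.nn.functional.scaled_dot_product_attention,
# which is what --opt-sdp-attention selects, on the XPU device if available.
import torch
import torch.nn.functional as F

try:
    import intel_extension_for_pytorch  # noqa: F401  # registers the xpu backend
    device, dtype = ("xpu", torch.float16) if torch.xpu.is_available() else ("cpu", torch.float32)
except ImportError:
    device, dtype = "cpu", torch.float32

q = k = v = torch.randn(2, 8, 64, 40, device=device, dtype=dtype)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape, out.device)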

@thejacer

--opt-sdp-attention reduced the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.

@qiacheng

--opt-sdp-attention reduced the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.

Do you have the iGPU enabled? If so, please disable it. Perf on an A770 for 512x512, 20 steps should be about 3 seconds.

@Nuullll (Contributor, Author) commented Jan 25, 2024

--opt-sdp-attention reduced the duration to under 20 seconds, going as low as just over 14 seconds. I don't have ReBAR, so I figured that was as fast as it would go, but it suddenly went back up to ~25 seconds.

ReBAR is the bottleneck for sure. ReBAR-ON is ~5x faster than ReBAR-OFF for IPEX.

@tusharbhutt

@uxdesignerhector Apologies for the late reply. I did get it working in my old machine (the one without ReBAR) as per this thread:

#14338

However, it was about 2x slower than my 3060, and I haven't bothered to put the ARC in my new machine in about six weeks simply because I am swamped with work. I'll give it a go once I can wrestle it away from my son. If it works and is materially close to the 3060, at least I'll have 16GB of VRAM instead of 12. Then he can have the 3060 and I'll keep the ARC.

@thejacer commented Feb 10, 2024 via email

@guillaume-rce

Hello, what about using Intel AI Boost NPUs? Is this planned?
