
[Feature Request]: Support for Intel oneAPI/Vulkan versions of PyTorch as well #6417

Open
Vidyut opened this issue Jan 6, 2023 · 112 comments
Labels
enhancement New feature or request platform:mac Issues that apply to Apple OS X, M1, M2, etc

Comments

@Vidyut

Vidyut commented Jan 6, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

This is a brilliant project and I like that it supports most versions of pytorch.

A large group of users on unsupported machines (Intel, Windows, etc.) is excluded from the performance options (which are basically CUDA and wannabe-CUDA). Many of these machines have fairly decent hardware; it just doesn't run CUDA/ROCm. PyTorch variants like oneAPI or Vulkan would really take the reach of this project out to those with lesser machines, so to say. https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html

I'm not a coder, but they have a PyTorch version in the works similar to CUDA/ROCm, and it seems to support a lot of Intel CPUs and GPUs, including discrete GPUs and older ones abandoned by ROCm: https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master

And adapting the code doesn't seem to be excessively complicated:
https://intel.github.io/intel-extension-for-pytorch/xpu/1.10.200+gpu/tutorials/examples.html
https://intel.github.io/intel-extension-for-pytorch/xpu/1.10.200+gpu/tutorials/api_doc.html

It would make the project accessible to those with simpler laptops/desktops.

https://towardsdatascience.com/pytorch-stable-diffusion-using-hugging-face-and-intel-arc-77010e9eead6
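
From the linked tutorials, the adaptation appears to be roughly this (a minimal sketch based on the IPEX examples; the Linear model and input here are just placeholders, and it assumes the xpu build of the extension is installed):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

# placeholder model and input - any torch.nn.Module works the same way
model = torch.nn.Linear(512, 512).eval()
data = torch.randn(1, 512)

# the "couple of lines" of adaptation: move everything to the XPU device
# and let IPEX apply its optimizations
model = model.to("xpu")
data = data.to("xpu")
model = ipex.optimize(model)

with torch.no_grad():
    out = model(data)
print(out.shape)
```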

Proposed workflow

  1. Go to ....
  2. Press ....
  3. ...

Additional information

No response

@uxdesignerhector

Yes, it would be nice to squeeze those 16 GB from the Intel Arc A770. It seems that the problem resides in PyTorch itself (pytorch/pytorch#30029): PyTorch will need to support oneAPI. But it seems it's possible to run PyTorch on Intel GPUs through extensions, as stated in https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master

This thread from Reddit has useful information about the possible ways Intel could approach the Stable Diffusion/PyTorch problem: https://www.reddit.com/r/intel/comments/xvbmif/will_intel_arc_support_stable_diffusion/

@Vidyut

Vidyut commented Jan 6, 2023

Hi @uxdesignerhector,

PyTorch HAS the Intel extension, though unlike ROCm it requires code changes as it stands. It is just a couple of lines - this project already does something similar to integrate MPS, which is why I suggested it here. And it can accelerate CPUs, and the unreleased version runs on older GPUs and whatnot, which is great! I wouldn't be surprised if such an integration made this version of Stable Diffusion the staple implementation.

Stable Diffusion runs on TensorFlow, I think, which supports oneAPI - so this is less an Intel issue and more of one for those who love this project, with its well-designed implementation, but would rather not wait ages while their hardware twiddles its thumbs. Almost nothing (that wouldn't crash at the task) would be left out, since it would also automatically support OpenCL, I think.

Not to mention I am fed up with these elitist projects refusing to recognise anything that isn't CUDA as a GPU!!! (This includes Intel's openvino-gpu runtime - which is basically for CUDA/ROCm!!!) This repository, with its inclusion of everything it can lay its hands on, is literally the only reason I bother with PyTorch. (That said, I'm not a coder, so it's not like I'm using all sorts of other technologies overwhelmingly.)

V

@Vidyut

Vidyut commented Jan 9, 2023

The Intel extension for GPU now supports PyTorch 1.13: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v1.13.10%2Bxpu

@mezotaken mezotaken added the enhancement New feature or request label Jan 12, 2023
@rahulunair

rahulunair commented Jan 12, 2023

For anyone looking for working code for Stable Diffusion on Intel dGPUs (Arc Alchemist) and iGPUs with PyTorch and TensorFlow, please check this out: https://github.com/rahulunair/stable_diffusion_arc or my blog: https://blog.rahul.onl/posts/2022-09-06-arc-dgpu-stable-diffusion.html

For context, oneAPI is already part of PyTorch and TensorFlow as oneDNN, a oneAPI library that is the default CPU accelerator for both frameworks. The Intel Extension for PyTorch (IPEX) provides kernels that add further optimizations and an Intel GPU backend. Eventually most of the IPEX code should be merged into mainline PyTorch.

@uxdesignerhector

> For anyone looking for working code for Stable Diffusion on Intel dGPUs (Arc Alchemist) and iGPUs with PyTorch and TensorFlow, please check this out: https://github.com/rahulunair/stable_diffusion_arc or my blog: https://blog.rahul.onl/posts/2022-09-06-arc-dgpu-stable-diffusion.html
>
> For context, oneAPI is already part of PyTorch and TensorFlow as oneDNN [...]

Thank you for your clarification.

@jbaboval

#4690

@jbaboval

I'm going to take a stab at putting together a PR for this...

@jbaboval

Unfortunately it's more than a few lines of code. And getting the Intel libraries and drivers set up isn't well integrated with distributions.

This is a work in progress, but it shows signs of life:
https://github.com/jbaboval/stable-diffusion-webui/tree/oneapi

@jbaboval

I'm still having some issues. One is seeding. I can't get reproducible output. I thought it might be the seeding in pytorch_lightning, but at this point I have implemented full support in pytorch_lightning and instrumented the seeding code there - it never gets called. All the seeding happens in sd-webui. I've also instrumented sd-webui to validate repeatability of noise and subnoise, and it's fully repeatable. Not sure what gives yet.
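
For anyone who wants to poke at this themselves, the kind of repeatability probe I mean looks roughly like this (a minimal sketch assuming the xpu build of IPEX; torch.xpu.manual_seed mirrors torch.cuda.manual_seed):

```python
import torch
import intel_extension_for_pytorch  # noqa: F401 - registers the torch.xpu backend

def make_noise(seed: int) -> torch.Tensor:
    # seed both the CPU and XPU generators, then draw latent-shaped noise
    torch.manual_seed(seed)
    torch.xpu.manual_seed(seed)
    return torch.randn((1, 4, 64, 64), device="xpu")

# identical seeds should produce identical noise if device seeding is repeatable
print(torch.equal(make_noise(1234).cpu(), make_noise(1234).cpu()))
```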

The other issue is that batches always have junk for the second image.

[image]

On the plus side, it's really fast. Especially compared to my old GTX 1660.


@Vidyut

Vidyut commented Jan 22, 2023

I'm not a coder. I can't even begin to figure this out, but I'd be happy to test if you've uploaded what you have to GitHub.

@jbaboval

It's linked above. I made some notes in ArcNotes.txt that might help get you set up.

@jbaboval

jbaboval commented Jan 22, 2023

If you're going to try the branch above:

  • It might not work without my pytorch_lightning branch. I think it will, but if not, let me know. I can test later and fix it.
  • Turn up the batch size.
  • Pass `--use-intel-oneapi` to launch.py.
  • Pass `--config configs/v1-inference-xpu.yaml` to launch.py (combined example below).
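
Putting those together, the launch would look something like this (a sketch, assuming oneAPI is installed at the default /opt/intel/oneapi path):

```sh
. /opt/intel/oneapi/setvars.sh
python launch.py --use-intel-oneapi --config configs/v1-inference-xpu.yaml
```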

@Vidyut

Vidyut commented Jan 23, 2023

Saw your comment just now and tried it.

I had everything installed and the preparation was fine as per your test.

`--use-intel-oneapi` wasn't recognised, so I probably did something wrong.

The command to make the Intel version of Python a system default is problematic, and I almost broke other Python stuff in the process. It's better to source the setup vars in a launcher used only for this, or add them to .bashrc (and comment them out when not needed...).

Something like a small script:

```sh
#!/bin/bash
# Keeping the options in a script makes it quick to comment/uncomment them for testing
# (and, if you're like me, to not forget the commands)
. /opt/intel/oneapi/setvars.sh
TORCH_COMMAND='pip install torch torchvision' python launch.py --medvram --precision full --no-half --skip-torch-cuda-test
```

That said:

  • Stable diffusion started without trouble, loaded webpage. The code didn't break.
  • Takes a long time to draw a single image
  • But I'm not sure it is using the XPU
  • Will need more investigation and tweaking. Work in progress. Will update.

For reference (the conspicuous absence of "xpu" anywhere in the output suggests I missed a trick somewhere):

```
:: oneAPI environment initialized ::

Python 3.9.15 (main, Nov 11 2022, 13:58:57)
[GCC 11.2.0]
Commit hash: 3a0d6b7
Installing requirements for Web UI
Launching Web UI with arguments: --medvram --precision full --no-half --ckpt /home/[stuff]/TEST/stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading weights [fe4efff1e1] from /home/[stuff]/TEST/stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 119.4s (1.8s create model, 109.9s load weights).
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
100%|███████████████████████████████████████████| 20/20 [09:44<00:00, 29.21s/it]
Total progress: 100%|███████████████████████████| 20/20 [09:18<00:00, 32.14s/it]
```
I'll be able to spend time properly on this in a day or so and will update with any enlightenment that follows.

@Vidyut

Vidyut commented Jan 23, 2023

Arrrgh. Never mind. The torch version was wrong (I accidentally installed it in the regular Python, so the script installed the regular torch in Intel's Python...). Now sorted. And now I have problems with Intel's torch and torchvision playing nice with each other... trying Intel's torch with regular torchvision. Sigh.

Update: it fails with Intel's torchvision but works with Intel's torch and regular torchvision. It still takes too long, though - probably because I can't convince it to use the parameters you said to pass. The xpu test returns True, so the requirements are installed. But I don't think it is using the XPU.

This is currently slower than the untampered CPU version:
```
100%|███████████████████████████████████████████| 20/20 [10:11<00:00, 30.56s/it]
Total progress: 100%|███████████████████████████| 20/20 [10:27<00:00, 31.38s/it]
```

@jbaboval

Can you check what branch of the fork you're on? It should be oneapi.

If it's not recognizing the command line option, it's definitely not running the right code.

@Vidyut

Vidyut commented Jan 23, 2023

Okay, you were right. It was the wrong branch. *facepalm* I downloaded the zip, and I think it's still master. Not sure how to get the oneapi branch (I'm a champion copy-paster but don't actually know a lot). Figuring it out.

@jbaboval

I'm not sure how you get the branch with the zip download. I just grabbed the zip and it doesn't include the .git directory.

Try `git clone -b oneapi https://github.com/jbaboval/stable-diffusion-webui.git`

@Nathan-dm

> Unfortunately it's more than a few lines of code. And getting the Intel libraries and drivers set up isn't well integrated with distributions.
>
> This is a work in progress, but it shows signs of life: https://github.com/jbaboval/stable-diffusion-webui/tree/oneapi

Will it work using an Intel iGPU?

@Vidyut

Vidyut commented Jan 24, 2023

> Try `git clone -b oneapi https://github.com/jbaboval/stable-diffusion-webui.git`

I'm fairly certain I have the right branch now. It has ArcNotes.txt - so what am I doing wrong?

```
Launching Web UI with arguments: --medvram --precision full --no-half --ckpt /home/vidyut/AI/TEST/stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt --config configs/v1-inference-xpu.yaml
/home/vidyut/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
'NoneType' object has no attribute 'enable_tf32': str
Traceback (most recent call last):
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/errors.py", line 29, in run
code()
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/accelerator.py", line 58, in enable_tf32
return impl.enable_tf32()
AttributeError: 'NoneType' object has no attribute 'enable_tf32'

2023-01-24 16:59:58,437 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpsg3pgt5w
2023-01-24 16:59:58,438 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpsg3pgt5w/_remote_module_non_scriptable.py
2023-01-24 16:59:58,634 - root - WARNING - Pytorch pre-release version 1.13.0a0+gitb1dde16 - assuming intent to test it
2023-01-24 16:59:58,648 - root - WARNING - Pytorch pre-release version 1.13.0a0+gitb1dde16 - assuming intent to test it
No module 'xformers'. Proceeding without it.
Traceback (most recent call last):
File "/home/vidyut/AI/TEST/stable-diffusion-webui/launch.py", line 315, in
start()
File "/home/vidyut/AI/TEST/stable-diffusion-webui/launch.py", line 306, in start
import webui
File "/home/vidyut/AI/TEST/stable-diffusion-webui/webui.py", line 13, in
from modules.call_queue import wrap_queued_call, queue_lock, wrap_gradio_gpu_call
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/call_queue.py", line 7, in
from modules import shared
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/shared.py", line 131, in
devices.device, devices.device_interrogate, devices.device_gfpgan, devices.device_esrgan, devices.device_codeformer =
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/shared.py", line 132, in
(devices.cpu if any(y in cmd_opts.use_cpu for y in [x, 'all']) else devices.get_optimal_device() for x in ['sd', 'interrogate', 'gfpgan', 'esrgan', 'codeformer'])
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/devices.py", line 29, in get_optimal_device
accelerator_device = accelerator.get_device()
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/accelerator.py", line 25, in get_device
return impl.get_device()
AttributeError: 'NoneType' object has no attribute 'get_device'
```

@Vidyut

Vidyut commented Jan 24, 2023

Reinstalled everything. Different error.

```
Launching Web UI with arguments: --medvram --precision full --no-half --ckpt /home/vidyut/AI/TEST/stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt --config configs/v1-inference-xpu.yaml
'NoneType' object has no attribute 'enable_tf32': str
Traceback (most recent call last):
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/errors.py", line 29, in run
code()
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/accelerator.py", line 58, in enable_tf32
return impl.enable_tf32()
AttributeError: 'NoneType' object has no attribute 'enable_tf32'

2023-01-24 17:38:30,868 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpnapm8vcl
2023-01-24 17:38:30,869 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpnapm8vcl/_remote_module_non_scriptable.py
2023-01-24 17:38:31,054 - root - WARNING - Pytorch pre-release version 1.13.0a0+gitb1dde16 - assuming intent to test it
2023-01-24 17:38:31,075 - root - WARNING - Pytorch pre-release version 1.13.0a0+gitb1dde16 - assuming intent to test it
No module 'xformers'. Proceeding without it.
Traceback (most recent call last):
File "/home/vidyut/AI/TEST/stable-diffusion-webui/launch.py", line 315, in
start()
File "/home/vidyut/AI/TEST/stable-diffusion-webui/launch.py", line 306, in start
import webui
File "/home/vidyut/AI/TEST/stable-diffusion-webui/webui.py", line 13, in
from modules.call_queue import wrap_queued_call, queue_lock, wrap_gradio_gpu_call
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/call_queue.py", line 7, in
from modules import shared
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/shared.py", line 131, in
devices.device, devices.device_interrogate, devices.device_gfpgan, devices.device_esrgan, devices.device_codeformer =
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/shared.py", line 132, in
(devices.cpu if any(y in cmd_opts.use_cpu for y in [x, 'all']) else devices.get_optimal_device() for x in ['sd', 'interrogate', 'gfpgan', 'esrgan', 'codeformer'])
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/devices.py", line 29, in get_optimal_device
accelerator_device = accelerator.get_device()
File "/home/vidyut/AI/TEST/stable-diffusion-webui/modules/accelerator.py", line 25, in get_device
return impl.get_device()
AttributeError: 'NoneType' object has no attribute 'get_device'
```

At this point I'm not sure this is within my ability.

@jbaboval

It should be telling you right at the beginning that it's using OneAPI:

```
Launching Web UI with arguments: --config configs/v1-inference-xpu.yaml --listen
OneAPI is available
2023-01-24 08:03:28,418 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpsh22b93t
2023-01-24 08:03:28,419 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpsh22b93t/_remote_module_non_scriptable.py
2023-01-24 08:03:28,468 - root - WARNING - Pytorch pre-release version 1.13.0a0+gitb1dde16 - assuming intent to test it
2023-01-24 08:03:28,501 - root - WARNING - Pytorch pre-release version 1.13.0a0+gitb1dde16 - assuming intent to test it
No module 'xformers'. Proceeding without it.
Device is xpu
```

However, it shouldn't crash out with an exception if it's not working. I'll have to fix that.

In the meantime you'll have to figure out how to get your oneAPI environment working before I can help with the webui. There's a section in the notes about how to validate:

```
$ python3
Python 3.9.15 (main, Nov 11 2022, 13:58:57) 
[GCC 11.2.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import torch
>>> import intel_extension_for_pytorch
[W OperatorEntry.cpp:150] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: torchvision::nms
    no debug info
  dispatch key: CPU
  previous kernel: registered at /build/intel-pytorch-extension/csrc/cpu/aten/TorchVisionNms.cpp:47
       new kernel: registered at /opt/workspace/vision/torchvision/csrc/ops/cpu/nms_kernel.cpp:112 (function registerKernel)
>>> torch.xpu.is_available()
True
```

@jbaboval

There's a new branch: rebase. It has a fix for the above exception (your GPU still won't work if you don't get the "OneAPI is available" message), and it includes the latest upstream changes.
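
If you already have the fork cloned, standard git gets you onto it:

```sh
git fetch origin
git checkout rebase
```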

@Vidyut

Vidyut commented Jan 24, 2023

Contents of test.sh:

```sh
#!/bin/bash
. /opt/intel/oneapi/setvars.sh
sycl-ls
pip list|grep torch
python -c 'import torch; import intel_extension_for_pytorch; print(torch.xpu.is_available())'
```

Result:

```
vidyut@saaki:~/AI/TEST/stable-diffusion-webui$ sh test.sh

:: initializing oneAPI environment ...
test.sh: SH_VERSION = unknown
args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: embree -- latest
:: inspector -- latest
:: intelpython -- latest
:: ipp -- latest
:: ippcp -- latest
:: ipp -- latest
:: ispc -- latest
:: mkl -- latest
:: modelzoo -- latest
:: modin -- latest
:: mpi -- latest
:: neural-compressor -- latest
:: oidn -- latest
:: openvkl -- latest
:: ospray -- latest
:: ospray_studio -- latest
:: pytorch -- latest
:: rkcommon -- latest
:: rkutil -- latest
:: tbb -- latest
:: tensorflow -- latest
:: vpl -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.15.12.0.01_081451]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz 3.0 [2022.15.12.0.01_081451]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) HD Graphics 520 [0x1916] 3.0 [22.43.24595.35]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) HD Graphics 520 [0x1916] 1.3 [1.3.24595]
intel-extension-for-pytorch 1.13.10+xpu
open-clip-torch 2.7.0
pytorch-lightning 1.7.6
torch 1.13.0a0+gitb1dde16
torchdiffeq 0.2.3
torchmetrics 0.11.0
torchsde 0.2.5
torchvision 0.14.1a0+0504df5
[W OperatorEntry.cpp:150] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
operator: torchvision::nms
no debug info
dispatch key: CPU
previous kernel: registered at /build/intel-pytorch-extension/csrc/cpu/aten/TorchVisionNms.cpp:47
new kernel: registered at /opt/workspace/vision/torchvision/csrc/ops/cpu/nms_kernel.cpp:112 (function registerKernel)
True
```

Are you passing the originally suggested arguments to launch.py? Because I am, but they aren't showing in your example; maybe that's the issue? Update: it's not working without them, nor with just the two you said to pass, either. The "OneAPI is available" message doesn't show.

I'm able to set up the environment as far as I can tell, but I can't get the code to run. Maybe there's a missing dependency...

@Vidyut

Vidyut commented Jan 24, 2023

I've got other work I'm doing now. Will test more when I get time.

@jbaboval

Can you try running `source ./venv/bin/activate` in your webui tree to activate the virtual environment, and then run your test script again?

I think that the problem now might be a difference between your system python environment and the venv environment.
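
That is, something like this from the webui checkout (assuming the venv is in the default location):

```sh
cd stable-diffusion-webui
source ./venv/bin/activate
python -c 'import torch; import intel_extension_for_pytorch; print(torch.xpu.is_available())'
```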

@Vidyut

Vidyut commented Jan 25, 2023

nope :(

@jbaboval

Sorry I couldn't get you working. I'm going to try to tidy this stuff up and submit it back, so hopefully you'll have better luck once it's properly integrated.

@Nathan-dm

Could you provide an installation tutorial for Windows? I'd like to try it on my laptop, because I'm sick of waiting for my CPU to generate images. My laptop's specs: i5-1135G7, 16 GB DDR4 RAM, Intel Xe graphics (80 EU), and Intel Xe Max graphics (DG1).

@jbaboval

There's still a lot of stuff broken, but at this point it's hard to tell the difference between bugs here, driver issues, and pytorch extension issues. I'm also unsure that my SD 2.1 fix is the right fix, though it works. I wish I had a CUDA system alongside my A770 to compare. Is batch matrix multiplication on CUDA just more automatically adaptable? Because otherwise I don't understand what is OneAPI-specific about the change I made, or why it works elsewhere without it.

I agree about the messy setup, but it's also pretty typical for Intel's experimental GPU work. It's a mess in the early days, but once everything matures they get it all upstream and things get easier. By then, though, somebody else will have done all the fun hacking.

@genewitch

genewitch commented Mar 1, 2023

@jbaboval
i guess a git pull in my directory with your repo will get me up to date?

i have half a mind to wipe all drives and start from scratch. If you want to have access to a decent platform message me. I'm this username most places, including gmail.

edit: i have a system with only cuda - 3060 and 3070. I also have a system with a 1050ti and an arc 770 or whatever. the $350 one.

@genewitch

> edit: after updating the intel drivers the first 2 images of any batch size (including 1 and 2) are blank or garbage. Good times.

> Yes, this seems to have broken with the 1.13 extension release and the corresponding driver. (Though I only get image 2 in batches as corrupt, not 1 & 2.)

is there a way to roll back packages in ubuntu? i actually had a decently working system and decided to update the graphics drivers. (i use gentoo, and i know how to do this there.)

I'll google it if i don't have a reply by tomorrow.

@jbaboval

jbaboval commented Mar 4, 2023

> @jbaboval i guess a git pull in my directory with your repo will get me up to date?

Yeah, but make sure you're on the oneapi branch. The master branch is the same as upstream.

If you want to run it on your cuda system and tell me what I broke I'll fix that up too. I'll need to make it work everywhere if it's ever going to get merged.

@genewitch

Unfortunately I returned the A770. The performance was fine (7.12 it/s after the latest Intel drivers), but it just started returning garbled, kaleidoscope, paint-spray-looking images, and nothing but. Not even a hint that it was doing Stable Diffusion things.

I told the retailer (and Intel, as this was an Intel-branded card) that I suspected a memory issue. I'm searching for a replacement as we speak, and I still have your oneapi branch checked out on the drive, so when something comes in I'll test again.

Hopefully my "how to install" guide above comes in handy and someone else has a better experience than I did.

@jbaboval

jbaboval commented Mar 4, 2023

Probably is a memory issue, but I think driver, not hardware.

I'm hoping this means progress soon: intel/intel-extension-for-pytorch#302 (comment)

@neggles

neggles commented Mar 5, 2023

FWIW, I have a number of NVIDIA systems available, a ROCm system, and now an A770 16GB (though the system for that is currently lashed together on a bench; it does work, so whatever). Will be attempting another go at it sometime in the next day-ish.

@Glubb

Glubb commented Mar 8, 2023

Update: it works on Gentoo using kernel 6.2.2, so those on other operating systems don't have to use Ubuntu with Intel's kernel.
Also, I have the same problems as genewitch, so I can confirm the hardware is fine; it must be the memory issue.
Also, I noticed (at least with my configuration and the images I'm producing) that blitter usage has gone down from 70-100% to 30-55% compared to the original Ubuntu 22.04 install, so something has improved somewhere. But this is with --no-half --no-half-vae; without those arguments it's not stable.

@neggles

neggles commented Mar 9, 2023

It looks like something is just straight broken in the Intel PyTorch extension - I can't get the damn thing to build, even following their instructions, using their Python distribution and building from conda, or using the script in the repo that they apparently used to build the release artifacts. And based on the comments on some of the issues in the repo (see the one @jbaboval linked above), it seems like they know there are Problems and they're working on it.

I think I've bashed my head against this wall enough for a little while... going to wait for another release from Intel. But yes, it does work on latest mainline - I'm on 6.3.0-rc1 on Fedora 38 prerelease atm, but in an Ubuntu 22.04 docker container - just... for a given value of "work". With --no-half and --no-half-vae I get all kinds of fun casting problems, fp32 stuff seems to just eat itself... something is broken here :(

@genewitch

genewitch commented Mar 9, 2023

> It looks like something is just straight broken in the Intel PyTorch extension -
> ...
> With --no-half and --no-half-vae I get all kinds of fun casting problems, fp32 stuff seems to just eat itself... something is broken here :(

I think it's actually the latest Intel driver, because I had SD working "fine" before I tried a driver update. By "fine" I mean it would only garble 1 in 12 or 1 in 15 images, usually with "a tensor has produced all NaNs, try --no-half/--no-half-vae" - but setting those flags made no difference. So I updated, and then 10/10 images were garbled, blank, or kaleidoscope.

I also couldn't get Intel's pytorch to build, but I did get the Intel tensorflow to build. I think their automated build system is missing some configuration file, because I was getting errors about "missing prerequisites" - that's the whole point of a build system, isn't it? >.<

@genewitch

For the record: I paid for, received, and installed a 12 GB 3060, went into my SSD mountpoint, did a git clone of automatic1111 or whatever, synced the embeddings and models folders, and everything just works.

Intel Arc is just broken.

@neggles

neggles commented Mar 19, 2023

Update: I have managed to build Intel's patched torch and the IPEX extension from source, with much pain. It still can't actually run a generation - something goes screwy in the scheduler and it hard-locks the GPU - but I suspect half of the problem is that I've been building with GCC 13, which changed a whole bunch of stuff and throws errors all over the place because of missing headers.

Will be making another attempt with some older GCC versions (probably just admitting defeat and using Ubuntu 22.04) at some point soonish, or possibly just waiting for Intel to give us some slightly newer versions.

@jbaboval

New IPEX release today, but it's CPU only again. I wonder if they can't find the bug?

@whchan05

whchan05 commented Apr 4, 2023

Came across an SD 2.0 OpenVINO implementation by Intel. Any chance of it being integrated into the Windows version of the web UI?

@jbaboval

New IPEX release for xpu today! No wheels though... have to wait for it to build from source.

@jbaboval

jbaboval commented May 7, 2023

Intel finally published wheels.

It looks like the new version fixes the major issues. It also introduces some new ones. I've worked around enough to get SD1.5 working and pushed it (with updated instructions) to my fork.

I'll try to rebase closer to AUTOMATIC1111's tip soon, but since this project has moved on to torch 2.0 and the IPEX repo is still on 1.13.x, there will be yet more waiting for releases...

[image]

@Nathan-dm

> Intel finally published wheels.
>
> It looks like the new version fixes the major issues. [...]

Good news, but IPEX doesn't support my GPU yet (Intel Xe graphics); I'm still waiting for Intel to add support for their iGPU lineup / pre-Arc GPUs.

@RigoLigoRLC

RigoLigoRLC commented May 30, 2023

@jbaboval Your branch has no issues section, so I had to try my luck here. I installed the oneAPI kit as described in ArcNotes.txt and verified that xpu is available, but I keep getting "OpenCL error -6". I'm not sure where things have gone wrong, but the presence of OpenCL seems sussy.

Logs
```
rigoligo@RIGO-DESKTOP:~/gitcode/stable-diffusion-webui$ python3 ./launch.py --use-intel-oneapi
Python 3.9.16 (main, Feb 22 2023, 01:57:33)
[GCC 11.2.0]
Commit hash: 2b316c206c84221b94e67456c3811f4df3f699e9
Installing requirements
Launching Web UI with arguments: --use-intel-oneapi
/home/rigoligo/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
OneAPI is available
Device is xpu
No module 'xformers'. Proceeding without it.
...
2023-05-31 05:46:51,081 - httpx - INFO - HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2023-05-31 05:46:51,083 - httpx - INFO - HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
Error completing request
Arguments: ('task(cai83ytctr6u3cn)', '', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/processing.py", line 517, in process_images
    res = process_images_inner(p)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/processing.py", line 660, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps * step_multiplier, cached_uc)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/processing.py", line 599, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/rigoligo/gitcode/stable-diffusion-webui/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 811, in forward
    return self.text_model(
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 721, in forward
    encoder_outputs = self.encoder(
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 650, in forward
    layer_outputs = encoder_layer(
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 378, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/rigoligo/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: An OpenCL error occurred: -6
```

@akx akx added the platform:mac Issues that apply to Apple OS X, M1, M2, etc label Jun 13, 2023
@eddiesmithgit

I know this might be a dumb question, but just to save me some trouble (in case someone knows for sure it won't work): will this work on macOS 10.15? I am using:
MacBook Pro (Retina, 15-inch, Mid 2015)
Processor: 2.2 GHz Quad-Core Intel Core i7 (it's 4th gen, by the way)
Graphics: Intel Iris Pro 1536 MB
Memory: 16 GB 1600 MHz DDR3

@RigoLigoRLC

RigoLigoRLC commented Jun 28, 2023 via email

@eddiesmithgit

> You should look for a Metal solution. But if you're not on the latest OS, necessary Metal APIs may be missing.

Thanks for pointing me in the right direction. I did some digging on the web and found this repo; hopefully it should work:

https://github.com/soten355/MetalDiffusion

@mindplay-dk

Is Intel support coming?

(I gave the Vlad fork a go - they claim they've got Intel support, but I couldn't make it work...)

@Nuullll

Nuullll commented Aug 1, 2023

> Is Intel support coming?
>
> (I gave the Vlad fork a go - they claim they've got Intel support, but I couldn't make it work...)

If it was a problem with environment setup, you could give my docker image a try: https://github.com/Nuullll/ipex-sd-docker-for-arc-gpu

@mindplay-dk

Going to try this today, I guess 🙂

https://github.com/openvinotoolkit/stable-diffusion-webui/wiki/Installation-on-Intel-Silicon

@Nuullll

Nuullll commented Jan 6, 2024

FYI. IPEX is supported since 1.7.0: #14171

@uxdesignerhector

uxdesignerhector commented Jan 20, 2024

> FYI. IPEX is supported since 1.7.0: #14171

[image]

I can confirm it is working on Windows! It is really fast! To make use of it, you must append --use-ipex to COMMANDLINE_ARGS in webui-user.bat, or if you are using Stability Matrix, add it to the extra launch arguments.

Also make sure to follow these steps #14171 (comment) if you are using an old installation (you don't need to be on the dev branch, as it was already merged in 1.7.0).

I needed to disable my iGPU (UHD, Iris) in Device Manager and delete my old venv folder, and after that launch Stable Diffusion WebUI with the extra launch argument --use-ipex.
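
For reference, the relevant line in webui-user.bat would look something like this (keep any other arguments you already use on the same line):

```bat
rem webui-user.bat
set COMMANDLINE_ARGS=--use-ipex
```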
