Intel Arc A770 LLM Fails #3076
Comments
@kprinssu can you try disabling `f16`?
@mudler That works, but llama.cpp does not use my GPU. It uses my CPU, pegs all of my CPU cores, and then does not stream or produce any output. llama.cpp keeps running until I force-stop the container.

Edit: Here are the logs when I attempt to re-use the same model config: https://gist.github.com/kprinssu/a3f617daeb416919e3360365bc0d43a6

Error that causes SYCL to fail (same as when f16 is true):
Kernel logs:
Model config for reference:
I cannot confirm this yet, but I'm going to try with my Intel Arc cluster soon to update the images. However, from the logs you are sharing it looks like a driver incompatibility issue. When I was setting up my cluster I remember I had to go down to Ubuntu 22.04 LTS because the Intel drivers were not compatible.
I'll attempt to run a local build of llama.cpp, and subsequently LocalAI, on my host rather than via Docker and see if I run into the same error.
Just to follow up, it indeed was a driver issue. I hacked in the Ubuntu 22.04 (Jammy) packages and it's working well. For reference, I had to use https://chsasank.com/intel-arc-gpu-driver-oneapi-installation.html to install all the required libraries on my host machine. Then I installed local-ai as a native binary and it's working great!

Edit: I am pretty sure I missed something, as I am unable to get the local-ai binary to use SYCL.
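(The exact commands from that guide aren't reproduced in this thread. As a rough sketch only — the repository URL, keyring path, and package names below are assumptions based on Intel's Ubuntu 22.04 "client GPU" install instructions, not copied from this issue:)

```sh
# Assumption: Intel client-GPU apt repository for Ubuntu 22.04 (jammy).
# Verify the key and repo URLs against Intel's current docs before using.
wget -qO- https://repositories.intel.com/gpu/intel-graphics.key | \
  sudo gpg --dearmor -o /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
  sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list

sudo apt update
# Compute and Level Zero runtimes so SYCL can see the Arc GPU
sudo apt install -y intel-opencl-icd intel-level-zero-gpu level-zero clinfo

# Verify the GPU is visible (sycl-ls comes from the oneAPI packages covered in the linked guide)
clinfo | grep "Device Name"
sycl-ls
```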
Just for clarification, I was unable to get LocalAI working with GPU acceleration and I have pivoted to running llama-cpp directly.
You can try the Vulkan version. You may have to manually update the YAML files for the models and specify gpu_layers. Start with a low number; if you set it too high you'll get an error like 'OutOfDeviceMemoryError'. A rough example is below.
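(As an illustration only — the model name, GGUF filename, and layer count below are placeholders, not taken from this issue — a LocalAI model YAML with a conservative gpu_layers value might look something like this:)

```yaml
# Hypothetical model config; adjust names and paths to your setup.
name: my-model                      # placeholder name
backend: llama-cpp
context_size: 4096
f16: false                          # f16 was reported as problematic on this GPU
gpu_layers: 10                      # start low; raise until you hit OutOfDeviceMemoryError
parameters:
  model: my-model.Q4_K_M.gguf       # placeholder GGUF file in the models directory
```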
If you could share the logs I might be able to help; otherwise it's hard. Did you try the container images?
I have tried the images and unfortunately I found that they stalled or timed out. I tried my own custom build of Ollama and it seems to be working very well now. For reference, this is my Dockerfile to build my own Ollama:
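(The Dockerfile itself wasn't captured in this thread. What follows is a hypothetical sketch of one way to build an Intel-GPU-enabled Ollama image via ipex-llm — the base image tag, the `ipex-llm[cpp]` extra, and the `init-ollama` helper are assumptions drawn from Intel's ipex-llm documentation, not the author's actual file:)

```dockerfile
# Hypothetical sketch, not the Dockerfile from this issue. Verify image tags and
# package names against Intel's ipex-llm docs before using.
FROM intel/oneapi-basekit:2024.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*

# ipex-llm's cpp extra ships a SYCL llama.cpp backend plus an init-ollama helper
# that links an Intel-GPU-enabled ollama binary into the current directory.
RUN pip3 install --pre --upgrade "ipex-llm[cpp]"

WORKDIR /opt/ollama
RUN init-ollama

ENV OLLAMA_HOST=0.0.0.0
EXPOSE 11434
CMD ["./ollama", "serve"]
```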
@mudler I believe these env vars, which I grabbed from Intel's ipex-llm documentation, will be helpful. Would we be able to document these in the LocalAI docs? I found they helped accelerate the computation quite dramatically. Environment variables:
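(The original list didn't survive in this thread. As a hedged illustration, the variables Intel's ipex-llm documentation commonly recommends for Arc GPUs include the following — treat the exact set and values as assumptions and check the ipex-llm docs for your setup:)

```sh
# Commonly recommended for Intel Arc (A-series) in ipex-llm docs; values are illustrative.
export SYCL_CACHE_PERSISTENT=1               # persist JIT-compiled kernels between runs
export ZES_ENABLE_SYSMAN=1                   # enable Level Zero sysman (device memory queries)
export ONEAPI_DEVICE_SELECTOR=level_zero:0   # pin SYCL to the first Level Zero GPU
```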
I also found excellent documentation on what each env var does in the SYCL LLVM repo.
I am trying to get text generation working on my Intel Arc A770 8GB and I am running into issues with SYCL not utilising the GPU. When I attempt to use any prompt with any model I see the following messages in the logs:
It looks like my hardware does not support half (f16) precision?
LocalAI version:
v2.19.3
Environment, CPU architecture, OS, and Version:
Note: I had to use a custom test kernel from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2072755/comments/4 to stop my Intel Arc from timing out. It looks like the 6.8 kernel has regressions and I am now running into them.
Edit:
I tried the mainline 6.10.2 kernel and I am still running into the same error. I am still seeing GPU hangs and resets.
Describe the bug
To Reproduce
Attempt to use text generation via the Web UI with any LLM model.
Expected behavior
Text to be produced and llama.cpp not to crash.
Logs
Logs:
https://gist.github.com/kprinssu/6cc5e9798018e05098ec091d1b5f4611
Linux Kernel logs:
Additional context
Docker Compose:
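(The compose file itself isn't preserved here. A hedged sketch of what an Intel/SYCL LocalAI service with the GPU passed through might look like — the image tag is an assumption, and the NEO override variables are an assumption based on the kernel-6.8 workaround discussed in intel/compute-runtime#710:)

```yaml
# Illustrative only; verify the image tag and env vars against your LocalAI release.
services:
  local-ai:
    image: quay.io/go-skynet/local-ai:v2.19.3-sycl-f32-core   # assumed tag
    devices:
      - /dev/dri:/dev/dri            # expose the Intel GPU to the container
    environment:
      - DEBUG=true
      - NEOReadDebugKeys=1           # assumed: enables the override below
      - OverrideGpuAddressSpace=48   # assumed: per intel/compute-runtime#710 discussion
    volumes:
      - ./models:/build/models
    ports:
      - "8080:8080"
```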
Note: I needed to set the extra env vars above because `sycl-ls` did not list my GPU under the 6.8 kernel (intel/compute-runtime#710 and intel/compute-runtime#710 (comment)).

Example Model Config: