cuBLAS with llama-cpp-python on Windows #117
How did you do cuBLAS with llama-cpp-python on Linux?
My method is stupid af. The key bits: I delete line 107 (we are barbarians now; can't go wrong with setting an argument if we leave the build no choice, right?). Then I go to the next spot, and after that I check whether .\vendor\llama.cpp has export LLAMA_CUBLAS=1. In effect I try to set LLAMA_CUBLAS=1 about four times in a row in different places. It still takes me multiple attempts before a build finally succeeds. It always does eventually, but as you can see my process isn't straightforward; at this point I don't know which step actually matters, so I just repeat them all.
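For what it's worth, the documented way to force a cuBLAS build at the time was to pass the flag through pip's environment rather than editing files; a minimal sketch, assuming pip and the CUDA toolkit are already installed:

```sh
# Rebuild llama.cpp from source with cuBLAS enabled instead of using a prebuilt wheel.
# CMAKE_ARGS is forwarded to CMake; FORCE_CMAKE forces the source-build path.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```

On Windows the same variables can be set in PowerShell via `$env:CMAKE_ARGS` and `$env:FORCE_CMAKE` before running pip.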
Can you run it and check the output?
If you're asking about the Windows one, then it has nothing CUDA-related in it. I can't build a working DLL with CUDA.
I seem to have progressed further than you w/Ubuntu?
Well, this one, ggerganov/llama.cpp#1207, doesn't work for me in WSL Ubuntu; I opened an issue about it: ggerganov/llama.cpp#1230. But the previous way of allocating memory on the GPU worked for me just fine and still works.
I have a working dynamic Linux binary. Frustratingly, however, it doesn't generate any GPU utilization, independent of the batch size.
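(A quick way to watch this while generating, assuming the NVIDIA driver utilities are installed:

```sh
# Print GPU utilization and memory use once per second while the model runs.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```

If utilization stays at 0% even during prompt processing, the cuBLAS path likely isn't being hit.)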
You mean libllama.so?
Yes, in that it seems to load fine from within the Python wrapper.
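Loading fine doesn't necessarily mean CUDA got linked in, though. A quick sanity check on the binary itself (standard Linux tooling, nothing specific to this project):

```sh
# List the shared libraries libllama.so depends on and filter for CUDA bits.
# A cuBLAS build should show libcublas / libcudart here.
ldd ./libllama.so | grep -i -E 'cuda|cublas'
```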
Well, this one works for me too. That is basically the whole reason I use WSL at all: it gives me a working libllama.so with cuBLAS enabled. I don't use llama-cpp-python standalone in Ubuntu; I call it through the oobabooga webui, which I then use purely as an API. I build llama-cpp-python's CUDA-enabled libllama.so, feed it to ooba, and launch it like that:
Then I return to Windows, where I run https://github.com/Cohee1207/SillyTavern and https://github.com/Cohee1207/TavernAI-extras, which connect to WSL's API, and it all comes together. And yeah, CUDA works, because I can see and feel it. The problem for me, and the reason this issue is still open, is that I can't make a working llama.dll on Windows so I can cut WSL out of my setup.
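In case anyone wants to reproduce the WSL side, the build boils down to something like the following; the paths are illustrative, and I'm assuming the CUDA toolkit is installed inside WSL and that the wrapper looks for libllama.so next to the llama_cpp package (which was its behavior at the time):

```sh
# Build the bundled llama.cpp as a shared library with cuBLAS enabled (inside WSL).
cd llama-cpp-python/vendor/llama.cpp
mkdir -p build && cd build
cmake .. -DLLAMA_CUBLAS=on -DBUILD_SHARED_LIBS=on
cmake --build . --config Release

# Drop the result where the Python wrapper loads it from (its package directory).
cp libllama.so "$(python3 -c 'import llama_cpp, os; print(os.path.dirname(llama_cpp.__file__))')/"
```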
Can you try the script in #182? |
cuBLAS with llama-cpp-python on Windows.
Well, it works on WSL for me as intended, but none of my tricks make it work via llama.dll on Windows.
I've been trying daily for the last week, changing one thing or another. I asked a friend to try it on a different system, but he had no success either.
At this point I sincerely wonder if anyone has ever made this work.
I don't expect any help with this issue, but if anyone could confirm that it does work on Windows and is possible at all as a concept, it would surely encourage me to continue this struggle.
(There is no problem getting the original llama.cpp to work with cuBLAS anywhere; for me the issue lies with the wrapper.)
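If anyone does get a Windows build, one way to compare notes is to check what the resulting DLL actually links against (this assumes the Visual Studio build tools, which provide dumpbin, are installed):

```sh
# Run from a VS "Developer Command Prompt"; lists the DLLs llama.dll depends on.
# A cuBLAS build should pull in cublas64_*.dll and cudart64_*.dll.
dumpbin /dependents llama.dll
```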
I'll post the error just in case:
And here is the llama DLL that doesn't work:
llama.zip