Support for fastLLaMa? #575
Comments
Let's make this thread the meta for this.
Please correct me if I'm misunderstanding, but fastLLaMa looks less like a wrapper and more like a fork of llama.cpp, following pull #370. If that's in fact the case, everyone should keep in mind that, much like this project, llama.cpp is developing at breakneck speed. They, too, have recently shipped changes that required model re-conversion a couple of times. If the project were truly a wrapper, this would be something to keep an eye on but manageable. Being a fork, however, you'd be relying on PotatoSpudowski et al. to pull any upstream changes downstream. Until a given change is downstreamed, models would be incompatible between fastLLaMa and llama.cpp. This would have the unfortunate consequence of adding yet another class of model to the pool of conversions + LoRA packs + formats floating around, which is already... large (USBHost can attest to community fatigue with prolific model branching).

Again, I might be misunderstanding a good amount here, but after a little reading on the shared library approach, I'm wondering if someone more knowledgeable could weigh in: is there any credence to instead including llama.cpp as a direct build requirement in this project, with only minimal additional interfacing? A rough sketch of what I mean is below.

edit: After reading the fastLLaMa code a bit more, I still feel similarly, but I prefer the approach of adding llama.cpp to the requirements rather than vendoring a static copy inside the project. The bridge.cpp file is probably still entirely valid.
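To make the "direct build requirement with minimal interfacing" idea concrete, here is a minimal sketch of binding a compiled llama.cpp shared library from Python via ctypes. To be clear, this is illustrative only: the library path and the function signatures declared here are assumptions, not llama.cpp's actual C API, which should be verified against llama.h at whatever revision gets pinned.

```python
# Minimal sketch of binding a llama.cpp shared library from Python via ctypes.
# Assumptions: llama.cpp was built as libllama.so, and the C API resembles the
# hypothetical signatures declared below -- verify against llama.h before use.
import ctypes

lib = ctypes.CDLL("./libllama.so")  # path assumed; adjust to your build output

# Declare argument/return types for the (assumed) C entry points.
lib.llama_init_from_file.argtypes = [ctypes.c_char_p]
lib.llama_init_from_file.restype = ctypes.c_void_p

lib.llama_free.argtypes = [ctypes.c_void_p]
lib.llama_free.restype = None

def load_model(path: str):
    """Load a ggml model file and return an opaque context handle."""
    ctx = lib.llama_init_from_file(path.encode("utf-8"))
    if not ctx:
        raise RuntimeError(f"failed to load model: {path}")
    return ctx

ctx = load_model("./models/7B/ggml-model-q4_0.bin")  # model path assumed
# ... tokenize / eval / sample calls would go here, mirroring llama.h ...
lib.llama_free(ctx)
```

The point of the sketch is that the Python side stays thin: if llama.cpp changes its model format, you rebuild the pinned dependency rather than waiting on a fork to catch up.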
In ggerganov/llama.cpp#370 it is said that llama.cpp now has its own C API, but there is no documentation for it anywhere.
Hi @oobabooga, if this is something you are still interested in, I can help in any way possible :) BTW, love the project!!!
@PotatoSpudowski just for your information, ooba eventually moved forward with llama-cpp-python. Ooba, this issue might be worth closing?
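For context, llama-cpp-python wraps the llama.cpp model behind a small Python class. A minimal usage sketch, with the model path assumed, looks like this:

```python
# Minimal llama-cpp-python usage sketch; the model path is an assumption.
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")  # path assumed
output = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=64,       # cap the generation length
    stop=["Q:", "\n"],   # stop sequences to end the answer cleanly
    echo=False,          # don't repeat the prompt in the output
)
print(output["choices"][0]["text"])
```

Because it ships llama.cpp as a pinned dependency rather than a fork, it sidesteps the downstreaming concern raised earlier in this thread.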
Description
https://github.com/PotatoSpudowski/fastLLaMa
It's a Python wrapper around the llama.cpp implementation. I feel that integrating it would be easier than using llama.cpp directly. A rough sketch of what such an integration could look like follows.
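A hypothetical sketch of what wrapper-style integration could look like is below. The module name, class, parameters, and paths are all assumptions made for illustration; consult the fastLLaMa README for its actual interface.

```python
# Hypothetical sketch of a fastLLaMa-style integration; names and parameters
# are assumptions for illustration, not fastLLaMa's documented API.
from fastllama import Model  # module and class names assumed

model = Model(
    path="./models/7B/ggml-model-q4_0.bin",  # model path assumed
    num_threads=8,                           # parameter name assumed
)

# Stream tokens back to the caller as they are generated.
for token in model.generate("Building a website can be done in 10 steps:"):
    print(token, end="", flush=True)
```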
Additional Info
This would be useful for running larger models like llama-65B, whose VRAM requirements put them out of reach of most GPUs; llama.cpp runs them on the CPU in system RAM instead.