
Support for fastLLaMa? #575

Closed
official-elinas opened this issue Mar 25, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@official-elinas

official-elinas commented Mar 25, 2023

Description

https://github.com/PotatoSpudowski/fastLLaMa

It's a Python wrapper around the llama.cpp implementation. I feel it would be easier to integrate than using llama.cpp directly.

Additional Info

This would be useful for running larger models like llama-65B, since CPU inference sidesteps their steep VRAM requirements.
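
For context, here is a minimal sketch of what driving a ggml-quantized checkpoint through a Python wrapper of this kind could look like. The module, class, and argument names are assumptions for illustration only, not fastLLaMa's actual API:

```python
# Illustrative only: the Model class and its arguments are assumptions,
# not the real fastLLaMa API. They stand in for "load a ggml-quantized
# checkpoint on CPU, then stream tokens from a prompt".
from fastllama import Model  # hypothetical import path

model = Model(
    path="./models/llama-65B/ggml-model-q4_0.bin",  # 4-bit quantized weights
    num_threads=8,                                   # CPU threads for inference
)

def on_token(token: str) -> None:
    print(token, end="", flush=True)

model.ingest("Building a web UI for local LLaMA models is")  # feed the prompt
model.generate(num_tokens=128, streaming_fn=on_token)        # stream the completion
```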

official-elinas added the enhancement label on Mar 25, 2023
@oobabooga
Owner

Let's make this thread the meta-issue for this.

@Loufe
Contributor

Loufe commented Mar 29, 2023

Please correct me if I'm misunderstanding, but fastLLaMa looks less like a wrapper and more like a fork of llama.cpp, following pull request #370.

If that's in fact the case, everyone should keep in mind that, much like this project, llama.cpp is developing at breakneck speed. They, too, have shipped changes requiring model re-conversion a couple of times recently. If the project were truly a wrapper, that would be something to keep an eye on but manageable; being a fork, however, you'd be relying on PotatoSpudowski et al. to pull any such changes downstream. Until that happened, models would be incompatible between fastLLaMa and llama.cpp. That would have the unfortunate consequence of adding yet another class of model to the pool of conversions + LoRA packs + formats floating around, which is already... large (USBHost can attest to community fatigue with prolific model branching).

Again, I might be misunderstanding a good amount here, but after a little reading on the shared-library approach I'm wondering if someone more knowledgeable could weigh in: is there any merit to instead including llama.cpp as a direct build requirement of this project, with only minimal additional interfacing?

edit: After reading the fastLLaMa code a bit more, I still feel similarly; I just prefer adding llama.cpp to the requirements over vendoring a static copy inside the project. The bridge.cpp file is probably still entirely valid.
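
To make the "build requirement plus thin bridge" idea concrete, here is a rough ctypes sketch, assuming llama.cpp has been built as a shared library named libllama.so and that it exports llama_print_system_info; symbol names and signatures in the real header have changed often, so treat this as a shape rather than a working binding:

```python
# Sketch of the "shared library + thin bridge" idea: build llama.cpp as
# libllama.so and bind one of its exported C functions from Python.
# The library name and the assumption that this symbol is exported
# unchanged are mine, not taken from a pinned llama.h.
import ctypes

lib = ctypes.CDLL("./libllama.so")  # assumes llama.cpp was built as a shared library

# const char * llama_print_system_info(void); (a small, stable entry point)
lib.llama_print_system_info.restype = ctypes.c_char_p
lib.llama_print_system_info.argtypes = []

print(lib.llama_print_system_info().decode("utf-8"))

# A real bridge would declare the model-loading, tokenization, and eval
# functions the same way (restype/argtypes matching the llama.h of the
# pinned commit), which is roughly what a bridge.cpp or cffi layer wraps.
```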

@oobabooga
Owner

In ggerganov/llama.cpp#370 it is said that llama.cpp now has its own API, but there is no documentation about it anywhere.

@PotatoSpudowski

PotatoSpudowski commented Apr 14, 2023

Hi,
Really cool that you guys found the repo interesting. Initially it was just a quick thing my friend and I hacked together. Based on the feedback from folks we have been working on make the repo more usable. We have a new branch named 'feature/refactor' we did a few things like updating the ggml library, removing python version dependency and made the setup process easier. We will be merging it to main in a few hours.

@oobabooga if this is something you are still interested in, I can help in any way possible :)

BTW love the project!!!

@Loufe
Contributor

Loufe commented Apr 14, 2023

@PotatoSpudowski just for your information, ooba eventually moved forward with llama-cpp-python.

Ooba this issue might be worth closing?
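
For anyone landing here later, basic usage of llama-cpp-python looks roughly like the sketch below; the model path is a placeholder and keyword defaults may differ between package versions:

```python
# Minimal llama-cpp-python usage; the model path is a placeholder and
# keyword defaults may vary between package versions.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-65B/ggml-model-q4_0.bin", n_threads=8)

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,
    stop=["Q:", "\n"],  # stop sequences that end the completion
    echo=False,
)
print(output["choices"][0]["text"].strip())
```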
