Although llama.cpp now supports GPU offloading via cuBLAS, exllama appears to run several times faster given a good enough GPU (an RTX 3090, for example). Is there any plan to support exllama or, more generally, other loaders for LLMs?
Answered by mudler on Jul 24, 2023
Replies: 1 comment · 5 replies
@mudler Could there, in theory, be a gRPC backend implemented in Python that uses exllama directly?
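For reference, a minimal sketch of what such a Python gRPC backend could look like. This is not LocalAI's actual backend interface: the `backend.proto` service, the generated `backend_pb2` / `backend_pb2_grpc` modules, and the request/response field names (`model_dir`, `prompt`, `tokens`, etc.) are all illustrative assumptions, and the exllama calls follow the upstream repository's example usage.

```python
# Hypothetical gRPC backend that wraps exllama. Assumes a backend.proto
# defining a Backend service with LoadModel and Predict RPCs; the generated
# modules and field names below are placeholders, not LocalAI's real API.
from concurrent import futures

import grpc

import backend_pb2          # hypothetical: generated from backend.proto
import backend_pb2_grpc     # hypothetical: generated from backend.proto

# exllama modules as laid out in the upstream repo (assumed import paths)
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator


class ExLlamaBackend(backend_pb2_grpc.BackendServicer):
    """Serves exllama generations over the assumed Backend gRPC interface."""

    def LoadModel(self, request, context):
        # request.model_dir is an assumed field pointing at a GPTQ model folder
        config = ExLlamaConfig(f"{request.model_dir}/config.json")
        config.model_path = f"{request.model_dir}/model.safetensors"

        self.model = ExLlama(config)
        self.tokenizer = ExLlamaTokenizer(f"{request.model_dir}/tokenizer.model")
        self.cache = ExLlamaCache(self.model)
        self.generator = ExLlamaGenerator(self.model, self.tokenizer, self.cache)

        return backend_pb2.Result(success=True)

    def Predict(self, request, context):
        # request.prompt / request.tokens are assumed field names
        text = self.generator.generate_simple(
            request.prompt, max_new_tokens=request.tokens
        )
        return backend_pb2.Reply(message=text)


def serve(address: str = "127.0.0.1:50051") -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    backend_pb2_grpc.add_BackendServicer_to_server(ExLlamaBackend(), server)
    server.add_insecure_port(address)
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```

One appeal of this design is process isolation: the exllama/CUDA Python dependencies live in their own gRPC process, so the main server only needs to speak the backend protocol rather than link against each loader directly.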
created: #796