Although llama.cpp now supports GPU offloading via cuBLAS, exllama appears to run several times faster given a good enough GPU (an RTX 3090, for example). Is there any plan to support exllama or, more generally, other loaders for LLMs?
Answered by mudler on Jul 24, 2023
Replies: 1 comment · 5 replies
@mudler Could there, in theory, be a gRPC backend implemented in Python that uses exllama directly?
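For reference, a minimal sketch of what such a Python gRPC backend could look like. This is not LocalAI's actual backend interface: the `backend.proto` service, the generated `backend_pb2` / `backend_pb2_grpc` modules, and the request/response field names (`model_dir`, `prompt`, `tokens`, etc.) are all illustrative assumptions, and the exllama calls follow the upstream repository's example usage.

```python
# Hypothetical gRPC backend that wraps exllama. Assumes a backend.proto
# defining a Backend service with LoadModel and Predict RPCs; the generated
# modules and field names below are placeholders, not LocalAI's real API.
from concurrent import futures

import grpc

import backend_pb2          # hypothetical: generated from backend.proto
import backend_pb2_grpc     # hypothetical: generated from backend.proto

# exllama modules as laid out in the upstream repo (assumed import paths)
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator


class ExLlamaBackend(backend_pb2_grpc.BackendServicer):
    """Serves exllama generations over the assumed Backend gRPC interface."""

    def LoadModel(self, request, context):
        # request.model_dir is an assumed field pointing at a GPTQ model folder
        config = ExLlamaConfig(f"{request.model_dir}/config.json")
        config.model_path = f"{request.model_dir}/model.safetensors"

        self.model = ExLlama(config)
        self.tokenizer = ExLlamaTokenizer(f"{request.model_dir}/tokenizer.model")
        self.cache = ExLlamaCache(self.model)
        self.generator = ExLlamaGenerator(self.model, self.tokenizer, self.cache)

        return backend_pb2.Result(success=True)

    def Predict(self, request, context):
        # request.prompt / request.tokens are assumed field names
        text = self.generator.generate_simple(
            request.prompt, max_new_tokens=request.tokens
        )
        return backend_pb2.Reply(message=text)


def serve(address: str = "127.0.0.1:50051") -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    backend_pb2_grpc.add_BackendServicer_to_server(ExLlamaBackend(), server)
    server.add_insecure_port(address)
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```

One appeal of this design is process isolation: the exllama/CUDA Python dependencies live in their own gRPC process, so the main server only needs to speak the backend protocol rather than link against each loader directly.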
created: #796