Connecting to VLLM OpenAI API Compatible Server #123
In my lab environment I am serving Mixtral with vLLM using their OpenAI API compatible server, and I'm hosting a Weaviate instance as well.

I just spun up Verba, pointing it at both my Weaviate instance and my vLLM instance via the .env file. The connection to Weaviate seems fine: I can see my schema and object count in the status tab, but any queries I make break. I'm unsure whether this is a limitation of Verba only being able to handle GPT-3.5 or GPT-4 served from OpenAI.

Has anyone been able to configure a setup like this?

Thanks!
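For context, a minimal sketch of the .env wiring described above. The variable names here are assumptions (WEAVIATE_URL_VERBA plus the standard OpenAI client variables); check Verba's README for the exact keys your version expects:

```
# Hypothetical .env sketch; exact keys depend on the Verba version
WEAVIATE_URL_VERBA=http://weaviate.my-lab:8080    # self-hosted Weaviate instance
OPENAI_BASE_URL=http://vllm.my-lab:8000/v1        # vLLM OpenAI API compatible server
OPENAI_API_KEY=not-needed                         # vLLM does not check the key by default
```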
Comments

Looks as though you are boxed in to using either the ADA embeddings from OpenAI, MiniLM, or Cohere as well...

Yes, right now there is no support for Mixtral models! But great point, we'll look into that for the next update.

I got this working end to end, but I had to make some changes to be able to use my custom embedding model server. I submitted a PR with the changes I needed to use an OpenAI-compatible API server for both embeddings and the LLM: #148. I plan to publish an end-to-end tutorial that runs on K8s and installs Verba, Weaviate, an LLM, and an embedding model server all within the same K8s cluster. Stay tuned!

I finished writing my guide for end-to-end private Verba RAG using Weaviate, Lingo, vLLM + Mistral 7B v2, and Sentence Transformers: https://www.substratus.ai/blog/lingo-weaviate-private-rag. Looking forward to hearing feedback. The guide should also help you figure out how to use vanilla vLLM with Verba; a sketch of querying such a server directly is below.
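As a quick sanity check independent of Verba, here is a minimal sketch of talking to a vLLM OpenAI API compatible server (started with something like `python -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1`) using the official openai Python client (v1+). The hosts, ports, and the separate embedding server are assumptions standing in for the setup described in the comments above:

```python
from openai import OpenAI

# Point the client at the vLLM server instead of api.openai.com.
# vLLM does not check the API key by default, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Chat completion against the model vLLM was launched with.
response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "What is Weaviate?"}],
)
print(response.choices[0].message.content)

# The custom embedding server mentioned above is assumed to expose the same
# OpenAI-compatible /v1/embeddings route (e.g. a Sentence Transformers server);
# the URL and model name here are hypothetical.
emb_client = OpenAI(base_url="http://localhost:9000/v1", api_key="not-needed")
embedding = emb_client.embeddings.create(
    model="all-MiniLM-L6-v2",
    input=["Verba stores these vectors in Weaviate."],
)
print(len(embedding.data[0].embedding))
```

If both calls succeed, the remaining work is getting Verba itself to use those endpoints, which is what #148 addresses.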