[Feature]: Add embeddings api for Llama #6947
Currently I load the OpenAI API server using the command:

python3 -m vllm.entrypoints.openai.api_server --model Llama3-8B-Instruct --dtype auto --host 0.0.0.0 --port 8051 --gpu-memory-utilization 0.8 --enforce-eager

I want to try embeddings with Llama 3, but after loading I can see that the embeddings API is not available, and I couldn't find any parameter to enable it. Help me enable the embeddings API.

Comments
This means that the model doesn't support embeddings in vLLM yet. You can edit this issue to request such an implementation.
Which models can generate embeddings via the vLLM endpoint?
Currently, only Mistral is supported (#3734): https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/__init__.py#L82
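For context, here is a minimal sketch (not from the thread itself) of querying the supported Mistral-based embedding model through vLLM's OpenAI-compatible endpoint; the served model name, host, and port are assumptions for illustration:

```python
# Sketch: query /v1/embeddings on a vLLM OpenAI-compatible server, assumed
# to be serving the Mistral-based embedding model on localhost:8051.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8051/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",  # assumed served model name
    input=["vLLM is a fast inference engine."],
)
print(len(resp.data[0].embedding))  # prints the embedding dimensionality
```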
Thanks for the reply.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Closed as completed by #9806. Note that you need to set your architecture to the embedding variant for the model to be loaded as an embedding model.
How do I do that? Do you mean through a command-line option?
No, I mean that the `architectures` field in the model's `config.json` should be changed.
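For illustration (this snippet is not from the original comments), the edit being described would look roughly like this in the model's `config.json`, where the field typically reads `"architectures": ["LlamaForCausalLM"]` beforehand; the replacement name `LlamaModel` is an assumption based on vLLM's embedding-model naming, so verify it against the model registry for your vLLM version:

```json
{
  "architectures": ["LlamaModel"]
}
```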
Wait, I am confused. I am running this in Kubernetes; is there no way to pass this as an argument at runtime?
You can create a script to edit the `config.json` before the server starts.
Alternatively, you can create a fork of the HF repo with the changes to `config.json` already applied.
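As a concrete sketch of the first suggestion (my own illustration, not from the thread): download the checkpoint at pod start-up, patch `config.json`, and point `--model` at the local copy. The repo id, target directory, and the `LlamaModel` architecture name are all assumptions:

```python
# Sketch: fetch a checkpoint, rewrite its architectures entry, and print
# the local path to pass to --model when launching the vLLM server.
import json
from huggingface_hub import snapshot_download

# Assumed repo id and target directory; gated repos may require HF auth.
local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    local_dir="/models/llama3-embed",
)

cfg_path = f"{local_dir}/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["architectures"] = ["LlamaModel"]  # assumed embedding architecture name

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

print(local_dir)  # e.g. launch with: --model /models/llama3-embed
```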