
[Feature]: Add embeddings api for Llama #6947

Closed

Harsha-Pulagam opened this issue Jul 30, 2024 · 11 comments

Harsha-Pulagam commented Jul 30, 2024

Currently, I load the OpenAI API server using the command:

python3 -m vllm.entrypoints.openai.api_server --model Llama3-8B-Instruct --dtype auto --host 0.0.0.0 --port 8051 --gpu-memory-utilization 0.8 --enforce-eager

I want to try embeddings with Llama 3, but after loading I can see that the embeddings API is not loaded.

[screenshot: server log showing the embeddings API is not enabled]

I couldn't find any parameter to enable embeddings.

How can I enable the embeddings API?

Harsha-Pulagam added the usage ("How to use vllm") label on Jul 30, 2024

DarkLight1337 (Member) commented

This means that the model doesn't support embeddings in vLLM yet. You can edit this issue to request such an implementation.

Harsha-Pulagam (Author) commented

Which models can generate embeddings via the vLLM endpoint? I am using Llama 3.

DarkLight1337 (Member) commented

Currently, only Mistral is supported (#3734).

https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/__init__.py#L82

Harsha-Pulagam changed the title from "[Usage]: embedding_mode is False. Embedding API will not work." to "[Feature]: Add embeddings api for Llama" on Jul 30, 2024

Harsha-Pulagam (Author) commented

Thanks for the reply. I have edited the issue to request this.

github-actions bot commented Nov 1, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Nov 1, 2024

DarkLight1337 (Member) commented
Closed as completed by #9806. Note that you need to set your architecture to LlamaModel for this to work.

bernardgut commented

@DarkLight1337

> Note that you need to set your architecture to LlamaModel for this to work.

How do I do that? Do you mean --config-format?

DarkLight1337 (Member) commented

> Note that you need to set your architecture to LlamaModel for this to work.
>
> How do I do that? Do you mean --config-format?

No, I mean that the architectures field in HF config.json should contain LlamaModel instead of LlamaForCausalLM.
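
For illustration, here is a quick way to check which architecture a checkpoint declares (a sketch; the local path Llama3-8B-Instruct is assumed from the serve command above):

```python
# Sketch: check which architecture a local HF checkpoint declares.
# "Llama3-8B-Instruct" is the local model directory from the command above.
import json

with open("Llama3-8B-Instruct/config.json") as f:
    config = json.load(f)

print(config["architectures"])
# ["LlamaForCausalLM"] -> loaded as a text-generation model
# ["LlamaModel"]       -> loaded as an embedding model (after #9806)
```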

bernardgut commented

Wait, I am confused. I am running this in Kubernetes; is there no way to pass this as an argument at runtime?

DarkLight1337 (Member) commented Nov 8, 2024

You can create a script to edit the config.json prior to running vLLM, if needed.
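
A minimal sketch of such a script, assuming the model files live in a local directory named Llama3-8B-Instruct (the path and the in-place rewrite are illustrative, not a vLLM API):

```python
# Sketch: rewrite the architectures field of a local HF checkpoint's config.json
# so vLLM loads the model as an embedding model (LlamaModel) rather than a
# causal LM (LlamaForCausalLM). Paths are illustrative; adjust for your setup.
import json
from pathlib import Path

config_path = Path("Llama3-8B-Instruct/config.json")  # local model directory
config = json.loads(config_path.read_text())

# Replace the causal-LM architecture with the plain encoder architecture.
config["architectures"] = [
    "LlamaModel" if arch == "LlamaForCausalLM" else arch
    for arch in config.get("architectures", [])
]

config_path.write_text(json.dumps(config, indent=2))
print("architectures is now:", config["architectures"])
```

In Kubernetes, a script like this could run in an init container against the same volume that holds the model files, before the vLLM container starts.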

DarkLight1337 (Member) commented Nov 8, 2024

Alternatively, you can create a fork of the HF repo with the changes to config.json. Another way would be to open a PR to allow LlamaForCausalLM to be used directly as an embedding model.
