
[Feature]: Add embeddings api for Llama #6947

Closed

Harsha-Pulagam opened this issue Jul 30, 2024 · 11 comments

Harsha-Pulagam commented Jul 30, 2024

Currently, I load the OpenAI API server using the command:

python3 -m vllm.entrypoints.openai.api_server --model Llama3-8B-Instruct --dtype auto --host 0.0.0.0 --port 8051 --gpu-memory-utilization 0.8 --enforce-eager

I want to try embeddings with Llama 3, but after loading I can see that the embeddings API is not loaded.

[screenshot: server log showing the embeddings API is not enabled]

I couldn't find any parameter to enable embeddings.

How can I enable the embeddings API?

Harsha-Pulagam added the usage ("How to use vllm") label on Jul 30, 2024

DarkLight1337 (Member) commented

This means that the model doesn't support embeddings in vLLM yet. You can edit this issue to request such an implementation.

Harsha-Pulagam (Author) commented

Which models can generate embeddings via the vLLM endpoint? I am using Llama 3.

DarkLight1337 (Member) commented

Currently, only Mistral is supported (#3734).

https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/__init__.py#L82

Harsha-Pulagam changed the title from "[Usage]: embedding_mode is False. Embedding API will not work." to "[Feature]: Add embeddings api for Llama" on Jul 30, 2024

Harsha-Pulagam (Author) commented

Thanks for the reply. I have edited the issue to request this.

github-actions bot commented Nov 1, 2024

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Nov 1, 2024

DarkLight1337 (Member) commented
Closed as completed by #9806. Note that you need to set your architecture to LlamaModel for this to work.

bernardgut commented

@DarkLight1337

> Note that you need to set your architecture to LlamaModel for this to work.

How do I do that? Do you mean --config-format?

DarkLight1337 (Member) commented

> Note that you need to set your architecture to LlamaModel for this to work.
>
> How do I do that? Do you mean --config-format?

No, I mean that the architectures field in HF config.json should contain LlamaModel instead of LlamaForCausalLM.
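
For illustration, here is a quick way to check which architecture a checkpoint declares (a sketch; the local path Llama3-8B-Instruct is assumed from the serve command above):

```python
# Sketch: check which architecture a local HF checkpoint declares.
# "Llama3-8B-Instruct" is the local model directory from the command above.
import json

with open("Llama3-8B-Instruct/config.json") as f:
    config = json.load(f)

print(config["architectures"])
# ["LlamaForCausalLM"] -> loaded as a text-generation model
# ["LlamaModel"]       -> loaded as an embedding model (after #9806)
```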

bernardgut commented

Wait, I am confused. I am running this in Kubernetes; is there no way to pass this as an argument at runtime?

DarkLight1337 (Member) commented Nov 8, 2024

You can create a script to edit the config.json prior to running vLLM, if needed.
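
A minimal sketch of such a script, assuming the model files live in a local directory named Llama3-8B-Instruct (the path and the in-place rewrite are illustrative, not a vLLM API):

```python
# Sketch: rewrite the architectures field of a local HF checkpoint's config.json
# so vLLM loads the model as an embedding model (LlamaModel) rather than a
# causal LM (LlamaForCausalLM). Paths are illustrative; adjust for your setup.
import json
from pathlib import Path

config_path = Path("Llama3-8B-Instruct/config.json")  # local model directory
config = json.loads(config_path.read_text())

# Replace the causal-LM architecture with the plain encoder architecture.
config["architectures"] = [
    "LlamaModel" if arch == "LlamaForCausalLM" else arch
    for arch in config.get("architectures", [])
]

config_path.write_text(json.dumps(config, indent=2))
print("architectures is now:", config["architectures"])
```

In Kubernetes, a script like this could run in an init container against the same volume that holds the model files, before the vLLM container starts.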

DarkLight1337 (Member) commented Nov 8, 2024

Alternatively, you can create a fork of the HF repo with the changes to config.json. Another way would be to open a PR to allow LlamaForCausalLM to be used directly as an embedding model.
