server: phi-3 end token not handled? #6903
I can confirm that, and the Llama 3 template is affected as well. It seems there was a change in llama.cpp, and utils.hpp no longer includes the stop token.
Seeing the same issue with both Phi 3 and Llama 3, using the server with the latest changes. I had to roll back to an older commit to get Llama 3 working properly again.
Did you try Llama 3 with the latest commit? I was just made aware that it should have been fixed by PR #6860. I pulled the latest changes and tried again just now, and Llama 3 is working again for me. But Phi 3 still has issues with the stop token on the server, at least for chat completions.
Edit: I didn't try with a newer quant, so I suppose it might be an issue with the specific model I'm using.
I've seen this with every model I've used so far. Will have to test. Been busy working, but I've been using the
My temporary solution is re-adding the stop token handling into utils.hpp.
You can pass the stop tokens in the payload: https://github.com/ggerganov/llama.cpp/pull/6916/files
It's in the API for both the server and main examples.
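For anyone who wants to try this from a client, here is a minimal sketch of passing a `stop` list to the server's `/completion` endpoint. The host, port, prompt formatting, and token budget are assumptions for illustration, not something taken from the linked PR:

```python
# Sketch: ask a running llama.cpp server to stop at Phi-3's end token,
# assuming the server is listening on localhost:8080.
import requests

payload = {
    # Phi-3 chat format written out by hand, purely for illustration.
    "prompt": "<|user|>\nWhy is the sky blue?<|end|>\n<|assistant|>\n",
    "n_predict": 256,
    # The workaround described above: cut generation at the end token
    # so "<|end|>" never reaches the client.
    "stop": ["<|end|>"],
}

resp = requests.post("http://localhost:8080/completion", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["content"])
```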
I am running into this issue, but with ./main.
Commit: 8a56075. On commit 928e0b7 I get gibberish, which is probably related to #6944. It looks like -r is not being handled correctly even when added manually, and I see the same thing when using the chat template.
Or am I doing something wrong here?
Yes, I noticed that as well. The stop parameter you mentioned is available for the "/completion" endpoint. However, when using the OpenAI API, the endpoint is "/v1/chat/completions".
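The OpenAI-style request body also has a `stop` field, so a similar workaround can be attempted against the "/v1/chat/completions" endpoint. Whether the OpenAI-compatible layer forwards that field to the sampler is an assumption here, as are the host, port, and model name:

```python
# Sketch: OpenAI-style chat completion request with stop sequences,
# sent to a llama.cpp server assumed to be on localhost:8080.
import requests

payload = {
    "model": "phi-3-mini-4k-instruct",  # model name is illustrative only
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    # OpenAI-style stop sequences; if the server ignores this field,
    # the /completion endpoint above remains the fallback.
    "stop": ["<|end|>"],
}

resp = requests.post("http://localhost:8080/v1/chat/completions",
                     json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```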
I also encountered this issue. I found that the <|end|> token was processed as a USER_DEFINED type instead of a CONTROL type, causing it to be output as a normal token during the token-id-to-text conversion. Referring to vllm-project/vllm#4182, they additionally handle generation_config.json. Apparently, for the Phi-3 model, <|end|> should be considered a CONTROL-type token. However, when handling the Phi-3 model, convert-hf-to-gguf.py categorizes the token ids inside added_tokens.json as USER_DEFINED.
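To make the distinction concrete, here is a hypothetical sketch of the idea, not the actual convert-hf-to-gguf.py logic: entries from added_tokens.json would normally get USER_DEFINED, but known chat-control tokens such as <|end|> would be marked CONTROL instead. The list of Phi-3 control tokens below is an assumption:

```python
# Hypothetical helper illustrating the token-type choice described above.
import gguf  # gguf-py package that ships with llama.cpp

# Chat-control tokens for Phi-3 (assumed list, for illustration only).
PHI3_CONTROL_TOKENS = {"<|end|>", "<|user|>", "<|assistant|>", "<|system|>"}

def classify_added_token(token: str) -> gguf.TokenType:
    """Pick a token type for an entry coming from added_tokens.json."""
    if token in PHI3_CONTROL_TOKENS:
        # CONTROL tokens are treated as special and are not rendered back
        # to text by default, so <|end|> would no longer leak into output.
        return gguf.TokenType.CONTROL
    return gguf.TokenType.USER_DEFINED

if __name__ == "__main__":
    for tok in ("<|end|>", "<|some_added_token|>"):
        print(tok, classify_added_token(tok).name)
```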
Out of curiosity, can
This issue was closed because it has been inactive for 14 days since being marked as stale. |
When I run Phi-3 with
The Phi-3 4k model includes the end token "<|end|>" in all responses.
I'm using https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf and the latest llama.cpp CUDA server Docker image.
Thanks in advance.