server: phi-3 end token not handled? #6903
I can confirm that, and the Llama 3 template is affected as well. It seems there was a change in llama.cpp, and utils.hpp no longer includes the stop token.
Seeing the same issue with both Phi 3 and Llama 3, using the server with the latest changes. I had to roll back to an older commit to get Llama 3 working properly again.
Did you try Llama 3 with the latest commit? I was just made aware that it should have been fixed by PR #6860. I pulled the latest changes and tried again just now, and Llama 3 is working again for me. But Phi 3 still has issues with the stop token on the server, at least for chat completions.
Edit: I didn't try with a newer quant, so I suppose it might be an issue with the specific model I'm using.
I've seen this with every model I've used so far. Will have to test. Been busy working, but I've been using the
My temporary solution is re-adding the stop token handling into utils.hpp.
You can pass the stop tokens in the payload: https://github.com/ggerganov/llama.cpp/pull/6916/files
It's in the API for both the server and main examples.
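For anyone who wants to try this from a client, here is a minimal sketch of passing a `stop` list to the server's `/completion` endpoint. The host, port, prompt formatting, and token budget are assumptions for illustration, not something taken from the linked PR:

```python
# Sketch: ask a running llama.cpp server to stop at Phi-3's end token,
# assuming the server is listening on localhost:8080.
import requests

payload = {
    # Phi-3 chat format written out by hand, purely for illustration.
    "prompt": "<|user|>\nWhy is the sky blue?<|end|>\n<|assistant|>\n",
    "n_predict": 256,
    # The workaround described above: cut generation at the end token
    # so "<|end|>" never reaches the client.
    "stop": ["<|end|>"],
}

resp = requests.post("http://localhost:8080/completion", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["content"])
```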
I am running into this issue, but with ./main.
Commit: 8a56075. On commit 928e0b7 I get gibberish, which is probably related to #6944. It looks like -r is not being handled correctly even when added manually, and I see the same thing when using the chat template.
Or am I doing something wrong here?
Yes, I noticed that as well. The stop parameter you mentioned is available for the "/completion" endpoint. However, when using the OpenAI API, the endpoint is "/v1/chat/completions".
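The OpenAI-style request body also has a `stop` field, so a similar workaround can be attempted against the "/v1/chat/completions" endpoint. Whether the OpenAI-compatible layer forwards that field to the sampler is an assumption here, as are the host, port, and model name:

```python
# Sketch: OpenAI-style chat completion request with stop sequences,
# sent to a llama.cpp server assumed to be on localhost:8080.
import requests

payload = {
    "model": "phi-3-mini-4k-instruct",  # model name is illustrative only
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    # OpenAI-style stop sequences; if the server ignores this field,
    # the /completion endpoint above remains the fallback.
    "stop": ["<|end|>"],
}

resp = requests.post("http://localhost:8080/v1/chat/completions",
                     json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```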
I also encountered this issue. I found that the <|end|> token was processed as a USER_DEFINED type instead of a CONTROL type, causing it to be output as a normal token during the token-id-to-text conversion. Referring to vllm-project/vllm#4182, they additionally handle generation_config.json. Apparently, for the Phi-3 model, <|end|> should be considered a CONTROL-type token. However, when handling the Phi-3 model, convert-hf-to-gguf.py categorizes the token ids inside added_tokens.json as USER_DEFINED.
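To make the distinction concrete, here is a hypothetical sketch of the idea, not the actual convert-hf-to-gguf.py logic: entries from added_tokens.json would normally get USER_DEFINED, but known chat-control tokens such as <|end|> would be marked CONTROL instead. The list of Phi-3 control tokens below is an assumption:

```python
# Hypothetical helper illustrating the token-type choice described above.
import gguf  # gguf-py package that ships with llama.cpp

# Chat-control tokens for Phi-3 (assumed list, for illustration only).
PHI3_CONTROL_TOKENS = {"<|end|>", "<|user|>", "<|assistant|>", "<|system|>"}

def classify_added_token(token: str) -> gguf.TokenType:
    """Pick a token type for an entry coming from added_tokens.json."""
    if token in PHI3_CONTROL_TOKENS:
        # CONTROL tokens are treated as special and are not rendered back
        # to text by default, so <|end|> would no longer leak into output.
        return gguf.TokenType.CONTROL
    return gguf.TokenType.USER_DEFINED

if __name__ == "__main__":
    for tok in ("<|end|>", "<|some_added_token|>"):
        print(tok, classify_added_token(tok).name)
```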
Out of curiosity, can
This issue was closed because it has been inactive for 14 days since being marked as stale. |
When I run Phi-3 with
The Phi-3 4k model includes the end token "<|end|>" in all responses.
I'm using https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf and the latest llama.cpp CUDA server Docker image.
Thanks in advance.