Add chat template support for llama-cli #8068
Conversation
std::string user_inp = params.conversation
    ? chat_add_and_format("user", buffer)
    : buffer;
// TODO: one inconvenience of the current chat template implementation is that
// we can't distinguish between user input and special tokens (prefix/postfix)
When `params.conversation == false`, there is an extra string copy that should be avoided here.

Regarding the comment - can you illustrate with an example? I'm not sure what the issue is.
An example would be a prompt like this: `Which one is correct HTML tag? <s> or <a>?`

Some models having `<s>` as BOS will see the prompt as `Which one is correct HTML tag? BOS or <a>?`

Leaving `special == false` would fix that, but it would also break the chat template, since the template's special tokens are now added to the same string as the user's text. This could be avoided with some more code, but IMO it's not really a big deal, assuming that special tokens are unlikely to accidentally appear in the text.
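To make the difference concrete, here is a small sketch. It assumes the common-library `llama_tokenize` helper with `add_special`/`parse_special` flags and an already-initialized `ctx`; the exact helper signature is an assumption here, not taken from this PR.

```cpp
#include "common.h"

// User text that happens to contain a literal "<s>".
std::string text = "Which one is correct HTML tag? <s> or <a>?";

// parse_special == true: "<s>" is parsed into the BOS token id, so the model
// never sees the literal characters the user typed.
std::vector<llama_token> toks_parsed = llama_tokenize(ctx, text, /*add_special=*/false, /*parse_special=*/true);

// parse_special == false: "<s>" stays as plain text, but then the special
// tokens inserted by the chat template would not be parsed either.
std::vector<llama_token> toks_plain  = llama_tokenize(ctx, text, /*add_special=*/false, /*parse_special=*/false);
```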
I added a `std::move(buffer)` since we no longer use `buffer` after this line. Is it OK to do so?
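For reference, a minimal sketch of the change being discussed (it mirrors the diff above; whether it matches the committed code exactly is not guaranteed):

```cpp
std::string user_inp = params.conversation
    ? chat_add_and_format("user", buffer)
    : std::move(buffer); // buffer is not used after this line, so avoid the extra copy
```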
Aha, got it. Yes, for now let's go with the simple solution.
It looks like it broke some models; here is the llama-cli output and a brief gdb inspection from DeepSeek-V2-Lite:
@fairydreaming The default behavior should be: if the built-in template is not supported, we use chatml as the fallback. Turns out that's not the case here (I missed something). I'll need to push a fix for this.
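A sketch of the intended fallback logic. The helper names `chat_template_supported` and `resolve_template` are assumptions for illustration; `llama_chat_apply_template` and `llama_chat_message` are the library's actual API, and the function returns a negative value for templates it does not recognize.

```cpp
#include <string>
#include <vector>
#include "llama.h"

// Probe whether llama_chat_apply_template understands a given template string.
static bool chat_template_supported(const std::string & tmpl) {
    llama_chat_message chat[] = {{"user", "test"}};
    std::vector<char> buf(256);
    int32_t res = llama_chat_apply_template(nullptr, tmpl.c_str(), chat, 1,
                                            /*add_ass=*/true, buf.data(), buf.size());
    return res >= 0;
}

// Intended behavior: if the template read from the model's metadata is not
// supported, fall back to chatml instead of failing.
static std::string resolve_template(const std::string & model_template) {
    return chat_template_supported(model_template) ? model_template : "chatml";
}
```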
* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
In this PR, I propose some changes:
- Update binary name to `llama-cli` (for more details, see this PR: ggerganov/llama.cpp#7809 and this [homebrew formula](https://github.com/Homebrew/homebrew-core/blob/03cf5d39d8bf27dfabfc90d62c9a3fe19205dc2a/Formula/l/llama.cpp.rb))
- Add method to download llama.cpp via pre-built release
- Split snippet into 3 sections: `title`, `setup` and `command`
- Use `--conversation` mode to start llama.cpp in chat mode (chat template is now supported, ref: ggerganov/llama.cpp#8068)

---

Proposal for the UI (note: maybe the 3 sections title - setup - command can be more separated visually):

![image](https://github.com/huggingface/huggingface.js/assets/7702203/2bd302f0-88b1-4057-9cd3-3cf4536aae50)
This PR brings the same chat template logic from the server to main (llama-cli).
Goals
- Use the llama_chat_apply_template function
- Allow overriding the chat template via the --chat-template argument
- Avoid the existing prefix/suffix handling, which does not use llama_chat_apply_template and thus requires additional maintenance

How it works
- A wrapper around llama_chat_apply_template that supports std::string ==> simplify the code
- A new helper, llama_chat_format_single ==> it evaluates the history twice, once with and once without the added message, then returns the diff (a sketch of this idea follows below)

Demo
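A self-contained sketch of the "evaluate twice, return the diff" idea described above. The names `chat_msg`, `format_chat`, `format_single` and the toy ChatML formatter are stand-ins for illustration, not the actual `llama_chat_format_single` implementation.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct chat_msg { std::string role; std::string content; };

// Toy ChatML-style formatter standing in for the real template engine.
static std::string format_chat(const std::vector<chat_msg> & msgs, bool add_assistant_prompt) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    if (add_assistant_prompt) {
        out += "<|im_start|>assistant\n";
    }
    return out;
}

// Render the history without and then with the new message, and return only
// the newly appended part (the diff).
static std::string format_single(std::vector<chat_msg> history, const chat_msg & new_msg) {
    const std::string before = format_chat(history, /*add_assistant_prompt=*/false);
    history.push_back(new_msg);
    const std::string after  = format_chat(history, /*add_assistant_prompt=*/true);
    return after.substr(before.size());
}

int main() {
    std::vector<chat_msg> history = { {"system", "You are a helpful assistant."} };
    // Only the formatted user turn plus the assistant prompt is printed; the
    // already-formatted system message is not repeated.
    std::printf("%s", format_single(history, {"user", "Hello!"}).c_str());
    return 0;
}
```

Returning only the diff lets the caller feed just the newly formatted text to the model, instead of re-sending the whole formatted history.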
Fix #8053 #6391
Replace #6810