Add llama_chat_apply_template() #5538
Conversation
Update: it seems I completely missed the discussion about this subject going back to 11/2023: #4216 (comment). Many developers expect us to have some kind of "jinja parser" for the template. I need to add a clarification in the inline docs, so it will be clear that this is not the case:
llama.cpp (outdated):

```cpp
// Simple version of "llama_apply_chat_template" that only works with strings
// This function uses heuristic checks to determine commonly used template. It is not a jinja parser.
int32_t llama_chat_apply_template_internal(std::string &dest, std::string chat_template, std::vector<const llama_chat_message *> conversation, bool add_ass) {
```
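To make the "heuristic checks" concrete, here is a minimal sketch of the idea (not the actual implementation): the template string is scanned for a distinctive marker substring rather than executed as Jinja. The helper `detect_template_kind` and the enum are hypothetical names for illustration; the marker strings (`<|im_start|>` for ChatML, `[INST]` for Llama 2) are real.

```cpp
#include <string>

// Hypothetical illustration of the heuristic approach: classify a chat
// template by searching for a characteristic substring instead of parsing it.
enum class template_kind { chatml, llama2, unknown };

static template_kind detect_template_kind(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return template_kind::chatml; // ChatML-style template
    }
    if (tmpl.find("[INST]") != std::string::npos) {
        return template_kind::llama2; // Llama 2 style template
    }
    return template_kind::unknown;    // caller should report an error
}
```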
Put the const args in front and pass by ref:

```cpp
int32_t llama_chat_apply_template_internal(
        const std::string & chat_template,
        const std::vector<const llama_chat_message *> & chat,
        std::string & dest, bool add_ass) {
```
The terms `chat` and `conversation` seem conflated. Propose to use `chat` universally (apply to other places too).
I changed all the occurrences of `conversation` and `msg` to `chat` in this commit: 73fbd67
Nice - good job!
```cpp
// load template from model
std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), model_template.data(), curr_tmpl.size());
```
I noticed that I made a mistake in this line: it should be `model_template.size()`, not `curr_tmpl.size()`. I'm fixing it in the next PR (which uses this function in the server).
P.S. It took me almost 1 hour to figure out this error. Sorry if I accidentally put somebody else in the same situation as me.
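For reference, the corrected call would look like this (a sketch based on the snippet quoted above, with the same surrounding variables):

```cpp
// corrected: pass the size of the destination buffer, not curr_tmpl.size()
int32_t res = llama_model_meta_val_str(model, template_key.c_str(),
                                       model_template.data(), model_template.size());
```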
* llama: add llama_chat_apply_template
* test-chat-template: remove redundant vector
* chat_template: do not use std::string for buffer
* add clarification for llama_chat_apply_template
* llama_chat_apply_template: add zephyr template
* llama_chat_apply_template: correct docs
* llama_chat_apply_template: use term "chat" everywhere
* llama_chat_apply_template: change variable name to "tmpl"
Closes #5527
Since most GGUF models already have the chat template saved in their metadata, developers can use the `llama_chat_apply_template` function to format the chat:
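A minimal sketch of the call, assuming the `llama_chat_apply_template(model, tmpl, chat, n_msg, add_ass, buf, length)` signature added in this PR: passing `nullptr` for `tmpl` uses the template stored in the model's metadata, and a return value larger than the buffer size indicates the required length.

```cpp
#include <string>
#include <vector>
#include "llama.h"

// Format a two-message chat with the model's own template.
// `model` is assumed to be an already-loaded llama_model pointer.
std::string format_chat(const llama_model * model) {
    std::vector<llama_chat_message> chat = {
        {"system", "You are a helpful assistant"},
        {"user",   "Hello"},
    };
    std::vector<char> buf(2048);
    int32_t len = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                            /*add_ass=*/true, buf.data(), buf.size());
    if (len > (int32_t) buf.size()) {
        buf.resize(len); // buffer too small: retry with the required size
        len = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                        /*add_ass=*/true, buf.data(), buf.size());
    }
    return std::string(buf.data(), len);
}
```

Result (ChatML for example):

```
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
```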
CC @ggerganov and @cebtenzzre for review