
Add llama_chat_apply_template() #5538

Merged (9 commits, Feb 19, 2024)

Conversation

ngxson (Collaborator) commented on Feb 16, 2024

Closes #5527

Since most gguf models already have a chat template saved in their metadata, developers can use the llama_chat_apply_template function to format the chat:

llama_chat_message conversation[] = {
    {"system", "You are a helpful assistant"},
    {"user", "Hello"},
    {"assistant", "Hi there"},
    {"user", "Who are you"},
    {"assistant", "   I am an assistant   "},
    {"user", "Another question"},
};
size_t message_count = 6;

// as a rule of thumb, the buffer should hold about twice the total number of characters across all messages
std::vector<char> formatted_chat(1024);
int32_t res = llama_chat_apply_template(
    model,   // by default, the template is taken from the model metadata
    nullptr, // alternatively, supply your own template as a string here
    conversation,
    message_count,
    true,    // add a trailing "assistant" prompt; only use with chatml for now
    formatted_chat.data(),
    formatted_chat.size()
);
formatted_chat.resize(res);

std::cout << std::string(formatted_chat.data(), formatted_chat.size());
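
If the return value is larger than the buffer, the formatted chat was truncated. Before printing, the caller can grow the buffer and format again; a minimal sketch, assuming (as above) that the function returns the full required length:

if (res > (int32_t) formatted_chat.size()) {
    // buffer was too small: grow it to the reported size and apply the template again
    formatted_chat.resize(res);
    res = llama_chat_apply_template(
        model, nullptr, conversation, message_count,
        true, formatted_chat.data(), formatted_chat.size());
}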

Result (chatml, for example; the trailing <|im_start|>assistant comes from the add_ass flag):

<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
Who are you<|im_end|>
<|im_start|>assistant
   I am an assistant   <|im_end|>
<|im_start|>user
Another question<|im_end|>
<|im_start|>assistant

CC @ggerganov and @cebtenzzre for review

ngxson (Collaborator, Author) commented on Feb 17, 2024

Update: it seems I completely missed the discussion on this subject from 11/2023: #4216 (comment)

Many developers expect that we have some kind of "jinja parser" for the template

I need to add a clarification to the inline docs, so it is clear that llama_chat_apply_template is NOT a jinja parser
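
For context, the function identifies well-known formats by searching the template string for characteristic markers rather than evaluating the Jinja template. A minimal sketch of that idea (the enum and helper names here are hypothetical, not the PR's actual code):

#include <string>

// Hypothetical sketch: detect a commonly used template by its marker tokens,
// instead of parsing the Jinja template itself.
enum class chat_format { CHATML, ZEPHYR, UNKNOWN };

static chat_format detect_chat_format(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return chat_format::CHATML; // chatml-style markers
    }
    if (tmpl.find("<|user|>") != std::string::npos) {
        return chat_format::ZEPHYR; // zephyr-style markers
    }
    return chat_format::UNKNOWN;   // unrecognized template
}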

Review thread on llama.cpp (outdated):

// Simple version of "llama_apply_chat_template" that only works with strings
// This function uses heuristic checks to determine commonly used template. It is not a jinja parser.
int32_t llama_chat_apply_template_internal(std::string &dest, std::string chat_template, std::vector<const llama_chat_message *> conversation, bool add_ass) {
ggerganov (Owner):

Put the const args in front and pass by ref:

Suggested change:

-int32_t llama_chat_apply_template_internal(std::string &dest, std::string chat_template, std::vector<const llama_chat_message *> conversation, bool add_ass) {
+int32_t llama_chat_apply_template_internal(
+        const std::string & chat_template,
+        const std::vector<const llama_chat_message *> & chat,
+        std::string & dest, bool add_ass) {

The terms chat and conversation seem conflated. I propose using chat universally (apply this to other places too).

ngxson (Collaborator, Author):

I changed all occurrences of conversation and msg to chat in this commit: 73fbd67

ggerganov (Owner) left a review:

Nice - good job!

ggerganov merged commit 11b12de into ggerganov:master on Feb 19, 2024
52 of 54 checks passed
ggerganov added a commit that referenced this pull request Feb 19, 2024
// load template from model
std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), model_template.data(), curr_tmpl.size());
ngxson (Collaborator, Author) commented:

I noticed that I made a mistake on this line: it should be model_template.size(), not curr_tmpl.size(). I'm fixing it in the next PR (which uses this function in the server).
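
For reference, the same snippet with the fix applied (a sketch of the corrected call):

// load template from model
std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
std::string template_key = "tokenizer.chat_template";
// corrected: pass the size of the buffer being written to (model_template),
// not the size of an unrelated string (curr_tmpl)
int32_t res = llama_model_meta_val_str(model, template_key.c_str(),
                                       model_template.data(), model_template.size());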

ngxson (Collaborator, Author):

P.S.: It took me almost an hour to figure out this error. Sorry if I accidentally put somebody else in the same situation as me.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* llama: add llama_chat_apply_template

* test-chat-template: remove redundant vector

* chat_template: do not use std::string for buffer

* add clarification for llama_chat_apply_template

* llama_chat_apply_template: add zephyr template

* llama_chat_apply_template: correct docs

* llama_chat_apply_template: use term "chat" everywhere

* llama_chat_apply_template: change variable name to "tmpl"
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Development

Successfully merging this pull request may close these issues:

Add equivalent to hf apply_chat_template()