
Add llama_chat_apply_template() #5538

Merged (9 commits, Feb 19, 2024)

Conversation

ngxson (Collaborator) commented on Feb 16, 2024

Closes #5527

Since most gguf models already have a chat template saved in their metadata, developers can use the llama_chat_apply_template function to format the chat:

llama_chat_message conversation[] = {
    {"system", "You are a helpful assistant"},
    {"user", "Hello"},
    {"assistant", "Hi there"},
    {"user", "Who are you"},
    {"assistant", "   I am an assistant   "},
    {"user", "Another question"},
};
size_t message_count = 6;

// as a rule of thumb, the buffer should hold about twice the total number of characters across all messages
std::vector<char> formatted_chat(1024);
int32_t res = llama_chat_apply_template(
    model,   // by default, the template is taken from the model metadata
    nullptr, // alternatively, supply your own template as a string here
    conversation,
    message_count,
    true,    // add a trailing "assistant" prompt; only use with chatml for now
    formatted_chat.data(),
    formatted_chat.size()
);
formatted_chat.resize(res);

std::cout << std::string(formatted_chat.data(), formatted_chat.size());
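
If the return value is larger than the buffer, the formatted chat was truncated. Before printing, the caller can grow the buffer and format again; a minimal sketch, assuming (as above) that the function returns the full required length:

if (res > (int32_t) formatted_chat.size()) {
    // buffer was too small: grow it to the reported size and apply the template again
    formatted_chat.resize(res);
    res = llama_chat_apply_template(
        model, nullptr, conversation, message_count,
        true, formatted_chat.data(), formatted_chat.size());
}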

Result (chatml, for example; the trailing <|im_start|>assistant comes from the add_ass flag):

<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
Who are you<|im_end|>
<|im_start|>assistant
   I am an assistant   <|im_end|>
<|im_start|>user
Another question<|im_end|>
<|im_start|>assistant

CC @ggerganov and @cebtenzzre for review

ngxson (Collaborator, Author) commented on Feb 17, 2024

Update: it seems I completely missed the discussion on this subject from 11/2023: #4216 (comment)

Many developers expect that we have some kind of "jinja parser" for the template

I need to add a clarification to the inline docs, so it is clear that llama_chat_apply_template is NOT a jinja parser
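
For context, the function identifies well-known formats by searching the template string for characteristic markers rather than evaluating the Jinja template. A minimal sketch of that idea (the enum and helper names here are hypothetical, not the PR's actual code):

#include <string>

// Hypothetical sketch: detect a commonly used template by its marker tokens,
// instead of parsing the Jinja template itself.
enum class chat_format { CHATML, ZEPHYR, UNKNOWN };

static chat_format detect_chat_format(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        return chat_format::CHATML; // chatml-style markers
    }
    if (tmpl.find("<|user|>") != std::string::npos) {
        return chat_format::ZEPHYR; // zephyr-style markers
    }
    return chat_format::UNKNOWN;   // unrecognized template
}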

Review thread on llama.cpp (outdated):

// Simple version of "llama_apply_chat_template" that only works with strings
// This function uses heuristic checks to determine commonly used template. It is not a jinja parser.
int32_t llama_chat_apply_template_internal(std::string &dest, std::string chat_template, std::vector<const llama_chat_message *> conversation, bool add_ass) {
ggerganov (Owner):

Put the const args in front and pass by ref:

Suggested change:

-int32_t llama_chat_apply_template_internal(std::string &dest, std::string chat_template, std::vector<const llama_chat_message *> conversation, bool add_ass) {
+int32_t llama_chat_apply_template_internal(
+        const std::string & chat_template,
+        const std::vector<const llama_chat_message *> & chat,
+        std::string & dest, bool add_ass) {

The terms chat and conversation seem conflated. I propose using chat universally (apply this to other places too).

ngxson (Collaborator, Author):

I changed all occurrences of conversation and msg to chat in this commit: 73fbd67

ggerganov (Owner) left a review:

Nice - good job!

ggerganov merged commit 11b12de into ggerganov:master on Feb 19, 2024
52 of 54 checks passed
ggerganov added a commit that referenced this pull request Feb 19, 2024
// load template from model
std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), model_template.data(), curr_tmpl.size());
ngxson (Collaborator, Author) commented:

I noticed that I made a mistake on this line: it should be model_template.size(), not curr_tmpl.size(). I'm fixing it in the next PR (which uses this function in the server).
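
For reference, the same snippet with the fix applied (a sketch of the corrected call):

// load template from model
std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
std::string template_key = "tokenizer.chat_template";
// corrected: pass the size of the buffer being written to (model_template),
// not the size of an unrelated string (curr_tmpl)
int32_t res = llama_model_meta_val_str(model, template_key.c_str(),
                                       model_template.data(), model_template.size());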

ngxson (Collaborator, Author):

P.S.: It took me almost an hour to figure out this error. Sorry if I accidentally put somebody else in the same situation as me.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* llama: add llama_chat_apply_template

* test-chat-template: remove redundant vector

* chat_template: do not use std::string for buffer

* add clarification for llama_chat_apply_template

* llama_chat_apply_template: add zephyr template

* llama_chat_apply_template: correct docs

* llama_chat_apply_template: use term "chat" everywhere

* llama_chat_apply_template: change variable name to "tmpl"
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Development

Successfully merging this pull request may close these issues:

Add equivalent to hf apply_chat_template()