Use llama_chat_apply_template in main (WIP) #6810 (Draft)
Resolve #6391
The core idea is to use `llama_chat_apply_template` to apply the template twice: once with and once without the last user message. We then take the diff between the two output strings and feed only that part into inference.

Example:
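A minimal sketch of the idea is shown below. The exact `llama_chat_apply_template` signature is the one from `llama.h` of this period (it takes the model, an optional template string, the message array, an add-assistant flag and an output buffer) and may differ in newer versions; `chat_get_added_part` is the helper proposed in the TODO list further down, not an existing API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

#include "llama.h"

// Format a list of messages with the model's built-in template.
// Passing tmpl = nullptr tells llama_chat_apply_template to read the
// template from the model's metadata.
static std::string apply_chat_template(const llama_model * model,
                                       const std::vector<llama_chat_message> & msgs,
                                       bool add_assistant_prefix) {
    std::vector<char> buf(std::max((size_t) 256, msgs.size() * 1024));
    int32_t n = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                          add_assistant_prefix, buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        // buffer was too small: the return value is the required size
        buf.resize(n);
        n = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                      add_assistant_prefix, buf.data(), (int32_t) buf.size());
    }
    return n < 0 ? std::string() : std::string(buf.data(), n);
}

// Proposed helper (name taken from the TODO list below): apply the template to
// the history with and without the last user message, and return only the diff,
// i.e. the part that still has to be fed into inference.
static std::string chat_get_added_part(const llama_model * model,
                                       const std::vector<llama_chat_message> & msgs) {
    if (msgs.empty()) {
        return std::string();
    }
    std::vector<llama_chat_message> prev(msgs.begin(), msgs.end() - 1);
    std::string without_last = apply_chat_template(model, prev, false);
    std::string with_last    = apply_chat_template(model, msgs, true);
    // assumes the previously formatted history is a prefix of the new string,
    // which is what common chat templates produce
    return with_last.substr(without_last.size());
}
```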
This approach requires minimal effort to maintain the chat template infrastructure, while using the exact same logic for `main` and `server` (reminder: `server` also has the notion of a "prompt cache", which works the same way).

Having to re-format the whole chat history on every turn seems inefficient at first glance, but it is needed because it follows the same approach as `server` (which is designed to be stateless). Then, we find the diff between the 2 strings and only the added part is evaluated.
TODO:

- Add `chat_get_added_part` to get the diff part with / without the last user message
- `main` must keep track of the list of messages
- In `main`, deprecate `-cml` (but not remove it) while adding a `--chat-template` argument
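A rough sketch of how the per-turn flow in `main` could look, reusing the `chat_get_added_part` helper sketched earlier (function and variable names here are hypothetical, not part of the actual patch):

```cpp
#include <string>
#include <vector>

#include "llama.h"

// Hypothetical per-turn handler: main keeps the full message history and only
// feeds the newly added, formatted part into the existing evaluation loop.
static std::string on_user_turn(const llama_model * model,
                                std::vector<llama_chat_message> & chat_msgs,
                                const char * user_input) {
    chat_msgs.push_back({ "user", user_input });

    // only the newly added portion of the formatted prompt goes to inference
    std::string added = chat_get_added_part(model, chat_msgs);

    // ... tokenize `added`, run generation, then append the reply, e.g.:
    // chat_msgs.push_back({ "assistant", reply_cstr });
    return added;
}
```

Note that `llama_chat_message` only stores raw `const char *` pointers, so in a real implementation `main` would also need to own the role/content strings for as long as the history is kept.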