
Use llama_chat_apply_template in main (WIP) #6810

Draft
wants to merge 1 commit into
base: master

@ngxson (Collaborator) commented Apr 21, 2024

Resolve #6391

The core idea is to call llama_chat_apply_template twice: once without and once with the last user message. We then take the difference between the two output strings and feed only that difference into inference.

Example:

<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
Who are you<end_of_turn>
<start_of_turn>model
I am an assistant<end_of_turn>
<start_of_turn>user
Another question<end_of_turn>
<start_of_turn>model

-----
chat_get_added_part(): <start_of_turn>user
Another question<end_of_turn>
<start_of_turn>model
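
As a rough illustration of the diff step, here is a minimal, self-contained sketch. It does not call llama.cpp: `format_gemma_like` is a hypothetical stand-in for llama_chat_apply_template that only mimics the Gemma-style output shown above, and the `chat_msg` struct, the signature of `chat_get_added_part`, and `example_added_part` are assumptions made for this example, not the code in this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical message type for this sketch (not the llama.cpp struct).
struct chat_msg {
    std::string role;
    std::string content;
};

// Stand-in for llama_chat_apply_template: formats messages in a Gemma-like
// layout. A system message is folded into the next user turn, which is one
// reason a message cannot simply be formatted in isolation.
static std::string format_gemma_like(const std::vector<chat_msg> & msgs, bool add_ass) {
    std::string out;
    std::string pending_system;
    for (const auto & m : msgs) {
        if (m.role == "system") {
            pending_system = m.content;
            continue;
        }
        const std::string role = (m.role == "assistant") ? "model" : m.role;
        out += "<start_of_turn>" + role + "\n";
        if (m.role == "user" && !pending_system.empty()) {
            out += pending_system + "\n\n";
            pending_system.clear();
        }
        out += m.content + "<end_of_turn>\n";
    }
    if (add_ass) {
        out += "<start_of_turn>model\n"; // prompt the model to answer
    }
    return out;
}

// Format the history without and with the new message, then return only the
// suffix that the new message contributed.
static std::string chat_get_added_part(const std::vector<chat_msg> & history,
                                       const chat_msg & new_msg) {
    const std::string before = format_gemma_like(history, false);
    std::vector<chat_msg> with_new = history;
    with_new.push_back(new_msg);
    const std::string after = format_gemma_like(with_new, true);
    // The approach relies on `before` being a prefix of `after`.
    if (after.compare(0, before.size(), before) != 0) {
        throw std::runtime_error("formatted history is not a prefix");
    }
    return after.substr(before.size());
}

// Rebuilds the conversation from the example above.
static std::string example_added_part() {
    const std::vector<chat_msg> history = {
        {"system",    "You are a helpful assistant"},
        {"user",      "Hello"},
        {"assistant", "Hi there"},
        {"user",      "Who are you"},
        {"assistant", "I am an assistant"},
    };
    return chat_get_added_part(history, {"user", "Another question"});
}
```

Calling `example_added_part()` yields the three lines printed after `chat_get_added_part():` in the example. The prefix check is a design choice of this sketch: if a template ever rewrote earlier turns, the diff would be meaningless, so failing loudly seems safer than returning a wrong suffix.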

This approach requires minimal effort to maintain the chat template infrastructure, while using the exact same logic for main and server (reminder: server also has the notion of a "prompt cache", which works the same way).

Having to re-format the whole chat history each time seems inefficient at first glance, but it is needed because:

Then, we find the diff between the 2 strings.

  • Implement chat_get_added_part to get the diff part with / without the last user message
  • main must keep track of the list of messages
  • Update the arguments for main: deprecate -cml (without removing it) and add a -chat-template argument
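
The second item, keeping the message list inside main, could look roughly like the sketch below. Everything here is hypothetical: `chat_turn`, `chat_formatter`, and `chat_history` are names invented for illustration, and the formatter is injected as a callback so the sketch stays independent of how llama_chat_apply_template is actually wrapped in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type for this sketch.
struct chat_turn {
    std::string role;
    std::string content;
};

// Any function that renders a message list to text; the bool asks for the
// assistant prefix to be appended (the role llama_chat_apply_template would
// play in the real program).
using chat_formatter =
    std::function<std::string(const std::vector<chat_turn> &, bool)>;

// Keeps the running conversation so the whole history can be re-formatted
// on every turn.
class chat_history {
public:
    explicit chat_history(chat_formatter fmt) : fmt_(std::move(fmt)) {}

    // Record a user message and return only the text that still needs to be
    // tokenized and evaluated.
    std::string add_user(const std::string & content) {
        const std::string before = fmt_(turns_, false);
        turns_.push_back({"user", content});
        const std::string after = fmt_(turns_, true);
        // Assumption: earlier turns format identically, so `before` is a prefix.
        assert(after.compare(0, before.size(), before) == 0);
        return after.substr(before.size());
    }

    // Record the model's reply once generation has finished.
    void add_assistant(const std::string & content) {
        turns_.push_back({"assistant", content});
    }

    std::size_t size() const { return turns_.size(); }

private:
    chat_formatter fmt_;
    std::vector<chat_turn> turns_;
};
```

In an interactive loop, `add_user` would be called with each line read from the user, and `add_assistant` with the generated reply, so the next diff is computed against an up-to-date history.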

@mofosyne added the "enhancement" and "Review Complexity : Low" labels May 9, 2024

Successfully merging this pull request may close this issue: Implement (properly) different chat templates in main.cpp