The bug
When trying to load a Llama.cpp model with a chat template, I run into errors saying the given template cannot be loaded. When I investigated a bit, I saw that the keys in the CHAT_TEMPLATE_CACHE dictionary are the templates themselves. This does not seem right to me.
To Reproduce
I think this should work, and should not give an error:
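Something like this (a minimal sketch; the `.gguf` path is a placeholder, and I'm passing the template by name via the `chat_template` keyword):

```python
from guidance import models

# Placeholder path to any local GGUF model file; the point is passing
# the template by its name rather than the full template string.
lm = models.LlamaCpp("path/to/model.gguf", chat_template="ChatML")
```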
However, I get this warning: `UserWarning: Chat template ChatML was unable to be loaded directly into guidance.` and the model then loads the default ChatML template anyway.
When I look in the template cache, I see key:value pairs that look like this:
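Roughly like this (abbreviated, and assuming the cache lives in `guidance.chat`; the actual keys are the full Jinja2 template strings, not names):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE

for key, value in CHAT_TEMPLATE_CACHE.items():
    # key is the entire Jinja2 template string, not a name like "ChatML"
    print(repr(key)[:50], "->", value)

# "{% for message in messages %}{{'<|im_start|>' + ..." -> <class 'guidance.chat.ChatMLTemplate'>
```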
When I tried changing the key used in the cache to "ChatML", the first option worked without error. What I did to test was roughly equivalent to this monkeypatch (the lookup of the original template-string key is hand-waved here):
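```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatMLTemplate

# Find the current template-string key for the ChatML class and re-key
# the entry by name instead.
chatml_key = next(k for k, v in CHAT_TEMPLATE_CACHE.items() if v is ChatMLTemplate)
CHAT_TEMPLATE_CACHE["ChatML"] = CHAT_TEMPLATE_CACHE.pop(chatml_key)
```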
So, unless I'm missing something, I think the keys to the chat template cache should be the names of the templates, not the templates themselves. It's a tiny change, but I'm happy to do it and submit a PR if it's useful.
System info (please complete the following information):
macOS Sonoma 14.5
Guidance version 0.1.15
Good suggestion! We started out having the primary keys here go down this path, but quickly found that even "standard" chat templates like ChatML actually come in many variants (typically small details like the amount of whitespace). Furthermore, even within a model family, there are often frequent updates to these templates that invalidate the cache we build on the guidance side (e.g. Mistral, Phi, Llama, etc. have all made chat template updates several times between model releases).
I wonder if the best solution here is a bit of a hybrid, where perhaps we provide some duplicate aliases in the ChatTemplateCache (e.g. "ChatML" also pointing to one of the hand-implemented ones), but don't use them as defaults for any particular model. That way people can still intuitively pass the parameter the way you described, but we still appropriately show a warning if our cache has gone stale thanks to a model update on Huggingface/Llama.cpp. Curious about your thoughts, and would definitely welcome a PR!
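Concretely, the alias idea might look something like this sketch (assuming the cache is a plain dict in `guidance.chat` and `ChatMLTemplate` is the hand-implemented class there):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatMLTemplate

# The full Jinja2 template string stays in the cache as the primary key;
# we just add a friendly alias pointing at the same hand-implemented class,
# so chat_template="ChatML" resolves without becoming the default anywhere.
CHAT_TEMPLATE_CACHE["ChatML"] = ChatMLTemplate
```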