Models without Vocabulary #5798
Conversation
Do you have a more specific example of a use case for this feature - e.g., a model with a vocab type not currently supported by llama.cpp, but with weights that are?
This seems like something that could be useful.
@cebtenzzre Right now we're using a tokenizer of this kind with a llama model trained by our ML engineers. In our system the vocab lives on the client side, and the server only processes tokens, so there is no need for the vocab to be included in the model.
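A minimal sketch of what that server side could look like with the llama.cpp C API, assuming headers from roughly the time of this PR: the model file carries no vocabulary, token IDs arrive already tokenized from the client, and the server only runs decoding. The model path and the token buffer are placeholders.

```cpp
// Sketch: evaluate client-supplied token IDs with a vocab-free model.
// "model.gguf" and `client_tokens` are illustrative placeholders.
#include "llama.h"
#include <cstdint>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) { return 1; }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Token IDs produced by the client's external tokenizer.
    std::vector<llama_token> client_tokens = { 1, 4521, 337, 29901 };

    // No llama_tokenize() call here -- the model has no vocab to do it with.
    llama_batch batch = llama_batch_get_one(client_tokens.data(),
                                            (int32_t) client_tokens.size(),
                                            0,   // position of first token
                                            0);  // sequence id
    if (llama_decode(ctx, batch) != 0) { return 1; }

    // Logits for the last token; sampling/detokenization stays client-side.
    const float * logits = llama_get_logits_ith(ctx, (int32_t) client_tokens.size() - 1);
    (void) logits;

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```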
Force-pushed from 735c684 to 2580fe5.
Force-pushed from 2580fe5 to e0504d5.
* additional methods to read model and ctx parameters
* vocab size as a part of the model metadata
* models without vocabulary, convert.py part
* models without vocabulary, llama.cpp part
* PR clean up
* converter script fixes
* llama_vocab_type update (renamed the new key)
* PR review fixes
* revert function renaming
* one more NoVocab assert
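The llama.cpp-side commits above introduce a vocab-type value for such models (the "NoVocab" asserts). A hedged sketch of an application-side guard built on it, assuming the enum value LLAMA_VOCAB_TYPE_NONE added by this PR:

```cpp
// Sketch: refuse to tokenize with a model that ships no vocabulary.
// LLAMA_VOCAB_TYPE_NONE is the value this PR introduces; the rest is
// the pre-existing llama.cpp C API.
#include "llama.h"
#include <cstdio>

bool can_tokenize(const llama_model * model) {
    if (llama_vocab_type(model) == LLAMA_VOCAB_TYPE_NONE) {
        // No built-in vocab: tokenization must happen outside llama.cpp.
        fprintf(stderr, "model has no vocab; expecting pre-tokenized input\n");
        return false;
    }
    return true;
}
```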
I made some changes to the model converter so that it can create a GGUF model without a built-in vocabulary.
This makes it possible to use any custom external vocabulary in an application built with llama.cpp.
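For the client half of this split, something along these lines would produce the token IDs the server consumes. The vocab table and the greedy longest-match strategy are purely illustrative placeholders, not the tokenizer the PR author actually uses:

```cpp
// Sketch: client-side tokenization against an external vocabulary.
// The lookup table and matching strategy are hypothetical examples.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

std::vector<int32_t> tokenize_external(const std::string & text,
                                       const std::unordered_map<std::string, int32_t> & vocab,
                                       int32_t unk_id) {
    std::vector<int32_t> ids;
    size_t pos = 0;
    while (pos < text.size()) {
        size_t  best_len = 0;
        int32_t best_id  = unk_id;
        // Greedy longest match against the external vocab.
        for (size_t len = text.size() - pos; len > 0; --len) {
            auto it = vocab.find(text.substr(pos, len));
            if (it != vocab.end()) { best_len = len; best_id = it->second; break; }
        }
        ids.push_back(best_id);
        pos += (best_len > 0 ? best_len : 1); // skip one byte on unknown
    }
    return ids; // sent to the server, which runs llama_decode on them
}
```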