CausalLM: Llama + vocab.json BPE tokenizer = error loading model: cannot find tokenizer merges in model file
#3732
Comments
Looks like an old-style (i.e. using slow tokenizers) model to me. Edit: funny, didn't find a mention of …
@TheBloke Assuming you have a `tokenizer.json` and a `merges.txt`, something like this should put the merges into the tokenizer file:

```python
import json

# Load the existing tokenizer.json
with open('tokenizer.json', 'r') as fp:
    tokenizer = json.load(fp)

# Read the merges from merges.txt, skipping the "#version:" header
# and any blank lines.
merges = []
with open('merges.txt', 'r') as mfp:
    firstline = next(mfp).strip()
    if not firstline.startswith('#version:'):
        merges.append(firstline)
    for l in mfp:
        l = l.strip()
        if len(l) > 0:
            merges.append(l)

tokenizer['merges'] = merges

# Write the result out without touching the original file.
with open('tokenizer.json.new', 'w') as outfp:
    json.dump(tokenizer, outfp, indent = 4)
```

It'll open `tokenizer.json`, add the merges from `merges.txt`, and write the combined result to `tokenizer.json.new`.
It works, but the …
I don't have a HF account, so I can't look at it myself. I guess TB could try just trimming that last line then? Or change … From what I recall, before GGUF we didn't even add the merges at all, so it'll probably be okay. What are the odds that one merge is the super important one? (With my luck...)

Quick edit: Even if it seems to work, it's probably a bad idea to leave it as is, though. I assume that if it doesn't just crash or detect an error, then it's going to treat it as "blah" merging with an empty string, which might actually have an effect.
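For illustration, a minimal sketch of the "trim the last line" idea: keep only lines that look like a valid GPT-2 style merge (exactly two space-separated pieces) plus the version header, and drop anything else. The filenames and helper name here are placeholders, not anything from the model repo.

```python
# Hypothetical helper: drop blank or malformed trailing entries from merges.txt.
def clean_merges(in_path='merges.txt', out_path='merges.cleaned.txt'):
    with open(in_path, 'r', encoding='utf-8') as fp:
        lines = [line.rstrip('\n') for line in fp]

    kept = []
    for line in lines:
        if line.startswith('#version:'):
            kept.append(line)          # keep the header line as-is
        elif len(line.split(' ')) == 2:
            kept.append(line)          # looks like a valid two-part merge
        # anything else (blank or malformed last line) is dropped

    with open(out_path, 'w', encoding='utf-8') as fp:
        fp.write('\n'.join(kept) + '\n')

clean_merges()
```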
The tokenizer is the same as the Qwen models': they use tiktoken, and this GPT2FastTokenizer was converted from their vocab.

Their C++ tiktoken implementation: https://github.com/QwenLM/qwen.cpp/tree/master/tiktoken_cpp
Their tiktoken vocab: https://huggingface.co/Qwen/Qwen-7B/blob/main/qwen.tiktoken
The converted GPT2-style tokenizer: https://huggingface.co/JosephusCheung/Qwen-LLaMAfied-7B-Chat/tree/main

But I am still confused: what makes it different from those already-working BPE-tokenized models?
For the purposes of converting to GGUF in BPE mode, the difference is that it (apparently) doesn't have the merges in a … Also, the original Qwen, as far as I know, isn't included in the category of "already working BPE tokenized models"; there are still some issues open requesting Qwen support. So after fixing/working around this issue there may well be more to deal with.
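As a quick way to see whether a given `tokenizer.json` actually carries merges, a small check like this can help. This is just a sketch; it assumes the usual Hugging Face fast-tokenizer layout, where merges live under `model.merges`.

```python
import json

# Sketch: report whether tokenizer.json contains BPE merges.
with open('tokenizer.json', 'r', encoding='utf-8') as fp:
    tok = json.load(fp)

# Check the usual HF location first, then a top-level key as a fallback.
merges = tok.get('model', {}).get('merges') or tok.get('merges')
if merges:
    print(f'found {len(merges)} merges')
else:
    print('no merges found - conversion to GGUF in BPE mode will likely fail')
```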
Please test #3743 and see if you can create a functional model. You'll need to use the new `--padvocab` option.
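For example, taking the conversion command exactly as quoted in the original post and just adding the flag (combining them this way is my assumption; the paths are the original post's placeholders):

```sh
./convert.py --vocabtype bpe --outtype fp16 --padvocab \
    /path/to/causallm_14b/source /path/to/gguf/causallm_14b.fp16.gguf
```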
Testing now, thanks. Love `--padvocab`, that's awesome, thanks!
Working, thank you! Great work.
Re-uploading 14B and 7B quants now.
7B and 14B quants are tested and re-uploaded.
Can't use it with text-generation-webui currently; llama-cpp-python may need an upgrade.
If this model is to be supported, can we have a tokenizer test, please?
Have an error on CUDA GPU: not during prompt processing, but while processing the first message.

CUDA GPU Error
Hello, how did you make a 14B GGUF file that works properly? I used `python "D:\llama.cpp\convert.py" "D:\14B" --padvocab`, but the converted 14B model could not answer correctly: its answers were confused and it output garbled text.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hi guys
A couple of new and interesting models dropped today:

These are a merge of Qwen + Llama in Llama architecture, but with a vocab.json + merges.txt GPT2 tokenizer, with a vocab size exceeding 150,000.
I was able to make an FP16 with two extra steps:
1. Adding `<dummyXXX>` tokens to `added_tokens.json` (to pad the vocab).
2. Converting with: `./convert.py --vocabtype bpe --outtype fp16 /path/to/causallm_14b/source /path/to/gguf/causallm_14b.fp16.gguf`
This seemed to produce a valid FP16, from which I made quants as normal. For 14B I could only make old-style quants, as many of the tensors are not 256-divisible. For 7B I could make k-quants.
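For reference, a rough sketch of how step 1 above could be scripted. This is only an illustration of the manual padding step: it assumes `config.json` carries the target `vocab_size`, that vocab IDs are contiguous, and the exact `<dummyXXX>` naming is made up here.

```python
import json

# Sketch of step 1: pad the vocab with <dummyXXX> tokens in added_tokens.json.
with open('config.json', 'r', encoding='utf-8') as fp:
    target_size = json.load(fp)['vocab_size']

with open('vocab.json', 'r', encoding='utf-8') as fp:
    vocab = json.load(fp)

try:
    with open('added_tokens.json', 'r', encoding='utf-8') as fp:
        added = json.load(fp)
except FileNotFoundError:
    added = {}

# Assumes existing IDs are contiguous, so the next free ID is the current size.
current_size = len(vocab) + len(added)
for i in range(current_size, target_size):
    added[f'<dummy{i:05}>'] = i   # one dummy token per missing ID

with open('added_tokens.json', 'w', encoding='utf-8') as fp:
    json.dump(added, fp, indent=2)

print(f'padded vocab from {current_size} to {target_size}')
```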
Unfortunately, the resulting files are not usable with llama.cpp, giving this error:
Did I do anything wrong? Or is this a bug?
Full log of attempting to run inference on one of the 7B k-quants: