llama : support StableLM 2 1.6B #5052

Merged (2 commits) on Jan 22, 2024

Conversation

compilade (Collaborator) commented Jan 20, 2024

Stable LM 2 1.6B was recently released (see https://stability.ai/news/introducing-stable-lm-2). It's different enough from their older 3B model that it requires some changes in llama.cpp in order to work.

It's mostly the same model architecture as stablelm-3b-4e1t, but they added bias tensors for the Q, K, and V projections, so these are now also handled for the LLM_ARCH_STABLELM model type.
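For illustration, here is a minimal NumPy sketch (not the actual llama.cpp code; the shapes and sizes are hypothetical) of what the extra bias tensors change: each of the Q, K, and V projections gains an additive bias term.

```python
import numpy as np

n_embd = 2048  # hypothetical embedding size, for illustration only
x = np.random.randn(4, n_embd).astype(np.float32)         # 4 token embeddings

W_q = np.random.randn(n_embd, n_embd).astype(np.float32)  # attn_q.weight
b_q = np.zeros(n_embd, dtype=np.float32)                  # attn_q.bias (new in StableLM 2)

# stablelm-3b-4e1t: q = x @ W_q.T
# StableLM 2 1.6B:  q = x @ W_q.T + b_q   (and likewise for K and V)
q = x @ W_q.T + b_q
```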

The tokenizer is also different from stablelm-3b-4e1t's; in StableLM 2 it is defined in the tiktoken format, in a very similar way to the Qwen models.
To avoid unnecessary code duplication, I added _set_vocab_qwen to the Model class so that both Qwen and StableLM 2 can build their vocab in the same way.

In doing so, I noticed a bug in the previous implementation: all special tokens were named [PAD{id}]. This is because, unlike in tokenizer.json, the special tokens of Qwen-style tokenizers are not a subset of the vocab, so they could not be found in the reverse_vocab and were always named like padding tokens. Combining the added_vocab with the vocab when building the reverse_vocab fixes this. (This is not necessarily relevant for _set_vocab_gpt2, because in tokenizer.json the vocab usually contains all tokens, including special ones.)
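A hedged sketch of the fix (the actual helper names in convert-hf-to-gguf.py may differ): special tokens of a Qwen-style tokenizer only appear in the added vocab, so they must be merged in before inverting the mapping, otherwise every special token id falls through to the [PAD{id}] fallback.

```python
def make_reverse_vocab(vocab: dict, added_vocab: dict) -> dict:
    # merge added_vocab (special tokens) into the vocab before inverting it
    return {token_id: token for token, token_id in {**vocab, **added_vocab}.items()}

def token_text(reverse_vocab: dict, token_id: int) -> str:
    # before the fix, special token ids were missing from reverse_vocab and hit the fallback
    return reverse_vocab.get(token_id, f"[PAD{token_id}]")
```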

In convert-hf-to-gguf.py, to decide which kind of tokenizer to look for when converting a StableLMModel, I initially used the vocab size rather than something like the number of layers, since Qwen-style tokenizers seem to have a lot more tokens than others; that seemed like a good enough heuristic for at least this specific case. A better way would perhaps be to check for the absence of tokenizer.json. (EDIT: this is now implemented that way, with the tokenizer.json presence check. The behavior is the same as with the vocab size check; nothing in the actual conversion was changed, so the resulting converted models are the same as before.)
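The check itself is simple; a sketch of the idea (illustrative only, not the exact code):

```python
from pathlib import Path

def stablelm_vocab_kind(model_dir: Path) -> str:
    # tokenizer.json present -> HF tokenizers format (stablelm-3b-4e1t), GPT-2 style loader
    # tokenizer.json absent  -> tiktoken/Qwen-style format (StableLM 2), Qwen style loader
    return "gpt2" if (model_dir / "tokenizer.json").is_file() else "qwen"
```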

Oh, and since the tiktoken library is used when converting, I added it to the llama-python-extra package list in the nix package so that it's included when using a devShell like with nix develop .#default-extra.

Since I moved the code for Qwen's set_vocab, I recommend using git log -p --color-moved when reviewing this.

* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]

* convert : refactor Qwen's set_vocab to use it for StableLM 2 too

* nix : add tiktoken to llama-python-extra
brittlewis12 commented Jan 21, 2024

Great work! 🙌 It worked great for me; I was able to generate a full suite of k-quants plus 8_0 and fp16, on Hugging Face!

fp16 conversion output
Loading model: stablelm-2-zephyr-1_6b
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
gguf: Adding 100000 merge(s).
gguf: Setting special token type bos to 100257
gguf: Setting special token type eos to 100257
gguf: Setting special token type unk to 100257
gguf: Setting chat_template to {% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'system' %}
{{ '<|system|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>
'  + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}
Exporting model to 'stablelm-2-zephyr-1_6b/stablelm-2-zephyr-1_6b.fp16.gguf'
gguf: loading model part 'model.safetensors'
output.weight, n_dims = 2, torch.float16 --> float16
token_embd.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.1.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.10.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.11.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.12.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.13.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.14.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.15.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.16.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.17.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.18.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.19.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.2.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.20.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.21.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.22.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.23.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.3.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.4.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.5.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.6.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.7.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.8.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.9.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_v.weight, n_dims = 2, torch.float16 --> float16
output_norm.bias, n_dims = 1, torch.float16 --> float32
output_norm.weight, n_dims = 1, torch.float16 --> float32
Model successfully exported to 'stablelm-2-zephyr-1_6b/stablelm-2-zephyr-1_6b.fp16.gguf'

Ran the conversions on Colab.


Separately, does it make sense to add tiktoken to requirements/requirements-convert.txt in this case?

I’m not sure what the typical approach is for model-specific dependencies like this, but if this is a new requirement for model conversion, perhaps it should be declared there. Or maybe in a new file, like persimmon?

thanks again!

cebtenzzre (Collaborator)

Separately, does it make sense to add tiktoken to requirements/requirements-convert.txt in this case?

I’m not sure what the typical approach is for model-specific dependencies like this, but if this is a new requirement for model conversion, perhaps it should be declared there. Or maybe in a new file, like persimmon?

I think dependencies should only be added to requirements.txt if they are unconditionally required - conditional requirements should simply throw a clear exception if they are needed but not found. And persimmon is only a separate file because it wasn't working when the convert scripts were merged; new code should go in convert-hf-to-gguf.py.
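For example, a conditional dependency can be guarded roughly like this (a generic sketch, not code from the repo):

```python
def require_tiktoken() -> None:
    try:
        import tiktoken  # noqa: F401  # only needed for tiktoken-based tokenizers
    except ImportError as e:
        raise ImportError(
            "Converting this model requires the tiktoken package. Run `pip install tiktoken`."
        ) from e
```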

convert : use presence of tokenizer.json to determine StableLM tokenizer loader

It's a less arbitrary heuristic than the vocab size.
compilade (Collaborator, Author)

I think dependencies should only be added to requirements.txt if they are unconditionally required - conditional requirements should simply throw a clear exception if they are needed but not found.

Agreed, and it already throws a helpful exception when tiktoken is not installed (thanks to transformers, which checks the imports):
ImportError: This modeling file requires the following packages that were not found in your environment: tiktoken. Run `pip install tiktoken`

And to be clear, I added the tiktoken package to the *-extra devShells because with Nix, the only way to add a package to a Python environment is to rebuild that environment with the new package, unlike with venv, where a simple pip install tiktoken is possible when an error is encountered.

Running nix shell nixpkgs-unstable#python3Packages.tiktoken does not make it available to Python; a Python package has to be included when the Python environment is built (with python3.withPackages as in llama-python-extra).

I assume the *-extra devShells (which include the llama-python-extra Python environment) are for Nix users who want all possibly required dependencies for the convert scripts, or else they would be using the leaner llama-python environment. At least, that's how I use it.

ggerganov merged commit d6bd4d4 into ggerganov:master on Jan 22, 2024
40 of 44 checks passed
Green-Sky (Collaborator) commented Jan 22, 2024

I assume the *-extra devShells (which include the llama-python-extra Python environment) are for Nix users who want all possibly required dependencies for the convert scripts, or else they would be using the leaner llama-python environment. At least, that's how I use it.

Yes, that is why it exists. Originally, transformers/torch also pulled in CUDA etc., so it was very heavy.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* llama : support StableLM 2 1.6B

* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]

* convert : refactor Qwen's set_vocab to use it for StableLM 2 too

* nix : add tiktoken to llama-python-extra

* convert : use presence of tokenizer.json to determine StableLM tokenizer loader

It's a less arbitrary heuristic than the vocab size.
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024