New Salamandra models have been released by the Barcelona Supercomputing Center, using the Llama 3 architecture and trained from scratch on open datasets. The models and training details are available at https://github.com/langtech-bsc/salamandra/
These models work out of the box with Transformers > 4.40.2, but fail to convert with the llama.cpp convert scripts, as reported by @robbiemu in #9813.
Using the sample code provided in the repo, we can see that the model is loaded through the LlamaForCausalLM module.
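A minimal sketch of that check, assuming the standard Transformers loading pattern (the `BSC-LT/salamandra-7b` model ID is an assumption, taken from the BSC Hugging Face organization):

```python
from transformers import AutoModelForCausalLM

# Model ID is an assumption; adjust to the checkpoint being tested.
model = AutoModelForCausalLM.from_pretrained("BSC-LT/salamandra-7b")

# AutoModelForCausalLM resolves the concrete class from the "architectures"
# field in config.json, so this shows which module actually backs the model.
print(type(model))
```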
Running it returns:
```
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
```
Checking for similar examples in the current llama.cpp codebase, I've found that Granite models are also Llama derivatives, but they created their own MODEL_ARCH to accommodate their changes.
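For context, this is roughly what registering a new architecture looks like on the Python side, following the Granite pattern in `gguf-py/gguf/constants.py`. The `SALAMANDRA` entries below are hypothetical, and the real file lists many more architectures and tensor mappings:

```python
from enum import IntEnum, auto

# Sketch of the gguf-py/gguf/constants.py pattern, heavily abbreviated.
class MODEL_ARCH(IntEnum):
    LLAMA = auto()
    GRANITE = auto()
    SALAMANDRA = auto()  # hypothetical new entry

MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    MODEL_ARCH.LLAMA: "llama",
    MODEL_ARCH.GRANITE: "granite",
    MODEL_ARCH.SALAMANDRA: "salamandra",  # hypothetical
}

# A real change also needs a MODEL_TENSORS entry for the new arch, plus a
# matching enum and tensor mapping on the C++ side of llama.cpp.
```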
The official Salamandra paper is yet to be released, but the configuration and architecture files are already available at the GitHub repo for both the 2B and 7B models.
Salamandra 7B presents the following hyperparameters:
According to the Llama 3 paper, the architecture for 8B is:
The Llama 3 tokenizer is based on tiktoken, with 100,000 base tokens plus 28,000 additional multilingual tokens. Salamandra uses a BPE SentencePiece-based tokenizer with a vocabulary of 256,000 tokens.
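The tokenizer difference is easy to confirm from the released files; a quick check, reusing the assumed model ID from above:

```python
from transformers import AutoTokenizer

# Model ID is an assumption, as before.
tok = AutoTokenizer.from_pretrained("BSC-LT/salamandra-7b")

# Expecting a SentencePiece-style BPE tokenizer with a 256,000-token
# vocabulary, versus Llama 3's tiktoken-style byte-level BPE.
print(type(tok).__name__, tok.vocab_size)
```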
Salamandra 2B presents the following hyperparameters:
Llama 3.2 1B presents the following hyperparameters (based on https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/standalone-llama32.ipynb):
And Llama 3.2 3B presents the following hyperparameters:
The 7B and 8B models look similar, but with the new Llama 3.2 models and the upcoming Salamandra 40B model, these differences might grow.
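Since the hyperparameters above come straight from each model's config.json, the side-by-side comparison can be reproduced with AutoConfig. The model IDs below are assumptions, and the Meta checkpoints are gated on Hugging Face:

```python
from transformers import AutoConfig

# Fields that matter most for llama.cpp conversion; list is not exhaustive.
FIELDS = [
    "hidden_size", "intermediate_size", "num_hidden_layers",
    "num_attention_heads", "num_key_value_heads", "vocab_size",
    "rope_theta", "rms_norm_eps",
]

# Model IDs are assumptions; meta-llama repos require accepting a license.
for model_id in ("BSC-LT/salamandra-7b", "meta-llama/Llama-3.1-8B"):
    cfg = AutoConfig.from_pretrained(model_id)
    print(model_id)
    for field in FIELDS:
        print(f"  {field}: {getattr(cfg, field, 'n/a')}")
```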
Main questions
Right now we're unable to convert the model to GGUF, and I assume this is because the hyperparameters don't match what the existing Llama conversion expects. Would it make sense to add a MODEL_ARCH for Salamandra?
I am opening this discussion after developing most of the code needed to add Salamandra support to llama.cpp. But then I realized this change would also imply creating a SalamandraForCausalLM in the Transformers library. What are your thoughts on this?