New Salamandra models have been released by the Barcelona Supercomputing Center, using the Llama 3 architecture and trained from scratch on open datasets. The models and training details are available at https://github.com/langtech-bsc/salamandra/
These models work out of the box with Transformers > 4.40.2, but fail to convert with the llama.cpp convert scripts, as reported by @robbiemu in #9813.
Using the sample code provided in the repo, we can see that the model is loaded through the LlamaForCausalLM module.
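A minimal sketch of that check, assuming the standard Transformers loading pattern (the `BSC-LT/salamandra-7b` model ID is an assumption, taken from the BSC Hugging Face organization):

```python
from transformers import AutoModelForCausalLM

# Model ID is an assumption; adjust to the checkpoint being tested.
model = AutoModelForCausalLM.from_pretrained("BSC-LT/salamandra-7b")

# AutoModelForCausalLM resolves the concrete class from the "architectures"
# field in config.json, so this shows which module actually backs the model.
print(type(model))
```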
Running it returns:
```
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
```
Checking for similar examples in the current llama.cpp codebase, I've found that Granite models are also Llama derivatives, but they created their own MODEL_ARCH to accommodate their changes.
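For context, this is roughly what registering a new architecture looks like on the Python side, following the Granite pattern in `gguf-py/gguf/constants.py`. The `SALAMANDRA` entries below are hypothetical, and the real file lists many more architectures and tensor mappings:

```python
from enum import IntEnum, auto

# Sketch of the gguf-py/gguf/constants.py pattern, heavily abbreviated.
class MODEL_ARCH(IntEnum):
    LLAMA = auto()
    GRANITE = auto()
    SALAMANDRA = auto()  # hypothetical new entry

MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    MODEL_ARCH.LLAMA: "llama",
    MODEL_ARCH.GRANITE: "granite",
    MODEL_ARCH.SALAMANDRA: "salamandra",  # hypothetical
}

# A real change also needs a MODEL_TENSORS entry for the new arch, plus a
# matching enum and tensor mapping on the C++ side of llama.cpp.
```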
The official Salamandra paper is yet to be released, but the configuration and architecture files are already available at the GitHub repo for both the 2B and 7B models.
Salamandra 7B presents the following hyperparameters:
According to the Llama 3 paper, the architecture for 8B is:
The Llama 3 tokenizer is based on tiktoken, with 100,000 base tokens plus 28,000 additional multilingual tokens. Salamandra uses a BPE SentencePiece-based tokenizer with a vocabulary of 256,000 tokens.
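The tokenizer difference is easy to confirm from the released files; a quick check, reusing the assumed model ID from above:

```python
from transformers import AutoTokenizer

# Model ID is an assumption, as before.
tok = AutoTokenizer.from_pretrained("BSC-LT/salamandra-7b")

# Expecting a SentencePiece-style BPE tokenizer with a 256,000-token
# vocabulary, versus Llama 3's tiktoken-style byte-level BPE.
print(type(tok).__name__, tok.vocab_size)
```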
Salamandra 2B presents the following hyperparameters:
Llama 3.2 1B presents the following hyperparameters (based on https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/standalone-llama32.ipynb):
And Llama 3.2 3B presents the following hyperparameters:
The 7B and 8B models look similar, but with the new Llama 3.2 models and the upcoming Salamandra 40B model, these differences might grow.
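Since the hyperparameters above come straight from each model's config.json, the side-by-side comparison can be reproduced with AutoConfig. The model IDs below are assumptions, and the Meta checkpoints are gated on Hugging Face:

```python
from transformers import AutoConfig

# Fields that matter most for llama.cpp conversion; list is not exhaustive.
FIELDS = [
    "hidden_size", "intermediate_size", "num_hidden_layers",
    "num_attention_heads", "num_key_value_heads", "vocab_size",
    "rope_theta", "rms_norm_eps",
]

# Model IDs are assumptions; meta-llama repos require accepting a license.
for model_id in ("BSC-LT/salamandra-7b", "meta-llama/Llama-3.1-8B"):
    cfg = AutoConfig.from_pretrained(model_id)
    print(model_id)
    for field in FIELDS:
        print(f"  {field}: {getattr(cfg, field, 'n/a')}")
```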
Main questions
Right now we're unable to convert the model to GGUF, and I assume this is because the hyperparameters don't match what the existing Llama conversion expects. Would it make sense to add a MODEL_ARCH for Salamandra?
I am opening this discussion after developing most of the code needed to add Salamandra support to llama.cpp. But then I realized this change would also imply creating a SalamandraForCausalLM in the Transformers library. What are your thoughts on this?