
Added the fact that llama.cpp supports Mistral AI release 0.1 #3362

Merged

Conversation

paschembri
Contributor

The Mistral AI v0.1 model works out of the box once converted with the convert.py script.

@slaren
Collaborator

slaren commented Sep 27, 2023

Are there any details available about this model? All I could find about this release is a link to a torrent.

@paschembri
Contributor Author

Inspecting the tokenizer model, there is evidence indicating a training dataset of 8T tokens (/mnt/test/datasets/tokenizer_training/8T_train_data/shuffled.txt).

The convert script worked and I am currently evaluating the model...

@slaren
Collaborator

slaren commented Sep 27, 2023

F16 ppl looks good for a 7B model.
Final estimate: PPL = 5.6918 +/- 0.03191
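
For reference, the perplexity reported here is the usual per-token quantity; as a LaTeX sketch, with N the number of evaluated tokens:

\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)

so a value of 5.69 roughly means the model is, on average, as uncertain as a uniform choice over about 5.7 tokens.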

Some generation [I believe the meaning of life is] simple, just like that “Life” episode of “The Twilight Zone”, with William Shatner. But sometimes it’s also easy to forget, and in those times a reminder from something or someone else can be very welcome.

Sometimes those reminders are small things that don’t seem important at all, but later on become more meaningful than one would have thought. Other times they are big, life-altering events. But as long as you get to see the beauty in them when they happen, they will help you live your best possible life.

Here is a list of reminders that I’ve had. Some may seem silly or irrelevant, but they’re all important and meaningful for me. These are my 50 things that make life worth living:

  1. To be able to feel the love of your family and your friends
  2. To love someone so much that their happiness means more than your own
  3. The simple joy of being happy
  4. Being able to live without fear in your mind, heart or soul
  5. A beautiful song that makes you want to cry because it is so gorgeous and inspiring
  6. A really good book that gets inside your head and refuses to let go
  7. Dancing on a rainy night
  8. To be able to say “I’m sorry” when you’ve done something wrong, even if no one else will know
  9. Watching the sunset with someone who understands why sunsets are so important
  10. A beautiful sunrise that reminds you of how much better your life could get right now
  11. The feeling of finally falling asleep after a bad day or week
  12. Falling in love for the first time, and realizing that this is what they really meant when they said “love at first sight”
  13. Hugging someone who needs it most
  14. Holding your child’s hand as they walk across the street for the first time without fear or apprehension
  15. Feeling like you’re part of something bigger than yourself
  16. Knowing that no matter what happens in your life, there will always be people who care about you and support you
  17. Watching someone else be happy when it seems impossible for them to find happiness anywhere else
  18. A song or poem that gives hope where there was none before
  19. Knowing that even though life isn’t fair sometimes, at least some part of it is working out okay for me right now (at least most of the time)
  20. Making someone laugh when they need it most—whether because you made them laugh or just by being there with them through their tears and pain
  21. Realizing that even though things might not go according to plan sometimes, life still has its moments of beauty and joy
  22. Feeling like anything is possible if only we believe hard enough in ourselves and what we can do together with others around us—no matter how small our dreams may seem at first glance [end of text]

slaren previously approved these changes Sep 27, 2023
@slaren
Collaborator

slaren commented Sep 27, 2023

param.json:

{
    "dim": 4096,
    "n_layers": 32,
    "head_dim": 128,
    "hidden_dim": 14336,
    "n_heads": 32,
    "n_kv_heads": 8,
    "norm_eps": 1e-05,
    "sliding_window": 4096,
    "vocab_size": 32000
}

Looks like it uses sliding_window as the context length. convert.py may need to be updated. This may also be the first 7B model to use GQA.
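
As a rough sketch of how those fields line up with the usual llama-style hyperparameters (hypothetical Python with illustrative names, not convert.py itself):

import json

p = json.load(open("param.json"))   # the params file from the torrent

n_embd    = p["dim"]              # 4096
n_layer   = p["n_layers"]         # 32
n_head    = p["n_heads"]          # 32
n_head_kv = p["n_kv_heads"]       # 8  -> GQA: 32 / 8 = 4 query heads share each KV head
n_ff      = p["hidden_dim"]       # 14336
n_ctx     = p["sliding_window"]   # 4096, treated as the context length for now
assert p["head_dim"] * n_head == n_embd   # 128 * 32 == 4096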

@jxy
Contributor

jxy commented Sep 27, 2023

Does sliding window attention actually work here, or does it really only work with a 4096 context length in llama.cpp? What happens if we set the context length to 8192?
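
For background, sliding window attention as described in the Mistral release restricts each token to attending over only the previous sliding_window tokens instead of the full context. A minimal, hypothetical mask sketch in Python (not how llama.cpp builds its mask):

import numpy as np

def sliding_window_mask(n_tokens: int, window: int = 4096) -> np.ndarray:
    """True where attention is allowed: causal, and at most `window` positions back."""
    i = np.arange(n_tokens)[:, None]   # query positions
    j = np.arange(n_tokens)[None, :]   # key positions
    return (j <= i) & ((i - j) < window)

With a plain causal mask and n_ctx = 8192, tokens more than 4096 positions apart would still attend to each other, which the model never saw during training.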

@paschembri
Contributor Author

I did my testing before they released the model card on HF.

I'll try that.

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

Currently convert.py is failing for me on the vocab: it doesn't like that tokens 0, 1 and 2 are being added again in added_tokens.json. I haven't got as far as actually reading the model files yet.

If anyone has converted this successfully, how did you make the fp16?

Oh never mind, I just deleted the added_tokens.json duh :)
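
For anyone hitting the same error, a quick hypothetical check (Python, file names as in the HF repo) of whether added_tokens.json contains anything that is not already in the base SentencePiece vocab:

import json
from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor(model_file="tokenizer.model")
added = json.load(open("added_tokens.json"))      # e.g. {"<unk>": 0, "<s>": 1, "</s>": 2}

truly_new = {tok: idx for tok, idx in added.items() if idx >= sp.get_piece_size()}
print("genuinely new tokens:", truly_new)         # empty here, so the file can simply be dropped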

@paschembri
Contributor Author

Setting the context size to 8k actually works.

I got the model (a q6_K version) to perform a summary and the results are promising

@slaren
Collaborator

slaren commented Sep 27, 2023

@TheBloke I just converted from the pth file in the torrent. There is no added_tokens.json there.

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

Ah OK, fair enough. I've been using the official release from https://huggingface.co/mistralai/Mistral-7B-v0.1, which is in HF format. They added an added_tokens.json, but I don't think they quite understand what it's for: it lists the special tokens, which are already in tokenizer.json and tokenizer.model.

Anyway my quants are up here and seem to work fine: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF

@TheBloke
Contributor

Actually no, my quants don't work fine! I needed that permute fix. Re-making them now.
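
For context, the permute in question is the step in convert.py that rearranges the Q/K projection weights between the two rotary-embedding layouts (interleaved vs. split halves). My understanding of the fix, as a hypothetical sketch rather than the exact convert.py code, is that with GQA the K tensor only has n_kv_heads heads, so the permute must use n_kv_heads (8) instead of n_heads (32):

import numpy as np

def permute(w: np.ndarray, n_head: int, n_head_kv: int) -> np.ndarray:
    # With GQA the K projection only has n_head_kv heads, so permute with that count.
    if n_head_kv is not None and n_head != n_head_kv:
        n_head = n_head_kv
    # Regroup each head's rows from (first half, second half) into interleaved pairs.
    return (w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))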

@paschembri
Contributor Author


Anyway my quants are up here and seem to work fine: https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF

You have to tell us how you can upload this fast to HF. For me it took forever!

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

OK all my quants are remade and re-uploaded and are working fine now.

system_info: n_threads = 15 / 30 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 The quick brown fox jumped over the lazy dog.

If you’re a writer who’s been looking for a place to publish, you may have seen this sentence somewhere in the fine print of an online magazine’s submission guidelines. It may also be found on writing sites as an example of how to use the various characters (letters and punctuation) available on your keyboard.

This classic example is sometimes called the “typewriter test.”  But nowadays, it’s a bit of a misnomer. The sentence looks like gibberish even if you copy-and-paste it into an email and send it to yourself.

The problem lies with the letter “J,” which is often mistakenly identified as a lowercase “L” by software programs, including Microsoft Word (which tends to have issues with all of the letters that look like each other).  The issue is not limited to just lowercase J and L; capital I and lower case l are also prone to being confused.

There’s an easy fix, though: simply replace the uppercase J in “dog” with a lowercase j (or vice versa) and you can test that your email program is picking up all 26 letters.

Here’s another example of what we’re talking about: [end of text]

You have to tell us how you can upload this fast to HF. For me it took forever!

10Gbit internet! :) I don't always have it sadly, but when only making GGUFs for a repo I use a Lambda Labs instance with beautiful 10GBit network - my record speed transferring to HF is 950MB/s 🤣

@slaren slaren dismissed their stale review September 27, 2023 16:49

Considering that sliding window attention is not implemented, this shouldn't be added yet.

@netrunnereve
Contributor

Are there any details available about this model? All I could find about this release is a link to a torrent.

They just produced a press release. It's a 7B model that apparently performs like LLaMA 2 13B and is under an Apache 2 license.

@paschembri
Contributor Author

paschembri commented Sep 27, 2023


They released the instruct model. I tried quantizing it, but all I got was gibberish... I'll try again (with the fix you mentioned).

EDIT: that was it (the fix).

@TheBloke
Contributor

TheBloke commented Sep 27, 2023

Yeah Instruct is working well for me. Q5_K_M:

system_info: n_threads = 15 / 30 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 <s>[INST]Write a story about llamas [/INST] Once upon a time, high in the Andes Mountains of Peru, there lived a herd of llamas. They roamed freely on the vast green meadows, grazing on the lush grasses that grew there. The llamas were a happy and contented herd, enjoying their simple life in the mountains.

Despite their peaceful nature, however, the llamas were not without their challenges. For one thing, they had to contend with the many predators that lived in the Andes, including mountain lions, coyotes, and eagles. The llamas had to be always alert, ready to defend themselves and their young from harm.

In addition to predators, the llamas also had to deal with harsh weather conditions. The Andes Mountains can be cold and windy, especially at high altitudes. During the winter months, the llamas would huddle together for warmth, seeking shelter in the rocky crevices that offered protection from the elements.

Despite these challenges, the llama herd thrived. They were well adapted to life in the mountains, with strong legs and thick fleece that kept them warm in the cold. And they had each other for company, forming close bonds with their fellow llamas that helped them through the tough times.

As the years passed, the llama herd continued to grow and prosper. They were a proud and majestic sight to behold, roaming freely across the green meadows of the Andes Mountains. And so they lived, happy and contented, enjoying their simple life in the mountains. [end of text]
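
For anyone trying the Instruct model, here is a minimal sketch of the prompt layout used above, assuming the chat template documented for Mistral-7B-Instruct-v0.1 (the <s> BOS token is normally added by the tokenizer/loader rather than typed into the prompt):

def mistral_instruct_prompt(user_message: str) -> str:
    # Single-turn Instruct prompt: the user message wrapped in [INST] ... [/INST].
    return f"[INST] {user_message} [/INST]"

print(mistral_instruct_prompt("Write a story about llamas"))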

@Dampfinchen

Does GQA work with it?

Owner

@ggerganov ggerganov left a comment


Sliding window will be tracked here: #3377

@ggerganov ggerganov merged commit 4aea3b8 into ggerganov:master Sep 28, 2023
10 checks passed
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 2, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  ggml-cuda : perform cublas mat mul of quantized types as f16 (ggerganov#3412)
  llama.cpp : add documentation about rope_freq_base and scale values (ggerganov#3401)
  train : fix KQ_pos allocation (ggerganov#3392)
  llama : quantize up to 31% faster on Linux and Windows with mmap (ggerganov#3206)
  readme : update hot topics + model links (ggerganov#3399)
  readme : add link to grammars app (ggerganov#3388)
  swift : fix build on xcode 15 (ggerganov#3387)
  build : enable more non-default compiler warnings (ggerganov#3200)
  ggml_tensor: update the structure comments. (ggerganov#3283)
  ggml : release the requested thread pool resource (ggerganov#3292)
  llama.cpp : split llama_context_params into model and context params (ggerganov#3301)
  ci : multithreaded builds (ggerganov#3311)
  train : finetune LORA (ggerganov#2632)
  gguf : basic type checking in gguf_get_* (ggerganov#3346)
  gguf : make token scores and types optional (ggerganov#3347)
  ci : disable freeBSD builds due to lack of VMs (ggerganov#3381)
  llama : custom attention mask + parallel decoding + no context swaps (ggerganov#3228)
  docs : mark code as Bash (ggerganov#3375)
  readme : add Mistral AI release 0.1 (ggerganov#3362)
  ggml-cuda : perform cublas fp16 matrix multiplication as fp16 (ggerganov#3370)
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023