Try to fix Baichuan2 models by using vocab size in config.json #3299

KerfuffleV2 · 2023-09-21T15:54:22Z

Use local GGUF package when possible in Baichuan converter

This basically just uses the same approach as #2914

I tested converting https://huggingface.co/baichuan-inc/Baichuan2-7B-Base - seems to work fine now. I don't know if this breaks Baichuan1. One would assume one can't go wrong using the vocab size in the config, but who knows?

While I was in the neighborhood I updated the gguf import to look for the local version like the other scripts.

Hopefully fixes #3270
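For readers following along, the idea of trusting the vocab size in config.json can be sketched roughly as below. This is only an illustration, not the code in the Baichuan converter: the helper names `load_vocab_size` and `pad_tokens` and the `<pad_N>` placeholder names are hypothetical, and padding the token list is an assumption about how a mismatch might be handled.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_vocab_size(model_dir: Path, tokenizer_size: int) -> int:
    """Return vocab_size from config.json, falling back to the tokenizer's size."""
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    return int(config.get("vocab_size", tokenizer_size))


def pad_tokens(tokens: List[str], vocab_size: int) -> List[str]:
    """Pad the token list with placeholder entries until it has vocab_size items."""
    if len(tokens) > vocab_size:
        raise ValueError(
            f"tokenizer has {len(tokens)} tokens but config.json says {vocab_size}"
        )
    missing = vocab_size - len(tokens)
    # Hypothetical placeholder names; a real converter may choose differently.
    return tokens + [f"<pad_{i}>" for i in range(missing)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        (model_dir / "config.json").write_text(
            json.dumps({"vocab_size": 6}), encoding="utf-8"
        )
        tokens = ["<unk>", "<s>", "</s>", "hello"]
        size = load_vocab_size(model_dir, len(tokens))
        print(size, pad_tokens(tokens, size))
```

The point of the fallback is that a model whose config.json lacks the key still converts with the tokenizer's own count.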

akawrykow · 2023-09-21T18:55:17Z

Seems reasonable to me - same thing we did in #2914

KerfuffleV2 · 2023-09-27T09:20:56Z

This pull does fix the vocab issue, but unfortunately it's not enough to get reasonable results from the 13B model. Also convert.py works for converting it, except for setting the architecture and looking for the correct context length key. So it may make more sense to update convert.py rather than fixing the Baichuan-specific conversion script (which could just be removed).

Anyway, this pull is better than the status quo but may not be the best approach to solving the issue. I still don't know what the issue with Baichuan2 13B is; I suspect it may be something like variations in the ALiBi operation it wants.
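As background on the ALiBi operation mentioned here, the following is a minimal sketch of the bias as described in the ALiBi paper, not the ggml implementation and not necessarily what Baichuan2 expects. The function names are made up for illustration, and the sketch only handles head counts that are a power of two; other head counts need an extra interpolation step that is omitted.

```python
import math


def alibi_slopes(n_head: int) -> list:
    """Per-head slopes from the ALiBi paper; n_head must be a power of two here."""
    if n_head <= 0 or n_head & (n_head - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / n_head) for h in range(n_head)]


def alibi_bias(slope: float, query_pos: int, n_keys: int) -> list:
    """Bias added to one query's attention scores: -slope * distance to each key."""
    return [-slope * (query_pos - j) for j in range(n_keys)]


def softmax(xs: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    slope = alibi_slopes(8)[0]
    paper_form = alibi_bias(slope, query_pos=3, n_keys=4)
    # Adding slope * key_index instead differs only by a per-query constant,
    # so the softmax over the keys is unchanged.
    shifted_form = [slope * j for j in range(4)]
    print(softmax(paper_form))
    print(softmax(shifted_form))
```

Because softmax ignores a constant added to every score in a row, implementations can differ in how they write the bias and still agree, which is one reason "variations" in the operation can be hard to spot from the formula alone.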

ggerganov · 2023-09-30T20:28:41Z

Is Baichuan2 13B different than the Baichuan 13B that we added support for some time ago?
Also, have you tried running the Baichuan 13B that was initially supported, after merging #3228?
At first, I thought that #3228 would break support, but now I think it should actually still work correctly, and I'm looking to verify that.

KerfuffleV2 · 2023-10-01T13:26:44Z

> Is Baichuan2 13B different than the Baichuan 13B that we added support for some time ago?

Well, there's this: https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#migrating-inference-optimizations-from-baichuan-1-to-baichuan-2

So one difference is lm_head isn't already normalized. Doing that didn't seem to make a difference for the issues I mentioned.
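A rough sketch of what "normalizing lm_head" could look like at conversion time, assuming the normalization in question is a row-wise L2 normalization of the lm_head weight matrix, as the linked README section appears to describe. The function name `normalize_rows` is hypothetical and plain Python lists stand in for the real tensors.

```python
import math


def normalize_rows(weight: list, eps: float = 1e-12) -> list:
    """Scale each row of a 2D weight matrix to unit L2 norm."""
    normalized = []
    for row in weight:
        norm = math.sqrt(sum(x * x for x in row))
        # eps guards against division by zero for an all-zero row.
        scale = 1.0 / max(norm, eps)
        normalized.append([x * scale for x in row])
    return normalized


if __name__ == "__main__":
    lm_head = [[3.0, 4.0], [0.0, 2.0]]
    for row in normalize_rows(lm_head):
        print(row, math.sqrt(sum(x * x for x in row)))
```

Doing this once during conversion would mean the inference code does not need a special case for the output layer, at the cost of baking the change into the converted file.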

> Also, have you tried running the Baichuan 13B that was initially supported, after merging #3228?

You mean Baichuan1 13B? I haven't tried any Baichuan1 models so far. I'll try to check that later today.

ggerganov · 2023-10-01T14:04:19Z

> You mean Baichuan1 13B?

Yes, support for Baichuan 13B was added in #3009 and allegedly it was working, though I haven't tried it.

KerfuffleV2 · 2023-10-02T06:43:30Z

@ggerganov Sorry it's a bit late, but I got a chance to test Baichuan1 13B. Unfortunately, it seems like neither Baichuan1 13B nor Baichuan2 13B works at all currently. It just immediately hits an assert and dies:

llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size = 262.38 MB
llama_new_context_with_model: VRAM scratch buffer: 256.50 MB
llama_new_context_with_model: total VRAM used: 6006.66 MB (model: 5750.16 MB, context: 256.50 MB)
GGML_ASSERT: ggml.c:12913: ne1 + n_past == ne0

Seems to happen when prompt ingestion starts. Exactly the same error for both Baichuan1 and Baichuan2. Probably an issue with how the Baichuan graph is set up in llama.cpp?

I didn't test the 7B models again, since I'd have to redownload and convert them. It's likely this particular issue would affect them too, though. Previously the 7B Baichuan2 seemed to work perfectly.

(Note: I converted the Baichuan1 model using the conversion script in master. I don't think this is a conversion issue, though.)

ggerganov · 2023-10-03T17:04:46Z

@KerfuffleV2 Try to just delete the assert on ggml.c:12913 and see if it works. It was deleted in #3329 as well and the alibi seems to be working

KerfuffleV2 · 2023-10-03T18:30:19Z

> Try to just delete the assert on ggml.c:12913 and see if it works.

It seems to run with that change. The output (like Baichuan2 13B) is very repetitive though:

$ ./main -m /blah/baichuan1-13b.gguf -p 'Once upon a time there was a little fox' -ngl 18 --ignore-eos --temp 0.0
[...]
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 Once upon a time there was a little fox.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt think that he could be as good at being clever or wise like themselves but when it came to running fast across fields in order to catch rabbits then there was no doubt.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt think that he could be as good at being clever or wise like themselves but when it came to running fast across fields in order to catch rabbits then there was no doubt.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt think that he could be as good at being clever or wise like themselves but when it came to running fast across fields in order to catch rabbits then there was no doubt.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt

Or with Chinese:

$ ./main -m /blah/baichuan1-13b.gguf -p '从前有一只小狐狸，他' -ngl 18 --ignore-eos --temp 0.0
[...]
 从前有一只小狐狸，他长大了,长高了。我长高了,变大了。我的身体变得更大更重。我的体重和身高都在增加。我的体重在增加,我的身高也在增长。我的体重在增加,我的身高也在增长。我的体重在增加,我的身高也在增长。我的体重在增加,我的身高也在增长。

It's basically just repeating "My weight increases.", "My size increases", "My weight is increasing", "My size is increasing". Baichuan2 13B is worse: with the same prompt it just outputs "Once upon a time there was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who" or "从前有一只小狐狸，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他，他". (I don't think this is worse than before, I just mean it seems worse than Baichuan1 13B.)

If Baichuan1 13B behaved the same before the recent changes and people were actually using it to good effect... Well all I can say is they appear to know something I don't!

ggerganov · 2023-10-03T18:34:24Z

I think the repetition is normal for --temp 0.0 and such a short prompt. The best thing would be to just run a perplexity test on wiki text and make sure it gives some reasonable number, i.e. less than 10 for example.
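For context on what the perplexity number measures: it is the exponential of the average negative log-likelihood the model assigns to the actual next tokens. The sketch below uses the standard definition with made-up log-probabilities; the function names are hypothetical and this is not how the llama.cpp perplexity tool is implemented.

```python
import math


def perplexity(token_logprobs: list) -> float:
    """Perplexity = exp(-mean(log p(token))) using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def running_perplexity(chunks: list) -> list:
    """Cumulative perplexity after each chunk of token log-probabilities."""
    total = 0.0
    count = 0
    estimates = []
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
        estimates.append(math.exp(-total / count))
    return estimates


if __name__ == "__main__":
    # A model that gives every correct token probability 0.1 scores 10.
    print(perplexity([math.log(0.1)] * 5))
    print(running_perplexity([[math.log(0.5)], [math.log(0.25)]]))
```

A lower value means the model finds the text less surprising, so a value that stays in a plausible range is a sanity check that the converted weights are not badly broken.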

KerfuffleV2 · 2023-10-03T19:39:26Z

I guess it's fine.

Baichuan1 13B Q4_K_M:

[1]7.8807,[2]10.3365,[3]11.3285,[4]12.7045,[5]13.3817,[6]13.1125,[7]13.6777,[8]13.8214,[9]14.4119,[10]14.7038,[11]15.0939,[12]15.0929,[13]14.8181,[14]14.8787,[15]15.5579

Baichuan2 13B Q6_K:

[1]6.9463,[2]8.8442,[3]11.5516,[4]12.2726,[5]10.9208,[6]11.5958,[7]11.9222,[8]11.2917,[9]11.9582,[10]12.1829,[11]11.9845,[12]11.9964,[13]11.7915

I've never seen models that weren't broken just repeat the same word over and over but I also haven't messed with small models in a while.

Hmm, I'm not sure if the Q6_K model had the lm_head normalize thing applied to it though. I will have to mess around and reconvert it, but unfortunately I probably won't get a chance to do that today.

ggerganov

Ok, thanks for helping out with this. Think we can merge this

KerfuffleV2 · 2023-10-04T00:30:45Z

> Ok, thanks for helping out with this.

No problem.

I tested without the normalizing stuff. Seems to work fine.

Should

diff --git a/ggml.c b/ggml.c
index bf1426d..b24f7c3 100644
--- a/ggml.c
+++ b/ggml.c
@@ -12905,7 +12905,6 @@ static void ggml_compute_forward_alibi_f32(
     //const int nb3 = src0->nb[3];
 
     GGML_ASSERT(nb0 == sizeof(float));
-    GGML_ASSERT(ne1 + n_past == ne0);
     GGML_ASSERT(n_head == ne2);
 
     // add alibi to src0 (KQ_scaled)

also be included here since it's necessary to actually use the model after conversion? (If not, we can go ahead and merge since I don't have any other changes planned.)

ggerganov · 2023-10-04T14:20:40Z

I have added this change through the #3329 PR

Try to fix Baichuan2 models by using vocab size in config.json

c746914

Use local GGUF package when possible in Baichuan converter

KerfuffleV2 mentioned this pull request Sep 21, 2023

when will baichuan2 be supported? #3270

Closed

ggerganov approved these changes Oct 3, 2023

ggerganov merged commit 019ba1d into ggerganov:master Oct 4, 2023
9 checks passed

KerfuffleV2 deleted the fix-baichuan2 branch November 17, 2023 03:13
