Feature: support baichuan serial models, by now, including Baichuan-7B, Baichuan-13B; in the future, we will support more Baichuan models #3009
Conversation
Cool! Let's get some feedback; if everything runs smoothly, we can merge.
OK, looking forward to the feedback.
I've tried this PR and encountered a model conversion issue with Baichuan-13B-Chat:

# Error message
Can not map tensor 'model.layers.15.self_attn.W_pack.weight'

After a little investigation, it seems this piece of code breaks the model modification loop at layer 15, an early stop even though the remaining layers contain similar tensors that should also be modified:

# Original code
for i in itertools.count():
    if f"model.layers.{i}.self_attn.W_pack.weight" in model_part:
        print(f"Unpacking and permuting layer {i}")
        tmp[f"model.layers.{i}.self_attn.q_proj.weight"] = reverse_hf_permute_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"], 0, head_count, head_count)
        tmp[f"model.layers.{i}.self_attn.k_proj.weight"] = reverse_hf_permute_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"], 1, head_count, head_count_kv)
        tmp[f"model.layers.{i}.self_attn.v_proj.weight"] = reverse_hf_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"], 2)
        del tmp[f"model.layers.{i}.self_attn.W_pack.weight"]
    else:
        break  # <- Breaks on layer 15

A possible fix is to iterate over all block_count layers instead of stopping at the first one that is missing from this model part:

# A possible fix
for i in range(block_count):
    if f"model.layers.{i}.self_attn.W_pack.weight" in model_part:
        print(f"Unpacking and permuting layer {i}")
        tmp[f"model.layers.{i}.self_attn.q_proj.weight"] = reverse_hf_permute_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"], 0, head_count, head_count)
        tmp[f"model.layers.{i}.self_attn.k_proj.weight"] = reverse_hf_permute_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"], 1, head_count, head_count_kv)
        tmp[f"model.layers.{i}.self_attn.v_proj.weight"] = reverse_hf_part(model_part[f"model.layers.{i}.self_attn.W_pack.weight"], 2)
        del tmp[f"model.layers.{i}.self_attn.W_pack.weight"]
Thanks for your feedback, you are right; we fixed it.
Besides the above fix, it would be better to provide a sample prompt file for Baichuan-13B-Chat. Prompt file:
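(The prompt file content itself is not reproduced in this thread. As an illustration only, a chat prompt for this setup might look something like the following, assuming the 用户:/助手: ("User:"/"Assistant:") turn format implied by the --reverse-prompt flag in the command below.)

以下是用户与智能助手之间的一段对话。

用户: 你好
助手: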
Then test the model like this:

./main \
  --model /path/to/ggml-model-q4_0.gguf \
  --threads 24 \
  --n_predict 2048 \
  --color \
  --interactive \
  --file prompts/chat-with-baichuan.txt \
  --reverse-prompt "用户:"
Good advice!
It got stuck while I was converting the model; it has already been running for 24 hours. I tried again and got the same problem.
@MarvinLong It looks like something might be wrong with your data. Not sure, but it might be better to retry from scratch.
Will be merging this soon.
Just a heads up - maintaining this model will primarily rely on contributions from the community. Adding some sort of CI in the future would help guarantee that the implementation is stable. But overall, if breaking changes occur, fixing Baichuan will be a secondary priority.
With time, we will try to refactor the code to reuse common building blocks when building the graphs of different models, and this will probably help keep everything stable. I just want to get a few more architectures implemented before abstracting things.
Is it solved? I will try it in my environment.
OK, maybe we can help keep the Baichuan models stable. For example, if you refactor the architecture and that causes the Baichuan models to stop working, we can help fix them. Actually, I think it will be great if llama.cpp gets refactored; we are looking forward to it.
No, I tried building from scratch 3 times, and it is always the same. I tried https://github.com/ouwei2013/baichuan13b.cpp/tree/master and it works for me. Can you give me the process you use to build from scratch? I can check whether I did something wrong.
The script currently does not work.
Could you pull it again?
Try again. It's my fault, sorry.
parser.add_argument("--vocab-only", action="store_true", help="extract only the vocab") | ||
parser.add_argument("--outfile", type=Path, help="path to write to; default: based on input") | ||
parser.add_argument("model", type=Path, help="directory containing model file, or model file itself (*.bin)") | ||
parser.add_argument("ftype", type=int, choices=[0, 1], help="output format - use 0 for float32, 1 for float16", default = 1) |
The default for 'ftype' does not work unless you also use nargs='?'. Someone should fix this in the other scripts as well...
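(For reference, illustrative only and not part of the PR: a minimal self-contained sketch of the behavior the reviewer describes, using standard argparse.)

import argparse

parser = argparse.ArgumentParser()
# For a positional argument, argparse only falls back to `default` when the
# argument is allowed to be omitted, which requires nargs="?".
parser.add_argument("ftype", type=int, choices=[0, 1], nargs="?", default=1,
                    help="output format - use 0 for float32, 1 for float16")
print(parser.parse_args([]).ftype)  # prints 1 when ftype is omitted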
You are right, I have noticed this problem. For now, we want to keep consistency with the other model conversion scripts. Maybe we will fix it in the future.
I can't use this script on a 13B model unless I set TMPDIR to disk. I have 24GB of RAM and 24GB of swap. Is this a general limitation of these simpler convert scripts? I've never had such issues with 33B+ models and the standard convert.py.
I tried and ran it successfully. I used 19.4GB of RAM to run the Baichuan convert script, so it may be a general limitation.
How to run inference from Python after conversion?
Refer to the README for how to run inference after conversion.
The "/path/to/your_converted" placeholder becomes the path of the converted model.
The main
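(Illustrative only, not from the thread: a minimal sketch of driving the converted model from Python by shelling out to the main binary, reusing the flag set from the test command quoted earlier; the binary and model paths are placeholders.)

import subprocess

# Placeholder paths - point these at your llama.cpp build and converted model.
MAIN_BIN = "./main"
MODEL = "/path/to/your_converted/ggml-model-q4_0.gguf"

# Same flags as the interactive test command shown earlier in this thread.
subprocess.run([
    MAIN_BIN,
    "--model", MODEL,
    "--threads", "24",
    "--n_predict", "2048",
    "--color",
    "--interactive",
    "--file", "prompts/chat-with-baichuan.txt",
    "--reverse-prompt", "用户:",
], check=True)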
I see some questions are still pending. After resolving any issues - just merge it.
As more and more people begin to use Baichuan's open-source models, the influence of the Baichuan models is growing, especially in China. Many community members are interested in adding support for Baichuan models to llama.cpp. Meanwhile, Baichuan is a very open company, and it plans to open-source more and more models in the future. Taking all this into consideration, we would like to add support for the Baichuan models to llama.cpp. To do this, we need to make some changes, which we hope can be merged into the main branch of llama.cpp. In the future, we would be happy to help maintain support for Baichuan models in llama.cpp. We sincerely hope that our pull request can be accepted. Thank you.
By the way, the changes this time are mainly for supporting Baichuan-7B and Baichuan-13B; future versions will follow.