
Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool. #1266

CurtiusSimplus opened this issue Nov 8, 2024 · 8 comments

@CurtiusSimplus
%%capture
!pip install unsloth "xformers==0.0.28.post2"

# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git

AND

%%capture
!pip install unsloth "xformers==0.0.28.post2"

# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
#!pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git

Error is the same.

if True:
    model.push_to_hub_gguf(
        "HF/Model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_0", "q4_k_m", "q5_k_m"],
        token = "hf_KCorrect_Token_Here", # Get a token at https://huggingface.co/settings/tokens
    )

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at XXX into f16 GGUF format.
The output location will be /content/CCC/dddd/unsloth.F16.gguf
This will take 3 minutes...

TypeError Traceback (most recent call last)
in <cell line: 1>()
1 if True:
----> 2 model.push_to_hub_gguf(
3 "HF/Model", # Change hf to your username!
4 tokenizer,
5 quantization_method = ["q4_0","q4_k_m","q5_k_m",],

4 frames
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in unsloth_push_to_hub_gguf(self, repo_id, tokenizer, quantization_method, first_conversion, use_temp_dir, commit_message, private, token, max_shard_size, create_pr, safe_serialization, revision, commit_description, tags, temporary_location, maximum_memory_usage)
1859
1860 # Save to GGUF
-> 1861 all_file_locations, want_full_precision = save_to_gguf(
1862 model_type, model_dtype, is_sentencepiece_model,
1863 new_save_directory, quantization_method, first_conversion, makefile,

/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
1091 vocab_type = "spm,hfft,bpe"
1092 # Fix Sentencepiece model as well!
-> 1093 fix_sentencepiece_gguf(model_directory)
1094 else:
1095 vocab_type = "bpe"

/usr/local/lib/python3.10/dist-packages/unsloth/tokenizer_utils.py in fix_sentencepiece_gguf(saved_location)
402 """
403 from copy import deepcopy
--> 404 from transformers.utils import sentencepiece_model_pb2
405 import json
406 from enum import IntEnum

/usr/local/lib/python3.10/dist-packages/transformers/utils/sentencepiece_model_pb2.py in <module>
26
27
---> 28 DESCRIPTOR = _descriptor.FileDescriptor(
29 name="sentencepiece_model.proto",
30 package="sentencepiece",

/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
1022 raise RuntimeError('Please link in cpp generated lib for %s' % (name))
1023 elif serialized_pb:
-> 1024 return _message.default_pool.AddSerializedFile(serialized_pb)
1025 else:
1026 return super(FileDescriptor, cls).__new__(cls)

TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "sentencepiece_model.proto":
sentencepiece_model.proto: A file with this name is already in the pool.

This happened last week too; IIRC it was a transformers issue then, but I can't find a workaround.
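
One workaround that sometimes clears this class of protobuf error (untested here, so treat it as an assumption) is forcing the pure-Python protobuf implementation before anything imports the generated module:

# Hedged workaround sketch: PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is a
# standard protobuf env var; whether it resolves this particular duplicate
# "sentencepiece_model.proto" registration is an assumption. It must be set
# before transformers/unsloth (and hence protobuf) are imported.
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

from unsloth import FastLanguageModel  # import only after the env var is set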

@CurtiusSimplus (Author)
Still no change ... it will save the merged 16-bit and LoRA versions, but not the GGUF ... that error comes up ... Oh well.

@CurtiusSimplus (Author)
Using the vanilla new script with ONLY MY TOKEN and HF model name added to the save-GGUF part of the code:

https://colab.research.google.com/drive/1PrX2o1VXJJfG1n8GXpzpBr3qY9NPgucM?usp=sharing

This works ... so the issue is somewhere in my code ... Let me see if I can substitute my MODEL and HF TOKEN to test.

Will report back.

@CurtiusSimplus (Author)
My model fails with the identical code ... so it is the model.

IDK

@CurtiusSimplus (Author)
Still off and on ... I am using this code to start the scripts:

%%capture
!pip install unsloth "xformers==0.0.28.post2"

# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Nemo saved, as expected.

Small Mistral: failed, same error as above.

Mistral 7b Instruct: failed, same error as above. (Used your provided script, only added my HF user name and token at the indicated places and increased the context window to its max.)

But a fine-tune of Dolphin 2.9.3, IIRC, WORKED PERFECTLY. (Changed only the model in the script above: that one worked, the others errored.)

So it seems to be BOTH model and TRL dependent, but that is above my level ...

I can save them all as merged 16-bit and LoRA, so they are not lost. Just GGUF is giving me a go again.
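
For reference, the merged 16-bit and LoRA saves that do work look roughly like this (directory names are placeholders; the exact invocation is recalled from the standard Unsloth notebook, so treat it as a sketch):

# Hedged sketch of the saves that reportedly do work (placeholder directories).
model.save_pretrained_merged("merged_out", tokenizer, save_method = "merged_16bit")  # full 16-bit merge
model.save_pretrained_merged("lora_out", tokenizer, save_method = "lora")            # LoRA adapters only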

@CurtiusSimplus (Author)
I have tried a new HF token as well ... no dice. The old one is valid and working for pushing the merged model to HF, all in order. ONLY GGUF fails ... I have tried all sorts of things.

The manual save ... I can't figure out how to use it ... above my level, that is all.

@CurtiusSimplus (Author)
Still an issue ... I can save using the manual method, but it won't push to HF as GGUF ...
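
For anyone trying the same, a manual upload of an already-saved GGUF might look like the sketch below; the local directory, repo id, and token are placeholders, and whether this matches what push_to_hub_gguf would have produced is an assumption:

# Hedged sketch: save the GGUF locally, then upload it with huggingface_hub.
from huggingface_hub import HfApi

model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method = "q4_k_m")  # local save (placeholder dir)

api = HfApi(token = "hf_...")  # fill in a real write token
api.create_repo("your-username/your-model-gguf", exist_ok = True)  # placeholder repo id
api.upload_folder(folder_path = "gguf_out", repo_id = "your-username/your-model-gguf")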

@danielhanchen (Contributor)
Hmm, so Mistral 7b Instruct is the main culprit? @Erland366 Can you take a look at exporting Mistral Instruct? Thanks!

@CurtiusSimplus (Author)
Yes, basically Mistral 7b Instruct and its 'clones' seem to have issues.

Thanks a lot again.

Erland366 added a commit to Erland366/unsloth that referenced this issue Nov 12, 2024