
Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool. #1266

CurtiusSimplus opened this issue Nov 8, 2024 · 8 comments

@CurtiusSimplus
%%capture
!pip install unsloth "xformers==0.0.28.post2"

# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git

AND

%%capture
!pip install unsloth "xformers==0.0.28.post2"

# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
#!pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git

Error is the same.

if True:
    model.push_to_hub_gguf(
        "HF/Model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_0", "q4_k_m", "q5_k_m"],
        token = "hf_KCorrect_Token_Here", # Get a token at https://huggingface.co/settings/tokens
    )

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at XXX into f16 GGUF format.
The output location will be /content/CCC/dddd/unsloth.F16.gguf
This will take 3 minutes...

TypeError Traceback (most recent call last)
in <cell line: 1>()
1 if True:
----> 2 model.push_to_hub_gguf(
3 "HF/Model", # Change hf to your username!
4 tokenizer,
5 quantization_method = ["q4_0","q4_k_m","q5_k_m",],

4 frames
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in unsloth_push_to_hub_gguf(self, repo_id, tokenizer, quantization_method, first_conversion, use_temp_dir, commit_message, private, token, max_shard_size, create_pr, safe_serialization, revision, commit_description, tags, temporary_location, maximum_memory_usage)
1859
1860 # Save to GGUF
-> 1861 all_file_locations, want_full_precision = save_to_gguf(
1862 model_type, model_dtype, is_sentencepiece_model,
1863 new_save_directory, quantization_method, first_conversion, makefile,

/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
1091 vocab_type = "spm,hfft,bpe"
1092 # Fix Sentencepiece model as well!
-> 1093 fix_sentencepiece_gguf(model_directory)
1094 else:
1095 vocab_type = "bpe"

/usr/local/lib/python3.10/dist-packages/unsloth/tokenizer_utils.py in fix_sentencepiece_gguf(saved_location)
402 """
403 from copy import deepcopy
--> 404 from transformers.utils import sentencepiece_model_pb2
405 import json
406 from enum import IntEnum

/usr/local/lib/python3.10/dist-packages/transformers/utils/sentencepiece_model_pb2.py in <module>
26
27
---> 28 DESCRIPTOR = _descriptor.FileDescriptor(
29 name="sentencepiece_model.proto",
30 package="sentencepiece",

/usr/local/lib/python3.10/dist-packages/google/protobuf/descriptor.py in __new__(cls, name, package, options, serialized_options, serialized_pb, dependencies, public_dependencies, syntax, pool, create_key)
1022 raise RuntimeError('Please link in cpp generated lib for %s' % (name))
1023 elif serialized_pb:
-> 1024 return _message.default_pool.AddSerializedFile(serialized_pb)
1025 else:
1026 return super(FileDescriptor, cls).__new__(cls)

TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "sentencepiece_model.proto":
sentencepiece_model.proto: A file with this name is already in the pool.

This happened last week too; IIRC it was a transformers issue then, but I can't find a workaround.
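
One workaround that sometimes clears this class of protobuf error (untested here, so treat it as an assumption) is forcing the pure-Python protobuf implementation before anything imports the generated module:

# Hedged workaround sketch: PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is a
# standard protobuf env var; whether it resolves this particular duplicate
# "sentencepiece_model.proto" registration is an assumption. It must be set
# before transformers/unsloth (and hence protobuf) are imported.
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

from unsloth import FastLanguageModel  # import only after the env var is set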

@CurtiusSimplus (Author)
Still no change ... it will save the merged 16-bit and LoRA versions, but not the GGUF ... that error comes up ... Oh well.

@CurtiusSimplus (Author)
Using the vanilla new script with ONLY MY TOKEN and HF model name added to the save-GGUF part of the code:

https://colab.research.google.com/drive/1PrX2o1VXJJfG1n8GXpzpBr3qY9NPgucM?usp=sharing

This works ... so the issue is somewhere in my code ... Let me see if I can substitute my MODEL and HF TOKEN to test.

Will report back.

@CurtiusSimplus (Author)
My model fails with the identical code ... so it is the model.

IDK

@CurtiusSimplus (Author)
Still off and on ... I am using this code to start the scripts:

%%capture
!pip install unsloth "xformers==0.0.28.post2"

# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Nemo saved, as expected.

Small Mistral: failed, same error as above.

Mistral 7b Instruct: failed, same error as above. (Used your provided script, only added my HF user name and token at the indicated places and increased the context window to its max.)

But a fine-tune of Dolphin 2.9.3, IIRC, WORKED PERFECTLY. (Changed only the model in the script above: that one worked, the others errored.)

So it seems to be BOTH model and TRL dependent, but that is above my level ...

I can save them all as merged 16-bit and LoRA, so they are not lost. Just GGUF is giving me a go again.
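
For reference, the merged 16-bit and LoRA saves that do work look roughly like this (directory names are placeholders; the exact invocation is recalled from the standard Unsloth notebook, so treat it as a sketch):

# Hedged sketch of the saves that reportedly do work (placeholder directories).
model.save_pretrained_merged("merged_out", tokenizer, save_method = "merged_16bit")  # full 16-bit merge
model.save_pretrained_merged("lora_out", tokenizer, save_method = "lora")            # LoRA adapters only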

@CurtiusSimplus (Author)
I have tried a new HF token as well ... no dice. The old one is valid and working for pushing the merged model to HF, all in order. ONLY GGUF fails ... I have tried all sorts of things.

The manual save ... I can't figure out how to use it ... above my level, that is all.

@CurtiusSimplus (Author)
Still an issue ... I can save using the manual method, but it won't push to HF as GGUF ...
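
For anyone trying the same, a manual upload of an already-saved GGUF might look like the sketch below; the local directory, repo id, and token are placeholders, and whether this matches what push_to_hub_gguf would have produced is an assumption:

# Hedged sketch: save the GGUF locally, then upload it with huggingface_hub.
from huggingface_hub import HfApi

model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method = "q4_k_m")  # local save (placeholder dir)

api = HfApi(token = "hf_...")  # fill in a real write token
api.create_repo("your-username/your-model-gguf", exist_ok = True)  # placeholder repo id
api.upload_folder(folder_path = "gguf_out", repo_id = "your-username/your-model-gguf")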

@danielhanchen (Contributor)
Hmm, so Mistral 7b Instruct is the main culprit? @Erland366 Can you take a look at exporting Mistral Instruct? Thanks!

@CurtiusSimplus (Author)
Yes, basically Mistral 7b Instruct and its 'clones' seem to have issues.

Thanks a lot again.

Erland366 added a commit to Erland366/unsloth that referenced this issue Nov 12, 2024