
Dolly example no longer works ... #156

Open

rcherukuri12 opened this issue Nov 19, 2023 · 19 comments

@rcherukuri12 commented Nov 19, 2023

This example no longer works: https://info.juliahub.com/large-language-model-llm-tutorial-with-julias-transformers.jl

It looks like this package is no longer current and doesn't work with LLMs from Hugging Face:

julia> textenc = hgf"databricks/dolly-v2-12b:tokenizer"
┌ Warning: fuse_unk is unsupported, the tokenization result might be slightly different in some cases

julia> model = todevice(hgf"databricks/dolly-v2-12b:ForCausalLM")
ERROR: unknown type: torch.BFloat16Storage

@chengchingwen (Owner)

Which versions of Transformers.jl and Pickle.jl are you using?

@rcherukuri12 (Author) commented Nov 20, 2023

(@v1.9) pkg> status Transformers
Status ~/.julia/environments/v1.9/Project.toml
Transformers v0.2.8
Pickle v0.3.2

@chengchingwen (Owner)

OK, so you need to update Pickle.jl to 0.3.3, which adds support for bfloat16.
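
A minimal sketch of that update, assuming the default shared environment (press ] in the REPL to reach the pkg> prompt):

(@v1.9) pkg> update Pickle      # try a plain update first
(@v1.9) pkg> add [email protected]    # or pin the exact version if the resolver holds it back
(@v1.9) pkg> status Pickle      # confirm Pickle v0.3.3 is now installed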

@rcherukuri12 (Author)

pkg> update Pickle
wouldn't do it.

@rcherukuri12 (Author)

I'm on Linux (Ubuntu). Is this platform specific?

@chengchingwen (Owner)

Not really. What happens if you explicitly add [email protected]? I'm guessing there are some compat issues that block the update.

@rcherukuri12 (Author)

(@v1.9) pkg> add [email protected]
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package ReinforcementLearningZoo [d607f57d]:

@rcherukuri12 (Author)

I will create a local project, activate it, and see if that works.
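
A sketch of that workflow (the project name dolly-test is illustrative):

(@v1.9) pkg> activate ./dolly-test
(dolly-test) pkg> add Transformers [email protected]
(dolly-test) pkg> status

A fresh project environment sidesteps the compat conflict with ReinforcementLearningZoo in the shared v1.9 environment.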

@rcherukuri12 (Author)

Similar warning as before:
textenc = hgf"databricks/dolly-v2-12b:tokenizer"
┌ Warning: fuse_unk is unsupported, the tokenization result might be slightly different in some cases.
└ @ Transformers.HuggingFace ~/.julia/packages/Transformers/lD5nW/src/huggingface/tokenizer/utils.jl:42

I'll load the model next and see if it works.

@rcherukuri12 (Author)

This time there's no error, but the model load hangs:
model = todevice(hgf"databricks/dolly-v2-12b":ForCausalLM")

@chengchingwen (Owner)

The warning can usually be ignored.

model = todevice(hgf"databricks/dolly-v2-12b":ForCausalLM")

That is a big model, so it takes time to move to the GPU. Also, there is an extra " in that line.
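
For reference, the corrected call from the tutorial (note the single closing quote inside the hgf"" string macro) is:

julia> model = todevice(hgf"databricks/dolly-v2-12b:ForCausalLM")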

@rcherukuri12 (Author) commented Nov 20, 2023

I have a 3080 Ti. I'll wait and let you know. Thank you. I was able to run other 13B models fairly quickly with Ollama.

@rcherukuri12 (Author)

It finally put out an error:
ERROR: LoadError: syntax: cannot juxtapose string literal

@chengchingwen (Owner)

That is because of the extra ".

@rcherukuri12 (Author)

Let me see where it is happening. It is this line:
model = todevice(hgf"databricks/dolly-v2-12b:ForCausalLM")

@rcherukuri12 (Author)

The other thing I noticed: nvidia-smi stayed constant, which suggests nothing was being copied to the GPU.
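
A quick way to check device memory from inside Julia instead of nvidia-smi (a sketch, assuming CUDA.jl is installed in the environment):

julia> using CUDA

julia> CUDA.memory_status()   # prints used/free memory for the active GPU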

@rcherukuri12 (Author) commented Nov 20, 2023

I will copy it into vim and try it out. Double quotes sometimes get turned into special characters, so that can be eliminated as a cause.

@rcherukuri12 (Author)

Still no luck. It just kills my shell after a while. I'll come back and try again later.
For now I will stick to using OpenAI.jl and continue my work.
Thank you for trying to help.

@chengchingwen (Owner) commented Nov 20, 2023

It sounds like the process might have been killed due to OOM.

Currently you would need about 70 GB of CPU memory to load the 12B model. That is larger than the size of the model weights because of an implementation detail: the weights are copied directly from disk into memory, and then, while the model object is being constructed, another copy is made on the CPU. So in the end it takes at least twice the size of the model weights (or more, depending on the data type).
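
A back-of-the-envelope check of that figure (a sketch; it assumes the checkpoint stores bfloat16 weights, as the torch.BFloat16Storage error suggests, and that they are widened to Float32 on the Julia side):

julia> nparams = 12e9;              # dolly-v2-12b has roughly 12 billion parameters

julia> disk_gb = nparams * 2 / 1e9  # bfloat16 checkpoint: 2 bytes per parameter
24.0

julia> cpu_gb = nparams * 4 / 1e9   # Float32 copy on the CPU: 4 bytes per parameter
48.0

julia> disk_gb + cpu_gb             # both copies alive during construction, in GB
72.0

which lines up with the roughly 70 GB estimate above.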
