
Wizard Coder 15b Support? #1901

Closed
Asory2010 opened this issue Jun 16, 2023 · 10 comments · Fixed by #3399

Comments

@Asory2010

I have tried running the GGML version of WizardCoder-15B, but it gives this error:

main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --repeat_penalty 1.2 --instruct --color --memory_f32 -m WizardCoder-15B-1.0.ggmlv3.q4_0.bin
main: build = 686 (ac3b886)
main: seed = 1686975019
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4050 Laptop GPU
llama.cpp: loading model from WizardCoder-15B-1.0.ggmlv3.q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model

@johnson442
Contributor

WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder.

Llama.cpp doesn't support it yet.
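
For reference, roughly how to build and run that example (a sketch assuming the usual cmake flow in the ggml repo; target names, model paths, and flags are placeholders and may differ by commit):

git clone https://github.com/ggerganov/ggml
cd ggml && mkdir build && cd build
cmake .. && make -j starcoder
# model path and prompt are placeholders
./bin/starcoder -m /path/to/WizardCoder-15B-1.0.ggmlv3.q5_1.bin -p "def fibonacci(" --temp 0.2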

@mirek190

But that is not llama.cpp ;)

@giridharreddy7

Can anyone explain how to use another model, such as WizardVicuna, with privateGPT? Is that model supported?

@spikespiegel

> WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder.
>
> Llama.cpp doesn't support it yet.

I cannot make it work with starcoder.cpp. I downloaded the 4-bit ggml model from Hugging Face, but it gives a ggml error.
./main -m ./models/WizardCoder-15B-1.0.ggmlv3.q4_1.bin -p "def fibonacci(" --temp 0.2

Error:
main: seed = 1686965178
starcoder_model_load: loading model from './models/WizardCoder-15B-1.0.ggmlv3.q4_1.bin'
starcoder_model_load: n_vocab = 49153
starcoder_model_load: n_ctx = 8192
starcoder_model_load: n_embd = 6144
starcoder_model_load: n_head = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype = 2003
starcoder_model_load: qntvr = 2
starcoder_model_load: ggml ctx size = 28956.48 MB
GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL
Aborted (core dumped)

More information: I have 16 GB RAM, and the model is about 11 GB, so it should fit into memory, if memory were the issue?

This may not be the place to ask, but since you said you can run it, can you give me some help or a reference on what is going on?

@johnson442
Contributor

Are you monitoring memory use when you run starcoder? Running the 14.3 GB Q5_1 with 32 GB of RAM:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                        
  24811 root      20   0   29.1g  13.4g   4352 R 397.3  42.8   1:03.45 starcoder 

From

starcoder_model_load: ggml ctx size = 28956.48 MB
GGML_ASSERT: ggml.c:3874: ctx->mem_buffer != NULL

it seems pretty likely that you are running out of memory.

I don't think any of the mmap magic in llama.cpp has made it into ggml yet.
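
A quick way to confirm (a sketch, assuming a Linux box where free is available) is to compare the ctx size the loader reports against available memory before running:

# loader reported: ggml ctx size = 28956.48 MB, i.e. ~28.3 GiB
free -h   # if "available" is below that, the ctx buffer malloc returns NULL and the GGML_ASSERT aborts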

@spikespiegel

Thanks for the reply. Yes, it seems the model does not fit into memory. I assumed it would fit into RAM since the file is smaller, but apparently that is not the case with ggml. Good to know, thanks!

@Asory2010
Author

When will it be supported?

@mirek190

That model is better for coding than anything available offline so far.
It is at the level of GPT-3.5.

@howard0su
Collaborator

Wizard Coder 15b is not a LLaMA-family model; its compute graph has several nodes that differ from LLaMA models.

@johnson442
Contributor

@spikespiegel I cobbled together basic mmap (and GPU) support for the starcoder example if you'd like to test:
https://github.com/johnson442/ggml/tree/starcoder-mmap

There is probably something wrong with it, but it seems to run OK for me on a system with 16 GB of RAM.
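
Untested sketch of how one might try it (assuming the branch builds the same way as upstream ggml):

git clone -b starcoder-mmap https://github.com/johnson442/ggml
cd ggml && mkdir build && cd build
cmake .. && make -j starcoder   # then run ./bin/starcoder as with the upstream example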
