[Bug] dequantize_row_q4_0 segfaults #791
Comments
Reporter: Thread 1 "main" received signal SIGSEGV, Segmentation fault. And without AVX2 the crash is here:
Reply: You cannot eval with a vocab-only model.
Reporter: Where can I get a proper model?
Reply: I cannot help you with that, but there are some details in the official repository: https://github.com/facebookresearch/llama/
Environment and Context
Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
g++ (Debian 10.2.1-6) 10.2.1 20210110
GNU Make 4.3
Failure Information (for bugs)
main segfaults at dequantize_row_q4_0+48
Steps to Reproduce
./main -m models/ggml-vocab-q4_0.bin
~/s/llama.cpp ❯❯❯ gdb main
(gdb) r -m models/ggml-vocab-q4_0.bin
Starting program: /home/sha0/soft/llama.cpp/main -m models/ggml-vocab-q4_0.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680724006
llama_model_load: loading model from 'models/ggml-vocab-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 0.41 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 1792.49 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-vocab-q4_0.bin'
llama_model_load: model size = 0.00 MB / num tensors = 0
llama_model_load: WARN no tensors loaded from model file - assuming empty model for testing
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0
[New Thread 0x7fff77560700 (LWP 142639)]
[New Thread 0x7fff76d5f700 (LWP 142640)]
[New Thread 0x7fff7655e700 (LWP 142641)]
[New Thread 0x7fff75d5d700 (LWP 142642)]
[New Thread 0x7fff7555c700 (LWP 142643)]
[New Thread 0x7fff74d5b700 (LWP 142644)]
[New Thread 0x7fff7455a700 (LWP 142645)]
[New Thread 0x7fff73d59700 (LWP 142646)]
[New Thread 0x7fff73558700 (LWP 142647)]
[New Thread 0x7fff72d57700 (LWP 142648)]
[New Thread 0x7fff72556700 (LWP 142649)]
[New Thread 0x7fff71d55700 (LWP 142650)]
[New Thread 0x7fff71554700 (LWP 142651)]
[New Thread 0x7fff70d53700 (LWP 142652)]
[New Thread 0x7fff70552700 (LWP 142653)]
Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x000055555555e430 in dequantize_row_q4_0 ()
(gdb) bt
#0 0x000055555555e430 in dequantize_row_q4_0 ()
#1 0x0000555555567585 in ggml_compute_forward_get_rows ()
#2 0x000055555556fba3 in ggml_graph_compute ()
#3 0x0000555555578eca in llama_eval_internal(llama_context&, int const*, int, int, int) ()
#4 0x000055555557919f in llama_eval ()
#5 0x000055555555c1aa in main ()
(gdb) x/i $pc
=> 0x55555555e430 <dequantize_row_q4_0+48>: vpmovzxbw 0x4(%rdi),%ymm1
(gdb) i r rdi
rdi 0xa00 2560
(gdb) i r ymm1
ymm1 {v16_bfloat16 = {0x180, 0x0, 0x0, 0x0, 0x180, 0x0 <repeats 11 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xc0, 0x43, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0, 0x43, 0x0 <repeats 22 times>}, v16_int16 = {0x43c0, 0x0, 0x0, 0x0, 0x43c0, 0x0 <repeats 11 times>}, v8_int32 = {0x43c0, 0x0, 0x43c0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x43c0, 0x43c0, 0x0, 0x0}, v2_int128 = {0x43c000000000000043c0, 0x0}}
(gdb)