While quantizing this model:
https://huggingface.co/ibivibiv/hydra-moe-120b
Using the imatrix in this repository: https://huggingface.co/mradermacher/hydra-moe-120b-i1-GGUF
quantize segfaulted after successfully generating other quants (such as Q6_K and IQ3_XXS).
A GDB session on the core file is below; tell me if you want more information:
[ 144/1143] blk.7.ffn_down.1.weight - [20480, 7168, 1, 1], type = f32, converting to iq3_s .. size = 560.00 MiB -> 60.16 MiB
[ 145/1143] blk.7.ffn_up.1.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 146/1143] blk.7.ffn_gate.2.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 147/1143] blk.7.ffn_down.2.weight - [20480, 7168, 1, 1], type = f32, converting to iq3_s .. size = 560.00 MiB -> 60.16 MiB
[ 148/1143] blk.7.ffn_up.2.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 149/1143] blk.7.ffn_gate.3.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. size = 560.00 MiB -> 53.59 MiB
[ 150/1143] blk.7.ffn_down.3.weight - [20480, 7168, 1, 1], type = f32, converting to iq3_s .. size = 560.00 MiB -> 60.16 MiB
[ 151/1143] blk.7.ffn_up.3.weight - [ 7168, 20480, 1, 1], type = f32, converting to iq3_xxs .. /root/s2/quantize: line 166: 348502 Segmentation fault (core dumped) "$QUANTIZE" --allow-requantize $IMATRIX "$srcgguf" ./"$OUT.
-> $HOSTNAME~" "$qmethod"
[Exit 139 (SEGV)]
kaos /tmp# gdb llama.cpp/build/bin/quantize core
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from llama.cpp/build/bin/quantize...
[New LWP 348502]
[...]
[New LWP 348512]
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.debian.net
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /lib/x86_64-linux-gnu/libblas.so.3
Downloading separate debug info for system-supplied DSO at 0x7fff319f5000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `llama.cpp/build/bin/quantize --allow-requantize --imatrix hydra-moe-1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 quantize_row_iq3_xxs_impl.constprop.1 (x=, vy=, n=, quant_weights=0x558f469c74c0, grid_size=256) at llama.cpp/ggml-quants.c:11414
11414 int grid_index = kmap_q3xs[u];
[Current thread is 1 (Thread 0x7fb22bae7740 (LWP 348502))]
(gdb) bt
#0 quantize_row_iq3_xxs_impl.constprop.1 (x=, vy=, n=, quant_weights=0x558f469c74c0, grid_size=256) at llama.cpp/ggml-quants.c:11414
#1 0x0000558f44c1d6b3 in quantize_iq3_xxs (quant_weights=0x558f469c74c0, n_per_row=, nrow=, dst=, src=) at llama.cpp/ggml-quants.c:11463
#2 ggml_quantize_chunk (type=GGML_TYPE_IQ3_XXS, src=0x7f56556bc060, dst=0x7f479d7ff010, start=0, nrows=3, n_per_row=, imatrix=0x558f469c74c0) at llama.cpp/ggml.c:20367
#3 0x0000558f44bfd856 in operator() (__closure=) at llama.cpp/llama.cpp:13381
#4 llama_tensor_quantize_internal (nthread=8, workers=std::vector of length 7, capacity 8 = {...}, imatrix=, n_per_row=7168, nrows=20480, chunk_size=21504, new_data=0x7f479d7ff010,
f32_data=0x7f56556bc060, new_type=) at llama.cpp/llama.cpp:13387
#5 llama_model_quantize_internal (fname_inp="./hydra-moe-120b.gguf", fname_out="./hydra-moe-120b-i1-GGUF/hydra-moe-120b.i1-IQ3_XS.gguf.kaos~", params=params@entry=0x7fff319426c0)
at llama.cpp/llama.cpp:13698
#6 0x0000558f44bec19b in llama_model_quantize (params=0x7fff319426c0, fname_out=, fname_inp=0x558f45909c40 "./hydra-moe-120b.gguf") at llama.cpp/llama.cpp:14697
#7 main (argc=, argv=) at llama.cpp/examples/quantize/quantize.cpp:403
(gdb) inf thr
Id Target Id Frame
1 Thread 0x7fb22bae7740 (LWP 348502) quantize_row_iq3_xxs_impl.constprop.1 (x=, vy=, n=, quant_weights=0x558f469c74c0, grid_size=256)
at llama.cpp/ggml-quants.c:11414
2 Thread 0x7fb2247fc6c0 (LWP 348514) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ede0 <thread_status+480>) at ./nptl/futex-internal.c:57
3 Thread 0x7f479a7fc6c0 (LWP 356080) 0x0000558f44c1a159 in iq3_find_best_neighbour(const uint16_t * restrict, const uint32_t * restrict, const float * restrict, const float * restrict, float, int8_t * restrict) (neighbours=, grid=0x7f478c005390, xval=, weight=, scale=0.00249774638, L=0x7f479a7ec790 "") at llama.cpp/ggml-quants.c:11243
4 Thread 0x7fb225ffd6c0 (LWP 348513) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ed60 <thread_status+352>) at ./nptl/futex-internal.c:57
5 Thread 0x7fb228fff6c0 (LWP 348511) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ec60 <thread_status+96>) at ./nptl/futex-internal.c:57
6 Thread 0x7f47977fa6c0 (LWP 356082) iq3_find_best_neighbour(const uint16_t * restrict, const uint32_t * restrict, const float * restrict, const float * restrict, float, int8_t * restrict) (neighbours=, grid=0x7f478c005390, xval=, weight=, scale=0.00154302374, L=0x7f47977ea790 "\002") at llama.cpp/ggml-quants.c:11232
7 Thread 0x7fb20fff96c0 (LWP 348518) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ef60 <thread_status+864>) at ./nptl/futex-internal.c:57
8 Thread 0x7f4795ff96c0 (LWP 356077) clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
9 Thread 0x7fb2117fa6c0 (LWP 348517) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60eee0 <thread_status+736>) at ./nptl/futex-internal.c:57
10 Thread 0x7f47947f86c0 (LWP 356083) nearest_int (fval=) at llama.cpp/ggml-quants.c:1316
11 Thread 0x7f479d7fe6c0 (LWP 356078) clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
12 Thread 0x7f479bffd6c0 (LWP 356079) quantize_row_iq3_xxs_impl.constprop.1 (x=, vy=, n=, quant_weights=0x558f469c74c0, grid_size=256)
at llama.cpp/ggml-quants.c:11367
13 Thread 0x7fb212ffb6c0 (LWP 348516) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ee60 <thread_status+608>) at ./nptl/futex-internal.c:57
14 Thread 0x7f4798ffb6c0 (LWP 356081) nearest_int (fval=0.0372944102) at llama.cpp/ggml-quants.c:1316
15 Thread 0x7fb2277fe6c0 (LWP 348512) __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb22b60ece0 <thread_status+224>) at ./nptl/futex-internal.c:57
(gdb) l
11409 for (int k = 0; k < 4; ++k) block_signs[k] = (~block_signs[k]) & 127;
11410 }
11411 for (int k = 0; k < 8; ++k) {
11412 uint16_t u = 0;
11413 for (int i = 0; i < 4; ++i) u |= (L[4*k+i] << 3*i);
11414 int grid_index = kmap_q3xs[u];
11415 if (grid_index < 0) {
11416 printf("Oops: found point %u not on grid:", u);
11417 for (int i = 0; i < 4; ++i) printf(" %d", L[4*k+i]);
11418 printf("\n");
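
For what it's worth, here is a minimal standalone sketch of the index packing at the crash site. This is just my reading of the listing above, not the actual llama.cpp code; the 0..7 range of L and the 4096-entry size of kmap_q3xs are assumptions based on the iq3_xxs quantizer for the 256-point grid.

#include <stdint.h>
#include <stdio.h>

/* Hedged sketch: reproduces only the index packing from ggml-quants.c:11413.
 * Assumption: each entry of L is a 3-bit coordinate (0..7), as produced by
 * the iq3_xxs quantizer for the 256-point grid. */
int main(void) {
    uint8_t L[32];
    for (int i = 0; i < 32; ++i) L[i] = 7;   /* worst case: every coordinate at its maximum */

    uint16_t u_max = 0;
    for (int k = 0; k < 8; ++k) {
        uint16_t u = 0;
        for (int i = 0; i < 4; ++i) u |= (uint16_t)(L[4*k + i] << 3*i);  /* same packing as line 11413 */
        if (u > u_max) u_max = u;
    }
    printf("max index = %u\n", u_max);        /* prints 4095 */
    return 0;
}

If those assumptions hold, u stays below 4096, i.e. within a correctly sized kmap_q3xs, which would point at the table pointer itself (or at L holding out-of-range values) rather than at the packing - but I have not verified that, so treat it as a guess.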