Bug: A crash occurs when llama-bench is running on multiple CANN devices. #9250
Labels
Ascend NPU, bug-unconfirmed, critical severity, stale
What happened?
When I run llama.cpp with Llama3-8B-Chinese-Chat-f16-v2_1.gguf, it crashes on exit.

Here is my command:

```
./llama-cli -m /home/c00662745/llama3/llama3/llama3_chinese_gguf/Llama3-8B-Chinese-Chat-f16-v2_1.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```

Here is the error:

```
CANN error: EE9999: Inner Error!
EE9999: [PID: 2750884] 2024-08-30-16:20:38.196.490 Stream destroy failed, stream is not in current ctx, stream_id=2.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1032]
TraceBack (most recent call last):
rtStreamDestroy execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
destroy stream failed, runtime result = 107003[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
current device: 1, in function ~ggml_backend_cann_context at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h:235
aclrtDestroyStream(streams[i])
/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:123: CANN error
[New LWP 2750924]
[New LWP 2750937]
[New LWP 2753277]
[New LWP 2753281]
[New LWP 2753615]
[New LWP 2753616]
[New LWP 2753623]
[New LWP 2753626]
[New LWP 2753900]
[New LWP 2753901]
[New LWP 2757030]
[New LWP 2757031]
[New LWP 2757032]
[New LWP 2757033]
[New LWP 2757034]
[New LWP 2757035]
[New LWP 2757036]
[New LWP 2757037]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
0x0000ffff8e7edc00 in wait4 () from /usr/lib64/libc.so.6
#0 0x0000ffff8e7edc00 in wait4 () from /usr/lib64/libc.so.6
#1 0x0000ffff8ec019f0 in ggml_print_backtrace () at /home/zn/new-llama/llama.cpp/ggml/src/ggml.c:253
253 waitpid(pid, &wstatus, 0);
#2 0x0000ffff8ec01b20 in ggml_abort (file=0xffff8eccfd58 "/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp", line=123, fmt=0xffff8eccfd48 "CANN error") at /home/zn/new-llama/llama.cpp/ggml/src/ggml.c:280
280 ggml_print_backtrace();
#3 0x0000ffff8ec94ab8 in ggml_cann_error (stmt=0xffff8eccfcb0 "aclrtDestroyStream(streams[i])", func=0xffff8eccfc70 "~ggml_backend_cann_context", file=0xffff8eccfc18 "/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h", line=235, msg=0x3bcc0668 "EE9999: Inner Error!\nEE9999: [PID: 2750884] 2024-08-30-16:20:38.196.490 Stream destroy failed, stream is not in current ctx, stream_id=2.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1032]\n Trace"...) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:123
warning: Source file is more recent than executable.
123 GGML_ABORT("CANN error");
#4 0x0000ffff8ec97b74 in ggml_backend_cann_context::~ggml_backend_cann_context (this=0x33af8680, __in_chrg=) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h:235
235 ACL_CHECK(aclrtDestroyStream(streams[i]));
#5 0x0000ffff8ec964ac in ggml_backend_cann_free (backend=0x2a8d71a0) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:1412
1412 delete cann_ctx;
#6 0x0000ffff8ec49394 in ggml_backend_free (backend=0x2a8d71a0) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-backend.c:180
180 backend->iface.free(backend);
#7 0x0000ffff8f18a30c in llama_context::~llama_context (this=0x29fc9fc0, __in_chrg=) at /home/zn/new-llama/llama.cpp/src/llama.cpp:3069
3069 ggml_backend_free(backend);
#8 0x0000ffff8f16b744 in llama_free (ctx=0x29fc9fc0) at /home/zn/new-llama/llama.cpp/src/llama.cpp:17936
17936 delete ctx;
#9 0x0000000000476d48 in main (argc=12, argv=0xfffffc7fe828) at /home/zn/new-llama/llama.cpp/examples/main/main.cpp:1020
1020 llama_free(ctx);
[Inferior 1 (process 2750884) detached]
Aborted (core dumped)
```
It looks like during the final stream cleanup, CANN does not have the right device context current: the streams were created in device 1's context, but they are destroyed while a different context is active, so aclrtDestroyStream fails with "stream is not in current ctx" (runtime result 107003).
Name and Version
```
(base) [root@localhost bin]# ./llama-cli --version
version: 3645 (7ea8d80)
built with cc (GCC) 10.3.1 for aarch64-linux-gnu
```
What operating system are you seeing the problem on?
No response
Relevant log output
No response