
Bug: A crash occurs when llama-bench is running on multiple CANN devices. #9250

Closed
znzjugod opened this issue Aug 30, 2024 · 2 comments
Labels: Ascend NPU (issues specific to Ascend NPUs), bug-unconfirmed, critical severity (used to report critical-severity bugs in llama.cpp, e.g. crashes, corruption, data loss), stale

Comments

@znzjugod

What happened?

When I run llama.cpp with Llama3-8B-Chinese-Chat-f16-v2_1.gguf, it crashes.
Here is my command:
./llama-cli -m /home/c00662745/llama3/llama3/llama3_chinese_gguf/Llama3-8B-Chinese-Chat-f16-v2_1.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer

Here is the error:

```
CANN error: EE9999: Inner Error!
EE9999: [PID: 2750884] 2024-08-30-16:20:38.196.490 Stream destroy failed, stream is not in current ctx, stream_id=2.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1032]
TraceBack (most recent call last):
rtStreamDestroy execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
destroy stream failed, runtime result = 107003[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]

current device: 1, in function ~ggml_backend_cann_context at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h:235
aclrtDestroyStream(streams[i])
/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:123: CANN error
[New LWP 2750924]
[New LWP 2750937]
[New LWP 2753277]
[New LWP 2753281]
[New LWP 2753615]
[New LWP 2753616]
[New LWP 2753623]
[New LWP 2753626]
[New LWP 2753900]
[New LWP 2753901]
[New LWP 2757030]
[New LWP 2757031]
[New LWP 2757032]
[New LWP 2757033]
[New LWP 2757034]
[New LWP 2757035]
[New LWP 2757036]
[New LWP 2757037]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
0x0000ffff8e7edc00 in wait4 () from /usr/lib64/libc.so.6
#0 0x0000ffff8e7edc00 in wait4 () from /usr/lib64/libc.so.6
#1 0x0000ffff8ec019f0 in ggml_print_backtrace () at /home/zn/new-llama/llama.cpp/ggml/src/ggml.c:253
253 waitpid(pid, &wstatus, 0);
#2 0x0000ffff8ec01b20 in ggml_abort (file=0xffff8eccfd58 "/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp", line=123, fmt=0xffff8eccfd48 "CANN error") at /home/zn/new-llama/llama.cpp/ggml/src/ggml.c:280
280 ggml_print_backtrace();
#3 0x0000ffff8ec94ab8 in ggml_cann_error (stmt=0xffff8eccfcb0 "aclrtDestroyStream(streams[i])", func=0xffff8eccfc70 "~ggml_backend_cann_context", file=0xffff8eccfc18 "/home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h", line=235, msg=0x3bcc0668 "EE9999: Inner Error!\nEE9999: [PID: 2750884] 2024-08-30-16:20:38.196.490 Stream destroy failed, stream is not in current ctx, stream_id=2.[FUNC:StreamDestroy][FILE:api_impl.cc][LINE:1032]\n Trace"...) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:123
warning: Source file is more recent than executable.
123 GGML_ABORT("CANN error");
#4 0x0000ffff8ec97b74 in ggml_backend_cann_context::~ggml_backend_cann_context (this=0x33af8680, __in_chrg=) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann/common.h:235
235 ACL_CHECK(aclrtDestroyStream(streams[i]));
#5 0x0000ffff8ec964ac in ggml_backend_cann_free (backend=0x2a8d71a0) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-cann.cpp:1412
1412 delete cann_ctx;
#6 0x0000ffff8ec49394 in ggml_backend_free (backend=0x2a8d71a0) at /home/zn/new-llama/llama.cpp/ggml/src/ggml-backend.c:180
180 backend->iface.free(backend);
#7 0x0000ffff8f18a30c in llama_context::~llama_context (this=0x29fc9fc0, __in_chrg=) at /home/zn/new-llama/llama.cpp/src/llama.cpp:3069
3069 ggml_backend_free(backend);
#8 0x0000ffff8f16b744 in llama_free (ctx=0x29fc9fc0) at /home/zn/new-llama/llama.cpp/src/llama.cpp:17936
17936 delete ctx;
#9 0x0000000000476d48 in main (argc=12, argv=0xfffffc7fe828) at /home/zn/new-llama/llama.cpp/examples/main/main.cpp:1020
1020 llama_free(ctx);
[Inferior 1 (process 2750884) detached]
Aborted (core dumped)
```

It seems that during the final stream free, CANN does not have the right context (device) selected, so `aclrtDestroyStream` is called for a stream that does not belong to the current context.

Name and Version

(base) [root@localhost bin]# ./llama-cli --version
version: 3645 (7ea8d80)
built with cc (GCC) 10.3.1 for aarch64-linux-gnu

What operating system are you seeing the problem on?

No response

Relevant log output

No response

@znzjugod added the bug-unconfirmed and critical severity labels on Aug 30, 2024
@hipudding
Collaborator

Thanks for the bug report.

@hipudding hipudding self-assigned this Sep 14, 2024
@hipudding added the Ascend NPU (issues specific to Ascend NPUs) label on Sep 14, 2024
Dou-Git added a commit to Dou-Git/llama.cpp that referenced this issue Sep 24, 2024
@github-actions github-actions bot added the stale label Oct 15, 2024
@hipudding
Collaborator

This issue has been fixed.

3 participants
@hipudding @znzjugod and others