Failed to use self created stream for a new cuda session #8578

sunhmy · 2021-08-02T00:30:37Z

Discussed in #8460

Originally posted by sunhmy July 22, 2021
Hi,

I'd like to use the multi-stream feature that was added via the has_user_compute_stream flag. However, I run into a segfault with the following code:

Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(10);
session_options.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);
#ifdef USE_CUDA
printf("Use cuda\n");

cudaStream_t stream;
cudaStreamCreate(&stream);

OrtCUDAProviderOptions cuda_options{
    0,                                   // device id
    OrtCudnnConvAlgoSearch::EXHAUSTIVE,  // cuDNN convolution algorithm search
    std::numeric_limits<size_t>::max(),  // memory limit
    0,                                   // arena extend strategy
    false,                               // do not copy in the default stream
    true,                                // has_user_compute_stream
    stream};                             // user-provided compute stream

session_options.AppendExecutionProvider_CUDA(cuda_options);
#endif

My ONNX Runtime version is 1.7.0 and my CUDA Toolkit version is 10.2. The error output and backtrace are below:

2021-07-22 19:57:46.640710391 [E:onnxruntime:, inference_session.cc:1294 operator()] Exception during initialization: /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:123 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:117 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] CUDNN failure 7: CUDNN_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=dg01-baymax-k8s-test001-node-10-52-138-206 ; expr=cudnnSetStream(cudnn_handle_, stream);

terminate called after throwing an instance of 'Ort::Exception'
what(): Exception during initialization: /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:123 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:117 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] CUDNN failure 7: CUDNN_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=dg01-baymax-k8s-test001-node-10-52-138-206 ; expr=cudnnSetStream(cudnn_handle_, stream);

Aborted (core dumped)
Program received signal SIGABRT, Aborted.
0x00007fffeff5d207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64
(gdb) bt
#0 0x00007fffeff5d207 in raise () from /lib64/libc.so.6
#1 0x00007fffeff5e8f8 in abort () from /lib64/libc.so.6
#2 0x00007ffff086c7d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007ffff086a746 in ?? () from /lib64/libstdc++.so.6
#4 0x00007ffff086a773 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007ffff086a993 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x000000000040672f in Ort::ThrowOnError(OrtApi const&, OrtStatus*) ()
#7 0x000000000040678e in Ort::ThrowOnError(OrtStatus*) ()
#8 0x0000000000406a87 in Ort::Session::Session(Ort::Env&, char const*, Ort::SessionOptions const&) ()
#9 0x0000000000405a44 in main ()
(gdb) ^CQuit

Any suggestion for how I should overcome this?

Thanks!

yuslepukhin · 2021-08-02T16:02:10Z

It looks like std::terminate() is called because your main() does not handle the exception. That is the behavior the C++ standard specifies for an uncaught exception.

The cause of the exception is the error that cudnnSetStream() returns: CUDNN_STATUS_MAPPING_ERROR. There may be multiple reasons for this.
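To illustrate the first point, here is a minimal sketch of catching the exception instead of letting it escape main(). The helper name run_guarded is hypothetical and not part of ONNX Runtime. It relies on Ort::Exception deriving from std::exception, so a generic handler is enough to turn the abort into a reported error. It does not fix the underlying cuDNN failure; it only makes the failure easier to handle.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical helper (not part of ONNX Runtime): runs `body` and converts any
// exception derived from std::exception into a non-zero exit code instead of
// letting it escape and trigger std::terminate().
int run_guarded(const std::function<void()>& body) {
  try {
    body();
    return 0;
  } catch (const std::exception& e) {
    // Ort::Exception derives from std::exception, so what() carries the
    // "Exception during initialization: ..." text seen in the log above.
    std::cerr << "initialization failed: " << e.what() << '\n';
    return 1;
  }
}
```

In main(), the session creation and inference code would go inside the callable, for example `return run_guarded([&] { /* create Ort::Session and run it */ });`.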

sunhmy · 2021-08-03T02:58:06Z

> The cause of the exception is the error that cudnnSetStream() returns: CUDNN_STATUS_MAPPING_ERROR. There may be multiple reasons for this.

Thanks for the reply. Any suggestion for how to overcome this error?

yuslepukhin · 2021-08-06T19:06:50Z

I would suggest first checking the error code returned by cudaStreamCreate() to confirm that it did not fail and that you are passing a valid stream. Also, make sure you read this

sunhmy · 2021-08-09T01:56:21Z

Thank you for the suggestion. I finally see what's going on. I didn't realize that a CUDA stream is device-specific, so I have to set the device properly before using streams on another device. Now it's working for me, and I'm closing this issue.

edgchen1 added the ep:CUDA label (issues related to the CUDA execution provider) Aug 2, 2021

sunhmy closed this as completed Aug 9, 2021
