Failed to use self created stream for a new cuda session #8578

Closed
sunhmy opened this issue Aug 2, 2021 Discussed in #8460 · 4 comments
Labels
ep:CUDA issues related to the CUDA execution provider

Comments

@sunhmy

sunhmy commented Aug 2, 2021

Discussed in #8460

Originally posted by sunhmy July 22, 2021
Hi,

I'd like to use the multi-stream feature enabled by the has_user_compute_stream flag. However, I run into a segfault with the following code:

Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(10);
session_options.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);

#ifdef USE_CUDA
printf("Use cuda\n");

cudaStream_t stream;
cudaStreamCreate(&stream);

OrtCUDAProviderOptions cuda_options{
    0,                                   // device_id
    OrtCudnnConvAlgoSearch::EXHAUSTIVE,
    std::numeric_limits<size_t>::max(),
    0,
    false,
    true,                                // has_user_compute_stream
    stream};                             // user_compute_stream

session_options.AppendExecutionProvider_CUDA(cuda_options);
#endif


My ONNX Runtime version is 1.7.0 and my CUDA Toolkit version is 10.2. The error output and backtrace are below:

2021-07-22 19:57:46.640710391 [E:onnxruntime:, inference_session.cc:1294 operator()] Exception during initialization: /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:123 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:117 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] CUDNN failure 7: CUDNN_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=dg01-baymax-k8s-test001-node-10-52-138-206 ; expr=cudnnSetStream(cudnn_handle_, stream);

terminate called after throwing an instance of 'Ort::Exception'
what(): Exception during initialization: /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:123 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] /home/smy/onnxruntime/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:117 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudnnStatus_t; bool THRW = true] CUDNN failure 7: CUDNN_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=dg01-baymax-k8s-test001-node-10-52-138-206 ; expr=cudnnSetStream(cudnn_handle_, stream);

Aborted (core dumped)
Program received signal SIGABRT, Aborted.
0x00007fffeff5d207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64
(gdb) bt
#0 0x00007fffeff5d207 in raise () from /lib64/libc.so.6
#1 0x00007fffeff5e8f8 in abort () from /lib64/libc.so.6
#2 0x00007ffff086c7d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007ffff086a746 in ?? () from /lib64/libstdc++.so.6
#4 0x00007ffff086a773 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007ffff086a993 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x000000000040672f in Ort::ThrowOnError(OrtApi const&, OrtStatus*) ()
#7 0x000000000040678e in Ort::ThrowOnError(OrtStatus*) ()
#8 0x0000000000406a87 in Ort::Session::Session(Ort::Env&, char const*, Ort::SessionOptions const&) ()
#9 0x0000000000405a44 in main ()
(gdb) ^CQuit

Any suggestion for how I should overcome this?

Thanks!

@edgchen1 edgchen1 added the ep:CUDA issues related to the CUDA execution provider label Aug 2, 2021
@yuslepukhin
Member

Looks like terminate() is called because your main() is not handling the exception; that is the behavior the C++ standard requires for an uncaught exception.

The cause for the exception is an error that cudnnSetStream() returns: CUDNN_STATUS_MAPPING_ERROR. There may be multiple reasons for this.
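As a sketch of the point above: if an exception thrown during Ort::Session construction is not caught, it escapes main() and the runtime calls std::terminate(), producing exactly the SIGABRT seen in the backtrace. The stand-in thrower below is hypothetical (it models the failure with std::runtime_error so the pattern can be shown without ONNX Runtime installed; Ort::Exception derives from std::exception, so the same catch clause applies):

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Stand-in for Ort::Session construction failing during initialization.
// In real code this would be: Ort::Session session(env, model_path, opts);
static void create_session_or_throw() {
    throw std::runtime_error("Exception during initialization: CUDNN failure 7");
}

// Returns false when initialization failed but the exception was caught
// and reported, instead of escaping main() and triggering std::terminate().
bool create_session_handled() {
    try {
        create_session_or_throw();
        return true;  // session created (not reached in this sketch)
    } catch (const std::exception& e) {
        std::cerr << "Session init failed: " << e.what() << "\n";
        return false;
    }
}
```

With a handler like this the process can log the failure and exit cleanly rather than core-dump.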

@sunhmy
Author

sunhmy commented Aug 3, 2021

> The cause for the exception is an error that cudnnSetStream() returns: CUDNN_STATUS_MAPPING_ERROR. There may be multiple reasons for this.

Thanks for the reply. Any suggestion for how to overcome this error?

@yuslepukhin
Member

I would suggest first checking the error code from cudaStreamCreate() to verify that it did not fail and that you are feeding a valid stream. Also, make sure you read this
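A minimal sketch of that check, using the CUDA runtime API (the helper name make_checked_stream is an illustration, not ONNX Runtime API):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Create a stream and verify the call succeeded before handing the
// stream to OrtCUDAProviderOptions as user_compute_stream.
cudaStream_t make_checked_stream() {
    cudaStream_t stream = nullptr;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaStreamCreate failed: %s\n",
                     cudaGetErrorString(err));
        return nullptr;  // caller must treat nullptr as "no valid stream"
    }
    return stream;
}
```

Passing an invalid or foreign-context stream to the CUDA execution provider is one way to end up with CUDNN_STATUS_MAPPING_ERROR at cudnnSetStream().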

@sunhmy
Author

sunhmy commented Aug 9, 2021

Thank you for the suggestion. I finally see what's going on: I didn't realize that a CUDA stream is device-specific, so I have to set the device properly before using streams on other devices. It's working for me now, so I'm closing this issue.
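For reference, the fix described above can be sketched as follows: a CUDA stream is bound to the device that was current when it was created, so select the device first, then create the stream, and use the same device id in OrtCUDAProviderOptions (the helper name is hypothetical):

```cpp
#include <cuda_runtime.h>

// Select the target device BEFORE creating the stream, so the stream is
// bound to the same device the CUDA execution provider will run on
// (device_id in OrtCUDAProviderOptions must match this device).
cudaStream_t create_stream_on_device(int device_id) {
    cudaStream_t stream = nullptr;
    if (cudaSetDevice(device_id) != cudaSuccess) return nullptr;
    if (cudaStreamCreate(&stream) != cudaSuccess) return nullptr;
    return stream;
}
```

Creating the stream while a different device is current, then handing it to a session configured for another device id, reproduces the CUDNN_STATUS_MAPPING_ERROR from the original report.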

@sunhmy sunhmy closed this as completed Aug 9, 2021