TensorRT execution provider SEGFAULT #7757
Comments
Hi Patrik, just wanted to check: have you tried rebooting your Jetson?
Hi Olivia, yes, I have, and unfortunately it didn't help. I know this segfault is model-related, because other models work properly on the same Jetson device with the same environment. I'm just trying to find out whether there is a known issue with TensorRT and models that have a specific architecture or contain a particular layer. This issue mentions a similar problem, but I don't know if it applies to my case, since my Jetson Xavier has TensorRT 7.1.3 installed.
Hi Olivia, I tested this behavior on an RTX 2080 as well and the same issue occurred. The segfault was caused by my own inattention: I had wrapped the Ort::Session constructor in a try-catch block, caught the exception, and logged an error, but didn't exit the process. The code then kept executing and hit a line calling Ort::Session::GetInputCount(), which caused the SIGSEGV. However, one problem still bothers me, and it's the message I get from ONNX Runtime when I call the Ort::Session constructor:
As I mentioned before, the same model works with the CUDA and CPU EPs, so TensorRT is the problem here. Do you have any suggestions? Does this question relate to onnxruntime?
Hello, I just wanted to ask if you can give me some hints. I'm quite stuck on this issue right now. Thanks a lot.
Because Jetson only supports TensorRT 7.1.x, can you try this prior to building ONNX Runtime with the TensorRT EP?
Hi George, as I mentioned above, the issue was also reproduced on an NVIDIA RTX 2080, so it does not relate to the Jetson. However, I did check out the onnx-tensorrt submodule to 7.1 and tried it on both the Jetson and the RTX. It did not help.
Thanks for trying. Just to confirm: on the RTX 2080, are you using TensorRT 7.1 or 7.2?
Both. The system with the RTX (CentOS 7) and the Jetson have the same versions of the CUDA/cuDNN/TensorRT libraries. I can't promise anything, but I'll do my best with the minimal repro.
Hello again, I've attached a smaller model which causes the same error as described above. Please let me know if you were able to reproduce the problem.
Hi everyone, I just wanted to know if you were able to reproduce the issue. Thank you so much.
Sorry, I haven't been able to test this model yet; will do so.
I confirmed that after building the TensorRT EP with the rel-1.8.0 branch and running symbolic_shape_infer.py on 'net_SSH_ter_1000.onnx' first, the model could be loaded and the session created successfully.
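For reference, that shape-inference preprocessing step would look roughly like the following from Python. This is a sketch, not the exact command used above: it assumes the onnxruntime wheel ships onnxruntime.tools.symbolic_shape_infer, and the output file name is illustrative.

# Run ONNX Runtime's symbolic shape inference on the model before handing it
# to the TensorRT EP, so every tensor in the graph carries shape information.
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

model = onnx.load("net_SSH_ter_1000.onnx")

# auto_merge resolves conflicting symbolic dimensions where possible.
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)

onnx.save(inferred, "net_SSH_ter_1000.shaped.onnx")

The same script can also be run from the command line out of the onnxruntime source tree, passing --input and --output paths.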
Thank you for the answer. Running the script on my target model did the trick and now the TensorRT EP works. How is that possible? |
Outputs may vary due to differences in implementations across TensorRT and CUDA kernels.
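If it helps to quantify that, here is a minimal sketch of comparing the two EPs on the same input; the model path, input shape, and tolerances are illustrative.

# Compare outputs of the same model under the CUDA and TensorRT EPs.
import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # illustrative path
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative input shape

cuda_sess = ort.InferenceSession(MODEL_PATH, providers=["CUDAExecutionProvider"])
trt_sess = ort.InferenceSession(MODEL_PATH, providers=["TensorrtExecutionProvider"])

name = cuda_sess.get_inputs()[0].name
cuda_out = cuda_sess.run(None, {name: x})[0]
trt_out = trt_sess.run(None, {name: x})[0]

print("max abs diff:", np.abs(cuda_out - trt_out).max())
print("allclose at 1e-3:", np.allclose(cuda_out, trt_out, rtol=1e-3, atol=1e-3))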
Sorry for my inactivity. I was able to find a workaround: erasing the Reshape layers with -1 in their input shapes fixed the TensorRT inference.
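As an illustration of that workaround, the affected nodes can be located with the onnx Python package. This sketch assumes each Reshape's target shape is stored as a graph initializer, which may not hold for every model; the file name is illustrative.

# Find Reshape nodes whose target shape contains -1.
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")
initializers = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "Reshape" and len(node.input) > 1:
        shape = initializers.get(node.input[1])
        if shape is not None and -1 in shape:
            print(f"Reshape '{node.name}' uses -1 in its target shape: {shape.tolist()}")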
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details. |
I encountered this problem when running a UNet model on ORT-GPU 1.15. It works fine when using CUDAExecutionProvider. According to the documentation, I specified all input dynamic axis dimensions, but it doesn't work. Below is the main code.

import onnxruntime as ort

sess_options = ort.SessionOptions()
# Pin every free (symbolic) input dimension to a concrete value.
sess_options.add_free_dimension_override_by_name("unet_sample_batch", 2)
sess_options.add_free_dimension_override_by_name("unet_sample_channels", 4)
sess_options.add_free_dimension_override_by_name("unet_sample_height", 64)
sess_options.add_free_dimension_override_by_name("unet_sample_width", 64)
sess_options.add_free_dimension_override_by_name("unet_time_batch", 1)
sess_options.add_free_dimension_override_by_name("unet_hidden_batch", 2)
sess_options.add_free_dimension_override_by_name("unet_hidden_sequence", 77)

# TensorRT EP options: FP16, engine caching, and explicit optimization profiles.
trt_ep_options = {
    "trt_fp16_enable": True,
    "trt_engine_cache_enable": True,
    "trt_profile_min_shapes": "sample:2x4x64x64,timestep:1,encoder_hidden_states:2x77x768",
    "trt_profile_max_shapes": "sample:32x4x64x64,timestep:1,encoder_hidden_states:32x77x768",
    "trt_profile_opt_shapes": "sample:2x4x64x64,timestep:1,encoder_hidden_states:2x77x768",
}

providers = [("TensorrtExecutionProvider", trt_ep_options)]
# 'model' is the ONNX model path (or serialized bytes), defined elsewhere.
sess = ort.InferenceSession(model, providers=providers, sess_options=sess_options)
What is the error encountered?
The error is
Although the environment is different, it looks like the error is the same. The operating environment is: Win10 x64, ONNX Runtime 1.15.0 released package, Python, TensorrtExecutionProvider, CUDA 11.8 + TensorRT 8.6.
I tried:
2023-06-04 01:08:26.8714279 [E:onnxruntime:Default, tensorrt_execution_provider.h:73 onnxruntime::TensorrtLogger::log] [2023-06-03 17:08:26 ERROR] 3: getPluginCreator could not find plugin: GroupNorm version: 1
2023-06-04 01:08:39.4348212 [E:onnxruntime:, inference_session.cc:1645 onnxruntime::InferenceSession::Initialize::<lambda_eb486adf513608dcd45c034ea7ffb8e8>::operator ()] Exception during initialization:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: tensorrt_execution_provider.cc:1352 onnxruntime::TensorrtExecutionProvider::GetSupportedList graph_build.Resolve().IsOK() was false.
Can ORT
Hi guys,
I'm experiencing an issue with the TensorRT execution provider on a Jetson Xavier with JetPack 4.4. Unfortunately, I can't share my model with you, but I was hoping some of you might have faced the same issue.
Describe the bug
Once the TensorRT execution provider is added to session options, loading a model fails with the following output.
InitSession Error: ! Exception during initialization: onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:756 SubGraphCollection_t onnxruntime::TensorrtExecutionProvider::GetSupportedList(SubGraphCollection_t, int, int, const onnxruntime::GraphViewer&, bool*) const graph_build.Resolve().IsOK() was false.
Stack trace:
#0 0x0000007f8b59844c in std::__atomic_base::compare_exchange_strong (__m2=std::memory_order_relaxed, __m1=std::memory_order_acquire, __i2=1, __i1=@0x7f7ba4ac94: 0,
this=0x850) at /opt/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/include/c++/7.3.1/bits/atomic_base.h:477
#1 std::atomic_compare_exchange_strong_explicit (__a=0x850, __i1=0x7f7ba4ac94, __i2=1, __m1=std::memory_order_acquire, __m2=std::memory_order_relaxed)
at /opt/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/include/c++/7.3.1/atomic:1125
#2 0x0000007f8b598818 in nsync::atm_cas_acq_u32 (p=0x850, o=0, n=1)
at onnxruntime/cmake/external/nsync/platform/c++11/atomic.h:73
#3 0x0000007f8b598c14 in nsync::nsync_mu_lock (mu=0x850) at external/nsync/cpp/internal/mu.c:148
#4 0x0000007f8a4f19bc in onnxruntime::OrtMutex::lock (this=0x850)
at onnxruntime/include/onnxruntime/core/platform/ort_mutex.h:119
#5 0x0000007f8a4f252c in std::lock_guard<onnxruntime::OrtMutex>::lock_guard (this=0x7f7ba4ad50, __m=...)
at /opt/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/include/c++/7.3.1/bits/std_mutex.h:162
#6 0x0000007f8a5319bc in onnxruntime::InferenceSession::GetModelInputs (this=0x0)
at onnxruntime/onnxruntime/core/session/inference_session.cc:1658
#7 0x0000007f8a4c1208 in <lambda(const onnxruntime::InferenceSession*)>::operator()(const onnxruntime::InferenceSession *) const (__closure=0x0, session=0x0)
at onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:896
#8 0x0000007f8a4c123c in <lambda(const onnxruntime::InferenceSession*)>::_FUN(const onnxruntime::InferenceSession *) ()
at onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:896
#9 0x0000007f8a4c1398 in GetNodeDefListCountHelper (sess=0x0, get_fn=0x7f8a4c1218 <<lambda(const onnxruntime::InferenceSession*)>::_FUN(const onnxruntime::InferenceSession *)>,
out=0x7f7ba4b0f8) at onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:903
#10 0x0000007f8a4c14dc in OrtApis::SessionGetInputCount (sess=0x0, out=0x7f7ba4b0f8)
at onnxruntime/onnxruntime/core/session/onnxruntime_c_api.cc:912
#11 0x0000007fb4f37e2c in Ort::Session::GetInputCount (this=0x7f74002888)
at onnxruntime/include/onnxruntime/core/session/onnxruntime_cxx_inline.h:534
...
System information
Additional context
If I try other EPs (CUDA or CPU), the error disappears, so it only relates to TensorRT.
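For anyone checking this from Python rather than the C++ API, a minimal sketch of isolating whether session creation fails only with the TensorRT EP (the model path is illustrative):

# Try creating the session with each execution provider in turn.
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # illustrative path

for providers in (
    ["CPUExecutionProvider"],
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
):
    try:
        sess = ort.InferenceSession(MODEL_PATH, providers=providers)
        print(providers[0], "OK, inputs:", [i.name for i in sess.get_inputs()])
    except Exception as exc:
        print(providers[0], "failed:", exc)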