
BertQA sample throws segmentation fault (TensorRT 10.3) when running on GPU Jetson Orin Nano #4220

krishnarajk opened this issue Oct 23, 2024 · 0 comments


krishnarajk commented Oct 23, 2024

Description

I tried running the BertQA sample on a Jetson Orin Nano with JetPack 6.1.
I used BERT Base, because BERT Large gets killed while building the engine (possibly because of a memory issue).

```
[10/23/2024-13:27:53] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +67, now: CPU 2160, GPU 6001 (MiB)
[10/23/2024-13:27:53] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/23/2024-13:28:39] [TRT] [I] Detected 3 inputs and 1 output network tensors.
[10/23/2024-13:28:42] [TRT] [I] Total Host Persistent Memory: 316288
[10/23/2024-13:28:42] [TRT] [I] Total Device Persistent Memory: 110592
[10/23/2024-13:28:42] [TRT] [I] Total Scratch Memory: 0
[10/23/2024-13:28:42] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 164 steps to complete.
[10/23/2024-13:28:43] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 3.28999ms to assign 5 blocks to 164 nodes requiring 1378304 bytes.
[10/23/2024-13:28:43] [TRT] [I] Total Activation Memory: 1378304
[10/23/2024-13:28:43] [TRT] [I] Total Weights Memory: 170059792
[10/23/2024-13:28:43] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU -1, now: CPU 2372, GPU 6707 (MiB)
[10/23/2024-13:28:43] [TRT] [I] Engine generation completed in 51.1302 seconds.
[10/23/2024-13:28:43] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 384 MiB
[10/23/2024-13:28:43] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3087 MiB
[10/23/2024-13:28:43] [TRT] [I] build engine in 52.969 Sec
[10/23/2024-13:28:44] [TRT] [I] Saving Engine to engines/bert_base_128.engine
[10/23/2024-13:28:44] [TRT] [I] Done.
```

Then I used inference.py with the same sample given in the examples:

```
python3 inference.py -e engines/bert_base_128.engine -p "TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps." -q "What is TensorRT?" -v models/fine-tuned/bert_tf_ckpt_base_qa_squad2_amp_128_v19.03.1/vocab.txt
```

It throws a segmentation fault:
```
[10/23/2024-13:30:07] [TRT] [I] Loaded engine size: 208 MiB
[10/23/2024-13:30:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +8, GPU +70, now: CPU 317, GPU 4590 (MiB)
[10/23/2024-13:30:08] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +64, now: CPU 109, GPU 4379 (MiB)
[10/23/2024-13:30:08] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 163 (MiB)

Passage: TensorRT is a high performance deep learning inference platform that delivers low latency and high throughput for apps such as recommenders, speech and image/video on NVIDIA GPUs. It includes parsers to import models, and plugins to support novel ops and layers before applying optimizations for inference. Today NVIDIA is open-sourcing parsers and plugins in TensorRT so that the deep learning community can customize and extend these components to take advantage of powerful TensorRT optimizations for your apps.

Question: What is TensorRT?
Segmentation fault (core dumped)
```
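To narrow down where the crash happens, a minimal deserialize-only check might help: it loads the engine and creates an execution context without running any inference, so a crash here would point at engine loading rather than the sample's I/O code. This is only a sketch assuming the TensorRT 10.x Python API; `check_engine` is a hypothetical helper name, not part of the sample.

```python
# Sketch: deserialize the engine and create an execution context only,
# to see whether the segfault occurs before any inference work.
import os


def check_engine(path="engines/bert_base_128.engine"):
    try:
        import tensorrt as trt  # requires the TensorRT Python bindings on the Jetson
    except ImportError:
        return "tensorrt not installed"
    if not os.path.exists(path):
        return "engine file missing"
    logger = trt.Logger(trt.Logger.VERBOSE)  # verbose logging to catch the last step before a crash
    runtime = trt.Runtime(logger)
    with open(path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        return "deserialization failed"
    context = engine.create_execution_context()
    return "ok" if context is not None else "context creation failed"


if __name__ == "__main__":
    print(check_engine())
```

If this script already segfaults, the problem is in engine deserialization on this JetPack/TensorRT combination rather than in inference.py itself.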
I followed the steps in https://github.com/NVIDIA/TensorRT/tree/release/10.3/demo/BERT#model-overview
I don't use the OSS container; I installed the dependencies directly on the device:
(Image: screenshot of installed versions attached in the original issue)

Please help me with this.

Environment

TensorRT Version: 10.3

NVIDIA GPU: Ampere (Jetson Orin Nano)

NVIDIA Driver Version: JetPack 6.1

CUDA Version: 12.6

CUDNN Version:

Operating System: Ubuntu 22.04

Python Version (if applicable): 3.10
