Add trtexec TensorRT export #6984
👋 Hello @triple-Mu, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with upstream/master. If your PR is behind upstream/master an automatic GitHub Actions merge may be attempted by writing /rebase in a new comment, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
# git checkout feature # <--- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
@triple-Mu thanks for the PR! Was not familiar with trtexec. What's the main difference with the default tensorrt export? BTW note that the default TRT export will always be in FP16 mode regardless of --half. We use this by default as we did not observe any mAP drops but did observe significant speedup in --half mode. Full benchmarking results are in #6963 Colab++ V100 High-RAM Results
@glenn-jocher Thank you for your reply. trtexec applies optimizations specific to the machine's GPU when it exports the engine, so the export may take longer. It also reports detailed information during the export, such as the inference time on random inputs and the time spent in each layer of the network, which is very convenient.
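For readers who, like me, have not used trtexec before, here is a minimal sketch of how such an export command could be assembled from Python. It is not the code in this PR: the helper name `build_trtexec_cmd` and its parameters are hypothetical, and it assumes an ONNX file has already been exported. The flags `--onnx`, `--saveEngine`, `--fp16`, `--verbose` and `--dumpProfile` are standard trtexec options, but check `trtexec --help` for your TensorRT version.

```python
import shlex
from pathlib import Path


def build_trtexec_cmd(onnx_path, half=True, verbose=False, profile=False, trtexec="trtexec"):
    """Return the trtexec argument list for converting an ONNX file to an engine.

    Hypothetical helper for illustration only; it does not run anything.
    """
    onnx_path = Path(onnx_path)
    engine_path = onnx_path.with_suffix(".engine")  # yolov5s.onnx -> yolov5s.engine
    cmd = [str(trtexec), f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if half:
        cmd.append("--fp16")  # allow FP16 kernels
    if verbose:
        cmd.append("--verbose")  # detailed build log
    if profile:
        cmd.append("--dumpProfile")  # print per-layer timing
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so this works without TensorRT installed.
    print(shlex.join(build_trtexec_cmd("yolov5s.onnx", profile=True)))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where trtexec is installed.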
@triple-Mu got it, thanks! trtexec export actually seems faster in your results, i.e. 180 seconds instead of 380 seconds. I tried to run the PR but get this error:
/bin/sh: 1: /usr/src/tensorrt/bin/trtexec: not found
The existing pip install does not appear to install trtexec:
pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com
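One way to fail earlier with a clearer message would be to look for the binary before invoking it. The sketch below is only an illustration, not the PR's code: the function name `find_trtexec` is hypothetical, and the fallback location is simply the path from the error message above, which may differ on other systems.

```python
import shutil
from pathlib import Path

# Path taken from the error message above; treat it as an assumption for other systems.
DEFAULT_TRTEXEC = Path("/usr/src/tensorrt/bin/trtexec")


def find_trtexec(candidates=(DEFAULT_TRTEXEC,)):
    """Return a path to trtexec as a string, or None if it cannot be found."""
    on_path = shutil.which("trtexec")  # search the directories in PATH first
    if on_path:
        return on_path
    for candidate in candidates:  # then try known install locations
        if Path(candidate).is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    path = find_trtexec()
    if path is None:
        print("trtexec not found; the pip package alone does not appear to provide it")
    else:
        print(f"using trtexec at {path}")
```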
@glenn-jocher |
@triple-Mu this is the full code I'm using to clone the PR, install requirements and run export. I'm running this in Colab:
!git clone https://github.com/triple-Mu/yolov5 -b tripleMu # clone
%cd yolov5
%pip install -qr requirements.txt # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com # install
!python export.py --weights yolov5s.pt --include engine --device 0 --trtexec
@glenn-jocher |
@triple-Mu I'm using this code to clone the PR, install requirements and run export. I'm running this in Colab:
!git clone https://github.com/triple-Mu/yolov5 -b tripleMu # clone
%cd yolov5
%pip install -qr requirements.txt # install
%pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com # install
!python export.py --weights yolov5s.pt --include engine --device 0 --trtexec
But there's no
Just FYI @glenn-jocher ,
And seems that |
@glenn-jocher |
Actually the |
@triple-Mu quick question. Is this PR compatible with your new PR #7736 or does the new PR replace this one?
The new PR has nothing to do with the old one. Either way is fine, just as you wish. It does not matter to me.
I tried to add the trtexec tensorrt export and got very interesting results as follows.
1: Use the original export method
The mAP results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.374
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.570
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.311
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.377
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.718
The FPS results:
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [00:37<00:00, 134.16it/s]
all 5000 36335 0.661 0.524 0.615 0.439
Speed: 0.2ms pre-process, 1.5ms inference, 0.5ms NMS per image at shape (1, 3, 640, 640)
2: Use the trtexec export method
The mAP results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.374
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.571
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.311
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.516
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.566
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.378
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.718
The FPS results:
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [00:46<00:00, 108.09it/s]
all 5000 36335 0.659 0.525 0.616 0.44
Speed: 0.2ms pre-process, 3.5ms inference, 0.5ms NMS per image at shape (1, 3, 640, 640)
3: Summary
The trtexec export method may give marginally better mAP@.5 (0.571 vs. 0.570 in the COCO evaluation) and AR for small and medium objects (0.378 vs. 0.377 and 0.628 vs. 0.627).
But it increases inference time from 1.5ms to 3.5ms.
All result images and logs are shown below.
So using trtexec may help us get slightly more accurate results.
Thanks!
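To put the two runs side by side, the differences can be worked out with a few lines of arithmetic. This is only a sketch using the numbers reported above; the variable and function names are made up for illustration, and a single run per method does not show whether the 0.001 gaps are outside normal run-to-run variation.

```python
def relative_change(old, new):
    """Fractional change from old to new, e.g. -0.25 means a 25% drop."""
    return (new - old) / old


# Values copied from the two logs above.
inference_ms = {"default": 1.5, "trtexec": 3.5}
throughput_its = {"default": 134.16, "trtexec": 108.09}
ap50 = {"default": 0.570, "trtexec": 0.571}

# Ratio of per-image inference times (trtexec / default).
slowdown = inference_ms["trtexec"] / inference_ms["default"]
# Change in the validation loop rate, which also includes pre-processing and NMS.
throughput_change = relative_change(throughput_its["default"], throughput_its["trtexec"])
# Absolute AP@0.50 difference between the two exports.
ap50_gain = ap50["trtexec"] - ap50["default"]

if __name__ == "__main__":
    print(f"inference slowdown: {slowdown:.2f}x")
    print(f"throughput change:  {throughput_change:+.1%}")
    print(f"AP@0.50 gain:       {ap50_gain:+.3f}")
```

The validation rate drops less than the raw inference time because it/s covers the whole loop, not just the forward pass.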
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
WARNING⚠️ this PR is very large, summary may not cover all changes.
🌟 Summary
This PR introduced enhancements to TensorRT export functionality in YOLOv5.
📊 Key Changes
🎯 Purpose & Impact