Add TensorRT `IErrorRecorder` Implementation #54

maedtb · 2024-07-13T06:42:51Z

- Adds a new thread-safe Python class `TrTErrorRecorder` which implements the TensorRT `IErrorRecorder` interface. This class captures errors to display to the user, and can optionally terminate TensorRT processing when errors occur.
- We now set the `error_recorder` field on the TensorRT `tensorrt.Builder` and `tensorrt.Runtime` classes to an instance of `TrTErrorRecorder`.
- We now check for errors while initializing TensorRT engines, raising exceptions if TensorRT reports any errors to us.
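For readers who have not seen the diff, here is a rough sketch of the shape such a recorder can take. It is not the code in this PR: `RecorderSketch`, `terminate_on_error`, `max_errors`, and `pop_errors` are hypothetical names, and the class is deliberately a plain Python object so it runs without TensorRT installed. The method names follow what I understand the `tensorrt.IErrorRecorder` Python interface to expect (`report_error`, `num_errors`, `get_error_code`, `get_error_desc`, `has_overflowed`, `clear`); treat that mapping, and the meaning of the `report_error` return value, as assumptions to verify against the TensorRT docs.

```python
import threading
from typing import List, Tuple


class RecorderSketch:
    """Hypothetical stand-in for a class that would subclass tensorrt.IErrorRecorder."""

    def __init__(self, terminate_on_error: bool = False, max_errors: int = 64):
        # TensorRT may report errors from several threads, so all state
        # is guarded by a single lock.
        self._lock = threading.Lock()
        self._errors: List[Tuple[int, str]] = []
        self._overflowed = False
        self._terminate_on_error = terminate_on_error
        self._max_errors = max_errors

    def report_error(self, code: int, desc: str) -> bool:
        # `code` stands in for a tensorrt.ErrorCode value.
        with self._lock:
            if len(self._errors) >= self._max_errors:
                self._overflowed = True  # error dropped, but remember that it happened
            else:
                self._errors.append((code, desc))
        # Assumption: returning True asks TensorRT to stop the current operation.
        return self._terminate_on_error

    def num_errors(self) -> int:
        with self._lock:
            return len(self._errors)

    def get_error_code(self, index: int) -> int:
        with self._lock:
            return self._errors[index][0]

    def get_error_desc(self, index: int) -> str:
        with self._lock:
            return self._errors[index][1]

    def has_overflowed(self) -> bool:
        with self._lock:
            return self._overflowed

    def clear(self) -> None:
        with self._lock:
            self._errors.clear()
            self._overflowed = False

    def pop_errors(self) -> List[Tuple[int, str]]:
        # Hypothetical convenience helper: return and clear atomically so the
        # caller can turn the recorded errors into an exception.
        with self._lock:
            errors = list(self._errors)
            self._errors.clear()
            self._overflowed = False
            return errors
```

With a real subclass, the instance would be assigned to the `error_recorder` field of the builder and runtime objects, and the loader would check `num_errors()` after each TensorRT call that can fail.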

maedtb · 2024-07-13T06:45:11Z

I ran into several errors using TensorRT, and it was unclear why these errors were happening. Adding the TensorRT error reporter cleared up the issues for me entirely. The error messages TensorRT reports to us look something like this:

[defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[executionContext.cpp::ExecutionContext::565] Error Code 2: OutOfMemory (Requested size was 30152807424 bytes.)

They're not the most user-friendly, but they're a lot more helpful than `'NoneType' object has no attribute 'set_input_shape'` :^)

mcmonkey4eva · 2024-07-16T03:30:52Z

Tested this. It works properly for converting and generating, but an error during generation, while it is logged properly, doesn't raise an exception,

so it just generates a black image while spamming the console with a new error every step.

comfyanonymous · 2024-07-16T04:09:57Z

[07/15/2024-23:30:09] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 13075 MiB
[07/15/2024-23:30:09] [TRT] [I] Serialized 16605 bytes of code generator cache.
[07/15/2024-23:30:09] [TRT] [I] Serialized 4704604 bytes of compilation cache.
[07/15/2024-23:30:09] [TRT] [I] Serialized 2218 timing cache entries
Segmentation fault

This PR gives me a segfault when I try to convert SD1.5

maedtb · 2024-08-07T22:56:12Z

Just leaving a note here that I'm planning on investigating these issues this weekend and I haven't abandoned this. Going to pull it into Draft in the meantime.

Fixed error message prefix in tensorrt_loader.py to be more general.

c03b9c8

maedtb marked this pull request as draft August 7, 2024 22:56
