Add TensorRT IErrorRecorder Implementation #54

Draft. Wants to merge 2 commits into master.

maedtb
Contributor

@maedtb maedtb commented Jul 13, 2024

  • Adds a new thread-safe Python class TrTErrorRecorder which implements the TensorRT IErrorRecorder interface. This class captures errors to display to the user, and can optionally terminate TensorRT processing when errors occur.
  • We now set the error_recorder field on the TensorRT tensorrt.Builder and tensorrt.Runtime classes to an instance of TrTErrorRecorder.
  • We now check for errors while initializing TensorRT engines, raising exceptions if TensorRT reports any errors to us.
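
As a rough illustration of the recorder described above, here is a minimal sketch. It is not the code from this PR: the class name `SketchErrorRecorder` and the `stop_on_error` parameter are hypothetical, and the class is a plain Python object so that it runs without TensorRT installed. The method names are meant to mirror the `IErrorRecorder` interface as I understand the TensorRT Python bindings; in real use the class would presumably subclass `tensorrt.IErrorRecorder`.

```python
import threading


class SketchErrorRecorder:
    """Thread-safe error store mirroring the IErrorRecorder method names.

    Assumption: in real use this would subclass tensorrt.IErrorRecorder;
    a plain class is used here so the sketch runs without TensorRT.
    """

    def __init__(self, stop_on_error=True):
        self._lock = threading.Lock()
        self._errors = []  # list of (error_code, error_desc) tuples
        self._stop_on_error = stop_on_error  # hypothetical option

    def report_error(self, error_code, error_desc):
        # May be called from several threads, so guard the list.
        with self._lock:
            self._errors.append((error_code, error_desc))
        # Returning True asks the caller to stop processing.
        return self._stop_on_error

    def num_errors(self):
        with self._lock:
            return len(self._errors)

    def get_error_code(self, index):
        with self._lock:
            return self._errors[index][0]

    def get_error_desc(self, index):
        with self._lock:
            return self._errors[index][1]

    def has_overflowed(self):
        # The list is unbounded, so no error is ever dropped.
        return False

    def clear(self):
        with self._lock:
            self._errors.clear()


recorder = SketchErrorRecorder(stop_on_error=True)
should_stop = recorder.report_error(1, "Cuda Runtime (out of memory)")
```

An instance like this would then be assigned to the `error_recorder` field of `tensorrt.Builder` and `tensorrt.Runtime`, as the description says.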

@maedtb
Contributor Author

maedtb commented Jul 13, 2024

I ran into several errors using TensorRT, and it was unclear why these errors were happening. Adding the TensorRT error reporter cleared up the issues for me entirely. The error messages TensorRT reports to us look something like this:

[defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[executionContext.cpp::ExecutionContext::565] Error Code 2: OutOfMemory (Requested size was 30152807424 bytes.)

It's not the most user-friendly, but it's a lot more helpful than "'NoneType' object has no attribute 'set_input_shape'" :^)

@mcmonkey4eva

Tested this. It works properly for converting and generating, but an error during generation, while it logs properly, doesn't raise an exception, so it just generates a black image while spamming the console with a new error every step.
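
One possible way to turn recorded errors into an exception after each generation step is sketched below. This is an assumption about how the fix could look, not code from this PR: `TensorRTError`, `raise_if_errors`, and the stand-in `_ListRecorder` are hypothetical names, and the stand-in only exists so the sketch runs without TensorRT.

```python
class TensorRTError(RuntimeError):
    """Hypothetical exception type for errors reported by TensorRT."""


def raise_if_errors(recorder):
    """Raise TensorRTError if the recorder holds errors, clearing it first."""
    count = recorder.num_errors()
    if count == 0:
        return
    messages = [
        f"Error Code {recorder.get_error_code(i)}: {recorder.get_error_desc(i)}"
        for i in range(count)
    ]
    recorder.clear()
    raise TensorRTError("\n".join(messages))


class _ListRecorder:
    """Minimal stand-in recorder so the sketch runs without TensorRT."""

    def __init__(self):
        self.errors = []  # list of (error_code, error_desc) tuples

    def num_errors(self):
        return len(self.errors)

    def get_error_code(self, index):
        return self.errors[index][0]

    def get_error_desc(self, index):
        return self.errors[index][1]

    def clear(self):
        self.errors.clear()


recorder = _ListRecorder()
raise_if_errors(recorder)  # nothing recorded, so nothing is raised

recorder.errors.append((2, "OutOfMemory"))
try:
    raise_if_errors(recorder)
    raised = False
except TensorRTError as exc:
    raised = True
    message = str(exc)
```

Calling a check like this after every inference call would stop generation on the first failing step instead of logging a new error each step.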

@comfyanonymous
Owner

[07/15/2024-23:30:09] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 13075 MiB
[07/15/2024-23:30:09] [TRT] [I] Serialized 16605 bytes of code generator cache.
[07/15/2024-23:30:09] [TRT] [I] Serialized 4704604 bytes of compilation cache.
[07/15/2024-23:30:09] [TRT] [I] Serialized 2218 timing cache entries
Segmentation fault

This PR gives me a segfault when I try to convert SD1.5

@maedtb
Contributor Author

maedtb commented Aug 7, 2024

Just leaving a note here that I'm planning on investigating these issues this weekend and I haven't abandoned this. Going to pull it into Draft in the meantime.

@maedtb maedtb marked this pull request as draft August 7, 2024 22:56