Proposal: TRT Acceleration API improvement #8018
Comments
Hi @borisfom, thanks!
Hi @binliunls! Thanks for the input!
@binliunls: updated the TRTWrapper PR with a torch-tensorrt option (not tested yet, may need refinement).
Here, I implemented a first POC for brats_mri_generative_diffusion from the model zoo:
@binliunls: I did some refactoring of the TRTWrapper/trt_wrap API after working on a few use cases. TRTWrapper is no longer supposed to replace the original module; instead, it is saved in the original module and the module's forward() is monkey-patched with a method that calls the TRTWrapper. This way, wrapping can happen before checkpoint load (the model structure, and therefore the state dict, is unchanged). trt_wrap still returns the original model to facilitate use in configs.
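A minimal sketch of that monkey-patching scheme, using simplified stand-ins for TRTWrapper/trt_wrap (the real #7990 signatures, the _trt_wrapper attribute name, and the engine-handling internals here are all illustrative assumptions):

```python
import types

import torch


class TRTWrapper:
    """Illustrative stand-in, not the real #7990 class: lazily builds or loads
    a cached TRT engine on first call, falling back to the original forward."""

    def __init__(self, orig_forward, path):
        self.orig_forward = orig_forward  # unpatched forward of the module
        self.path = path                  # basename for the cached .plan engine
        self.engine = None                # built/loaded on demand

    def __call__(self, *args, **kwargs):
        if self.engine is None:
            self.engine = self._load_or_build_engine(*args, **kwargs)
        if self.engine is not None:
            return self.engine(*args, **kwargs)
        return self.orig_forward(*args, **kwargs)  # graceful torch fallback

    def _load_or_build_engine(self, *args, **kwargs):
        # engine export and caching are elided in this sketch; returning None
        # means "keep using the original torch forward"
        return None


def trt_wrap(model: torch.nn.Module, path: str) -> torch.nn.Module:
    # Keep the wrapper on the module itself without registering it as a
    # submodule, so the model structure (and therefore the state_dict) stays
    # unchanged and checkpoints can still be loaded after wrapping.
    object.__setattr__(model, "_trt_wrapper", TRTWrapper(model.forward, path))

    def patched_forward(self, *args, **kwargs):
        return self._trt_wrapper(*args, **kwargs)

    # monkey-patch forward() so all calls are routed through the wrapper
    model.forward = types.MethodType(patched_forward, model)
    return model  # the original model is returned, to keep configs simple
```

The fallback path is the important design point: if an engine is unavailable or fails to build, inference still runs through the original forward.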
Hi @borisfom, I have some suggestions for this new implementation:
Thanks
|
There are a number of issues with the current TRT acceleration path in MONAI:
Describe the solution you'd like
I would like to have a self-contained recipe in inference.json for TRT conversion that would work for multiple inputs with different data types, and also for cases where only some submodules can be converted to TRT.
Ideally, there should be no explicit TRT export pass; conversion would happen on demand during inference and be cached.
Persistent TRT engines would be rebuilt if the config changes or the TRT version is updated.
I would like to have a single inference.json for TRT and non-TRT inference, with a TRT on/off switch on the command line.
Describe alternatives you've considered
I explored a few of the zoo models that still need TRT, and I think we can come up with an elegant TRT acceleration solution using extended network definitions in typical inference.json configs, utilizing TRTWrapper (from #7990).
network_def would have an optional 'trt' section describing which submodules (or the whole net) should be wrapped in TRTWrapper.
convert_to_trt() can probably be fixed for multi-input/multi-datatype nets, especially if we add network_data_format to metadata.json, but that would still require an additional conversion pass (which has to be explicitly rerun each time TRT is upgraded or the config changes). The TRTWrapper approach should be far more user-friendly.
In a nutshell, a sample network_def would look like:
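For illustration, a hypothetical sketch of such a section; the _target_, constructor arguments, and all key names under "trt" are placeholders of mine, not a finalized schema:

```json
"network_def": {
    "_target_": "scripts.networks.MyNet",
    "image_size": [96, 96, 96],
    "trt": {
        "submodules": ["image_encoder"],
        "path": "models/model",
        "precision": "fp16"
    }
}
```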
So for each net, one can specify the set of submodules that should be wrapped in TRTWrapper (or the whole net if no submodules are given). The "path" basename could probably also be generated automatically from the checkpoint name, but explicit would do as well.
No other configuration or explicit TRT conversion pass would be needed for export! And the generated .plan files would be automatically rebuilt on config file changes.
We would not need two separate inference.json and inference_trt.json configs - just a top-level 'trt' config flag, overridable from the command line.
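For example (the trt flag itself is this proposal's, not an existing option; overriding a top-level config entry from the command line is standard bundle CLI behavior):

```bash
python -m monai.bundle run --config_file configs/inference.json --trt true
```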
At workflow initialization time, the above config would result in the replacement of the network's "image_encoder" attribute with:
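Something to this effect, reusing the hypothetical trt_wrap sketch from above (the derived engine basename is an assumption):

```python
# done by the workflow at init time, in effect:
network.image_encoder = trt_wrap(
    network.image_encoder,
    path="models/model_image_encoder",  # config "path" plus the submodule name
)
```

Since trt_wrap returns the patched original module, this "replacement" leaves the model structure and state dict intact.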