[Performance] How to reduce gpu memory consumption ? #22130
Labels
performance (issues related to performance regressions)
stale (issues that have not been addressed in a while; categorized by a bot)
Describe the issue
I have an ONNX model that is only 204.57 MB on disk, but when I create the session, GPU memory consumption jumps to 1.16 GB, and during inference it reaches 2.25 GB. This drives up inference cost, so how can I reduce GPU memory consumption?
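One common way to shrink the session's GPU footprint is to pass CUDA execution-provider options that cap the memory arena and avoid the exhaustive cuDNN workspace search. This is a sketch, not a tuned configuration: the 2 GiB limit and the `model.onnx` path are placeholder assumptions.

```python
# Hedged sketch: CUDA execution-provider options that can reduce GPU memory.
# The 2 GiB limit and "model.onnx" path are placeholder assumptions.
cuda_options = {
    "device_id": 0,
    # Hard cap (in bytes) on the CUDA memory arena; allocations beyond
    # this fail instead of growing the arena further.
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,
    # Grow the arena only by the requested amount instead of doubling it.
    "arena_extend_strategy": "kSameAsRequested",
    # HEURISTIC conv-algo search needs far less cuDNN workspace than the
    # default EXHAUSTIVE search, at a possible small speed cost.
    "cudnn_conv_algo_search": "HEURISTIC",
}
providers = [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]

# Session creation (requires onnxruntime-gpu and a real model file):
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

Capping `gpu_mem_limit` trades peak memory for possible allocation failures under load, so the value has to be chosen against the model's actual working set.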
To reproduce
Just create an onnxruntime session with the default options. The GPU memory consumption function:
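The measurement function itself is not included in the report. One common way to read per-device memory use, sketched here under the assumption that `nvidia-smi` is on `PATH` (both function names are mine, not from the report):

```python
import subprocess

def parse_memory_mib(raw: str) -> int:
    """Parse the bare number printed by nvidia-smi in csv,noheader,nounits mode."""
    return int(raw.strip())

def gpu_memory_used_mib(device_id: int = 0) -> int:
    """Query used GPU memory in MiB for one device via nvidia-smi."""
    raw = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", "-i", str(device_id)],
        text=True,
    )
    return parse_memory_mib(raw)
```

Calling `gpu_memory_used_mib()` before session creation, after session creation, and after the first inference run separates the three contributions (CUDA context, weights plus arena, and activation/workspace memory).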
Urgency
No response
Platform
Linux
OS Version
ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.11.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
11.4
Model File
No response
Is this a quantized model?
No