v12: latest CUDA libraries
Compared to v11, this release updated CUDA dependencies to CUDA 11.8.0, cuDNN 8.6.0 and TensorRT 8.5.1:
- Added support for the NVIDIA 40 series GPUs.
- Added support for RIFE on the `trt` backend.
Known issue
- Performance of the `OV_CPU` or `ORT_CUDA(fp16=True)` backends for `RIFE` is lower than expected, which is under investigation. Please consider `ORT_CPU` or `ORT_CUDA(fp16=False)` for now.
- The `NCNN_VK` backend does not support `RIFE`.
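When a script picks a backend for RIFE, the workarounds listed here can be captured as a small lookup. This is only a sketch: `RIFE_FALLBACKS`, `RIFE_UNSUPPORTED` and `rife_backend_for` are hypothetical names, backends are represented as plain strings, and nothing here calls into `vsmlrt` itself.

```python
# Hypothetical helper, not part of vsmlrt: maps a requested backend (given as
# a plain string) to the backend these notes suggest for RIFE in v12.
RIFE_FALLBACKS = {
    "OV_CPU": "ORT_CPU",
    "ORT_CUDA(fp16=True)": "ORT_CUDA(fp16=False)",
}
RIFE_UNSUPPORTED = {"NCNN_VK"}


def rife_backend_for(requested: str) -> str:
    """Return the suggested backend name for RIFE, or raise if unsupported."""
    if requested in RIFE_UNSUPPORTED:
        raise ValueError(f"{requested} does not support RIFE in v12")
    # Backends without a known issue are returned unchanged.
    return RIFE_FALLBACKS.get(requested, requested)
```

Backends that are not mentioned in the known issues, such as `trt`, pass through unchanged.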
Installation Notes
For some advanced features, `vsmlrt.py` requires the `numpy` and `onnx` packages to be available. You might need to run `pip install onnx numpy`.
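One way to check whether both packages are visible to the Python interpreter that VapourSynth uses is a short standard-library script. This is a minimal sketch; `missing_packages` is a hypothetical helper name, not part of `vsmlrt.py`.

```python
import importlib.util


def missing_packages(names=("numpy", "onnx")):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("numpy and onnx are available")
```

Run it with the same interpreter that loads `vsmlrt.py`, since packages installed for a different Python installation will not be found.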
Benchmark
Configuration: NVIDIA RTX 3090, driver 526.47, Windows Server 2019, VapourSynth R60, Python 3.11.0, 1080p fp16
Backends: ort-cuda, trt from vs-mlrt v12.
For the `trt` backend, the engine is built without the `CUDA_MODULE_LOADING=LAZY` environment variable set. The variable is then set during benchmarking to reduce device memory consumption.
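As a rough sketch of the benchmarking side of that setup, the variable can be set from Python before anything initializes CUDA in the process. This assumes the script itself is where inference starts; setting the variable in the shell before launching works as well.

```python
import os

# CUDA reads this variable when it initializes, so it must be set before any
# CUDA-using library is loaded in this process.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

# Import vsmlrt and run inference only after this point.
```

Per the setup described here, the variable should be left unset while the engine is being built.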
Data format: fps / GPU memory usage (MB)
rife(model=44, 1920x1088)
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 53.62/1771 | 83.34/2748 |
trt | 71.30/ 626 | 107.3/ 962 |
dpir color
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 4.64/3230 | |
trt | 10.32/1992 | 11.61/3475 |
waifu2x upconv_7
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 11.07/5916 | 15.04/10899 |
trt | 18.38/2092 | 31.64/ 3848 |
waifu2x cunet
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 4.63/8541 | 5.32/16148 |
trt | 11.44/4771 | 15.59/ 8972 |
realesrgan v2/v3
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 8.84/2283 | 11.10/4202 |
trt | 14.59/1324 | 21.37/2174 |
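
To compare the two backends, each `fps/memory` cell can be split and the ratios computed. The sketch below uses the 1-stream values copied from the tables above; `parse_cell`, `compare` and `ONE_STREAM` are hypothetical names introduced only for illustration.

```python
def parse_cell(cell: str) -> tuple[float, int]:
    """Split an 'fps/memory' table cell into (fps, GPU memory in MB)."""
    fps, mem = cell.split("/")
    return float(fps), int(mem)  # int() tolerates the padding spaces


def compare(ort_cell: str, trt_cell: str) -> tuple[float, float]:
    """Return (trt fps / ort-cuda fps, trt memory / ort-cuda memory)."""
    ort_fps, ort_mem = parse_cell(ort_cell)
    trt_fps, trt_mem = parse_cell(trt_cell)
    return trt_fps / ort_fps, trt_mem / ort_mem


# 1-stream cells from the tables: model -> (ort-cuda, trt)
ONE_STREAM = {
    "rife": ("53.62/1771", "71.30/ 626"),
    "dpir color": ("4.64/3230", "10.32/1992"),
    "waifu2x upconv_7": ("11.07/5916", "18.38/2092"),
    "waifu2x cunet": ("4.63/8541", "11.44/4771"),
    "realesrgan v2/v3": ("8.84/2283", "14.59/1324"),
}

for model, (ort_cell, trt_cell) in ONE_STREAM.items():
    speedup, mem_ratio = compare(ort_cell, trt_cell)
    print(f"{model}: trt runs at {speedup:.2f}x the fps "
          f"using {mem_ratio:.0%} of the memory")
```

The same helpers work for the 2-stream column, except for the `dpir color` row, where the `ort-cuda` cell is empty.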