
v15.5: latest TensorRT library, CoreML backend

@github-actions github-actions released this 01 Oct 00:12

TRT

  • Upgraded to TensorRT 10.5.0.
  • Volta GPUs (TITAN V, V100) are no longer supported.

ORT

  • Fixed macOS CoreML support for vsort by @yuygfgg in #106.

    This pull request also added the ORT_COREML backend to vsmlrt.py.

General

  • Upgraded to CUDA 12.6.1.

vsmlrt.py

  • Added support for RIFE v4.25 and v4.26 models.

  • Added automatic batch inference support via the batch_size option in inference() and flexible_inference(), which may improve device utilization when running some small models on small inputs.

    • On the one hand, batching improves utilization by giving each kernel invocation more work and by reducing the quantization inefficiency of kernel tiles in bulk parallelism. It also lowers the average kernel launch and synchronization overhead per unit of work.
    • On the other hand, batching can cause cache misses and insert bubbles into the pipeline, either of which may degrade performance.
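    The tile quantization and launch overhead arguments above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: each SM is assumed to process one tile at a time, and the tile count per frame (160) and per-launch overhead are hypothetical values chosen for illustration, not measurements from vs-mlrt or TensorRT. The SM count of 128 matches the RTX 4090 used in the benchmark.

```python
import math


def wave_utilization(tiles_per_frame: int, batch_size: int, num_sms: int) -> float:
    """Fraction of SM slots doing useful work, assuming one tile per SM per wave."""
    tiles = tiles_per_frame * batch_size
    # The last wave is usually only partly filled; this is the quantization loss.
    waves = math.ceil(tiles / num_sms)
    return tiles / (waves * num_sms)


def overhead_per_frame(launch_overhead_us: float, batch_size: int) -> float:
    """A fixed per-launch cost is shared by every frame in the batch."""
    return launch_overhead_us / batch_size


if __name__ == "__main__":
    NUM_SMS = 128          # RTX 4090
    TILES_PER_FRAME = 160  # hypothetical
    LAUNCH_US = 10.0       # hypothetical
    for batch in (1, 2, 4):
        util = wave_utilization(TILES_PER_FRAME, batch, NUM_SMS)
        cost = overhead_per_frame(LAUNCH_US, batch)
        print(f"batch {batch}: utilization {util:.3f}, launch overhead {cost:.1f} us/frame")
```

    The model ignores the cache and pipeline effects from the second bullet, so it only shows why batching can help, not when it stops helping.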

    This feature requires flexible output support starting with vs-mlrt v15 and is inspired by styler00dollar/VSGAN-tensorrt-docker@ac47012.

    Note that not all ONNX models are supported.

    • Future RIFE v2 models will be fixed to support batch inference.

    Benchmark:

    • NVIDIA GeForce RTX 4090
    • driver 560.94
    • Windows Server 2019
    • Python 3.12.6, vapoursynth-classic R57.A10, vs-mlrt v15.4
    • input: 720x480 RGBS
    • backend: TRT(fp16=True, use_cuda_graph=True)

    Measurements: FPS / Device Memory (MB)

    model                               | batch 1        | batch 2
    ------------------------------------|----------------|---------------
    realesrgan compact (stream 1)       | 73.01 / 708    | 138.68 / 950
    realesrgan compact (streams 2)      | 107.81 / 914   | 263.87 / 1347
    realesrgan compact (streams 3)      | 108.30 / 1128  | 348.23 / 1738
    realesrgan ultracompact (stream 1)  | 99.43 / 702    | 165.52 / 950
    realesrgan ultracompact (streams 2) | 184.48 / 908   | 302.56 / 1344
    realesrgan ultracompact (streams 3) | 184.69 / 1114  | 458.18 / 1738
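
    To make the table easier to compare, the following sketch computes the batch 2 to batch 1 ratio of throughput and of device memory for each row. The numbers are copied from the measurements above; the names RESULTS, batching_gain, and summarize are introduced here for illustration only.

```python
# ((fps, memory_mb) at batch 1, (fps, memory_mb) at batch 2), keyed by (model, streams).
RESULTS = {
    ("compact", 1): ((73.01, 708), (138.68, 950)),
    ("compact", 2): ((107.81, 914), (263.87, 1347)),
    ("compact", 3): ((108.30, 1128), (348.23, 1738)),
    ("ultracompact", 1): ((99.43, 702), (165.52, 950)),
    ("ultracompact", 2): ((184.48, 908), (302.56, 1344)),
    ("ultracompact", 3): ((184.69, 1114), (458.18, 1738)),
}


def batching_gain(batch1, batch2):
    """Return (throughput ratio, memory ratio) of batch 2 relative to batch 1."""
    (fps1, mem1), (fps2, mem2) = batch1, batch2
    return fps2 / fps1, mem2 / mem1


def summarize(results):
    """Return one (model, streams, speedup, memory ratio) tuple per table row."""
    rows = []
    for (model, streams), (batch1, batch2) in results.items():
        speedup, mem_ratio = batching_gain(batch1, batch2)
        rows.append((model, streams, speedup, mem_ratio))
    return rows


if __name__ == "__main__":
    for model, streams, speedup, mem_ratio in summarize(RESULTS):
        print(f"{model:<13} streams={streams}  fps x{speedup:.2f}  memory x{mem_ratio:.2f}")
```

    In every row the throughput ratio is larger than the memory ratio. Several rows show a throughput ratio above 2, which suggests the batch 1 runs were not limited by raw compute alone; the release notes do not state the cause.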

Full Changelog: v15.4...v15.5