WolframRhodium edited this page Mar 15, 2023 · 19 revisions

Real-CUGAN is a super-resolution neural network for anime-style art. It is based on the waifu2x-cunet network and was trained by bilibili on millions of anime images with a RealESRGANv2-like approach.

Link:

Models

The models support upscaling by 2x/3x/4x and also denoising.

  • scale: 2, 3, or 4
  • noise: -1, 0, 1, 2, or 3 (as in waifu2x); noise levels 1 and 2 are supported only with scale=2.
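The constraints above can be checked before building a filter chain. The following is a minimal sketch, not part of vsmlrt: `check_cugan_params` and `output_size` are hypothetical helper names, and the only rules encoded are the ones listed above.

```python
# Hypothetical helpers (not part of vsmlrt) that encode the documented
# scale/noise constraints.
VALID_SCALES = (2, 3, 4)
VALID_NOISE = (-1, 0, 1, 2, 3)


def check_cugan_params(scale: int, noise: int) -> None:
    """Raise ValueError if the scale/noise combination is not supported."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}, got {scale}")
    if noise not in VALID_NOISE:
        raise ValueError(f"noise must be one of {VALID_NOISE}, got {noise}")
    # Noise levels 1 and 2 are only available for the 2x models.
    if noise in (1, 2) and scale != 2:
        raise ValueError(f"noise={noise} is only supported with scale=2")


def output_size(width: int, height: int, scale: int) -> tuple[int, int]:
    """Return the upscaled frame dimensions for a given scale factor."""
    return width * scale, height * scale


check_cugan_params(scale=2, noise=1)   # accepted
print(output_size(1920, 1080, 2))      # (3840, 2160)
```

Calling `check_cugan_params(scale=3, noise=1)` raises `ValueError`, since noise levels 1 and 2 exist only for the 2x models.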

vsmlrt.py wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt (release v7 or above).

import vapoursynth as vs
from vapoursynth import core

from vsmlrt import CUGAN, Backend

src = core.std.BlankClip(format=vs.RGBS) # CUGAN only accepts RGBS input

# clamp src to [0, 1] to be safe, as out-of-range values will produce large negative output.
src = core.akarin.Expr(src, "x 0 1 clamp")

# backend could be:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort cpu backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, at a slight throughput penalty.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NVIDIA GPU runtime.
flt = CUGAN(src, noise=-1, scale=2, backend=Backend.ORT_CUDA())

Notes

  1. Make sure your RGBS input to CUGAN is within the [0, 1] range. Out-of-range values will cause the network to produce large negative values.
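If the akarin plugin is not installed, the same clamp can be written with VapourSynth's built-in `std.Expr`, which provides `max` and `min` operators. The following is a minimal sketch; `clamp_expr` and `clamp_clip` are hypothetical helper names, not part of vsmlrt.

```python
# Hypothetical helpers (not part of vsmlrt): clamp a clip to [lo, hi]
# using only VapourSynth's built-in std.Expr.

def clamp_expr(lo: float = 0.0, hi: float = 1.0) -> str:
    """Build an RPN expression equivalent to clamp(x, lo, hi)."""
    # "x lo max" raises values below lo; "... hi min" lowers values above hi.
    return f"x {lo} max {hi} min"


def clamp_clip(clip, lo: float = 0.0, hi: float = 1.0):
    """Apply the clamp to every plane of a VapourSynth clip."""
    return clip.std.Expr(clamp_expr(lo, hi))


print(clamp_expr())  # x 0.0 max 1.0 min
```

In a script, `src = clamp_clip(src)` would take the place of the `core.akarin.Expr(src, "x 0 1 clamp")` call.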

Benchmarking

Measurements: FPS / Device Memory (MB)

Device memory:

  • CPU: private memory including VapourSynth
  • GPU: device memory including context
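This page does not state how the FPS figures were collected. As an illustration only, throughput can be estimated by timing how long it takes to pull every frame from a clip. `measure_fps` below is a hypothetical helper that works on any iterable of frames.

```python
import time


def measure_fps(frames) -> float:
    """Consume an iterable of frames and return frames per second."""
    start = time.perf_counter()
    count = 0
    for _ in frames:
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very short inputs.
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in workload; in a VapourSynth script one would pass an iterator
# over the filtered clip instead (for example flt.frames(), assuming a
# VapourSynth version whose VideoNode provides frames()).
fps = measure_fps(range(100_000))
print(f"{fps:.1f} FPS")
```

The first frames include model loading and warm-up, so a long clip gives a more representative figure than a short one.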

RTX 3090

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v7
  2. Real-CUGAN 7e77b85
  3. vs-mlrt v8 (driver 511.79)

Performance

FP32

| Model | [1] ort-cuda | [2] pytorch | [3] ort-cuda |
|---|---|---|---|
| 2x | 3.30 / 10445 | 2.36 / 20076 | 3.24 / 10251 |
| 3x (540p patch) | 1.52 / 9978 | 0.77 / 19304 | |
| 4x | 1.96 / 18377 | 1.25 / 22353 | 1.93 / 18183 |

FP16

| Model | [1] ort-cuda | [2] pytorch | [3] ort-cuda |
|---|---|---|---|
| 2x | 4.27 / 10185 | 3.29 / 12258 | 4.40 / 9991 |
| 3x | 1.61 / 19007 | 1.55 / 21816 | 1.62 / 23442 |
| 4x | 2.30 / 10181 | 1.43 / 13616 | 2.40 / 9987 |

Tesla A100 (SXM4, 80 GB)

Software: VapourSynth R57-A4, Windows Server 2022, Graphics Driver 516.94.

Input size: 1920x1080

Backends

  1. vs-mlrt v9

Performance

FP16

| Model | [1] trt | [1] trt (2 streams) |
|---|---|---|
| 2x | 19.4 / 4647 | 26.9 / 8558 |
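The trade-off between one and two streams can be read from these two measurements. A short worked calculation, using only the figures above (the variable names are illustrative):

```python
# (FPS, device memory in MB) for the 2x model on the A100, from the table above.
one_stream = (19.4, 4647)
two_streams = (26.9, 8558)

fps_gain = two_streams[0] / one_stream[0]
mem_ratio = two_streams[1] / one_stream[1]

print(f"throughput: {fps_gain:.2f}x, device memory: {mem_ratio:.2f}x")
# throughput: 1.39x, device memory: 1.84x
```

In this measurement, the second stream raises throughput by roughly 39% while using about 84% more device memory.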

EPYC Milan

Hardware: EPYC Milan 32C64T @2.55 GHz

Software: VapourSynth R57, Windows Server 2019.

Input size: 1920x1080

Backends

  1. vs-mlrt v7

Performance

FP32

| Model | [1] ov-cpu |
|---|---|
| 2x | 0.20 / 22627 |
| 3x | 0.094 / 40358 |
| 4x | 0.18 / 53174 |