nvidia hardware decoder support #1296

Open · bradh wants to merge 44 commits into master

Conversation

@bradh (Contributor, Author) commented Sep 3, 2024

No description provided.

bradh marked this pull request as ready for review September 3, 2024 04:54
@farindk (Contributor) commented Sep 3, 2024

I get this cmake error when trying to compile:

Compiling 'nvdec' as built-in backend
Not compiling 'libsharpyuv'
-- Configuring done

CMake Error at libheif/CMakeLists.txt:97 (add_library):
  Target "heif" links to target "CUDA::cuda_driver" but the target was not
  found.  Perhaps a find_package() call is missing for an IMPORTED target, or
  an ALIAS target is missing?

Where do I get the CMake config script from? I have installed nvidia-cuda-dev.

@farindk (Contributor) commented Sep 3, 2024

It compiles correctly despite the error above. Probably that dependency is not needed.

@bradh (Contributor, Author) commented Sep 3, 2024

The CMake config script comes with CMake, I think. nvidia-cuda-toolkit is probably the package on Ubuntu.

Possibly the dependency is only needed at runtime, and perhaps only for JPEG decoding.
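
For what it's worth, a minimal sketch of where that imported target normally comes from, assuming CMake 3.17 or newer (the FindCUDAToolkit module ships with CMake itself, not with the NVIDIA packages); the exact placement inside libheif/CMakeLists.txt is illustrative only:

```cmake
# Sketch only: CUDA::cuda_driver is defined by CMake's FindCUDAToolkit module
# (CMake >= 3.17). Guarding the link keeps builds without the toolkit working.
find_package(CUDAToolkit)
if(CUDAToolkit_FOUND)
  target_link_libraries(heif PRIVATE CUDA::cuda_driver)
endif()
```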

@farindk (Contributor) commented Sep 3, 2024

I didn't manage to make it work, but that is probably due to my very old laptop (2011), which I could not convince to use the GPU instead of the built-in Intel graphics (cuDeviceGet() failed). I'll have to find a more recent computer.

@farindk (Contributor) commented Sep 5, 2024

We might also need an extended version of does_support_format() in the plugin interface, because some hardware decoders may support only subsets of the standards, like only 8-bit and 10-bit decoding, but not 12-bit.
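
One possible shape for such an extension, as a sketch only; the struct and the function typedef below are hypothetical and not part of the current plugin interface:

```cpp
// Hypothetical extension of the decoder plugin interface: instead of a single
// answer per compression format, the plugin is asked about a concrete
// parameter combination. All names below are illustrative only.
#include <libheif/heif.h>
#include <cstdint>

struct heif_decoder_capability_query
{
  enum heif_compression_format format;
  int bit_depth;          // 8, 10, 12, ...
  int chroma;             // e.g. 420, 422, 444
  uint32_t coded_width;   // coded size of the image or tile to decode
  uint32_t coded_height;
};

// Returns a priority as does_support_format() does, with 0 meaning
// "this particular combination is not supported".
typedef int (*heif_does_support_format2_fn)(const struct heif_decoder_capability_query* query);
```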

@bradh (Contributor, Author) commented Sep 6, 2024

> We might also need an extended version of does_support_format() in the plugin interface, because some hardware decoders may support only subsets of the standards, like only 8-bit and 10-bit decoding, but not 12-bit.

I think we probably need to just be tolerant of a software or hardware decoder that claims to support a format, but can't quite do it after all. I'm thinking of cases where it needs a special codec feature that only shows up once you're down at the NAL unit level. In that case, we'd just try another decoder if there was one, and the user hadn't asked for a specific implementation. So maybe that is enough.

If we do want a does_support_format2(...), NVIDIA reports decoding capabilities in terms of:

  • chroma format
  • bit size
  • max coded height
  • max coded width
  • min coded height
  • min coded width
  • maximum number of macroblocks (which we can derive from the height and width limits)

I'm not sure about other hardware-accelerated implementations. I'll see if I can find out what Intel does.
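
For reference, roughly how those limits are exposed by NVDEC; a sketch only, assuming a CUDA context is already current on the calling thread (check the Video Codec SDK's nvcuvid.h for the exact field list):

```cpp
// Sketch: ask NVDEC whether it can decode HEVC 10-bit 4:2:0 and what the
// coded-size limits are. Error handling is reduced to a simple check.
#include <cuda.h>
#include <nvcuvid.h>
#include <cstdio>
#include <cstring>

bool query_hevc_10bit_caps()
{
  CUVIDDECODECAPS caps;
  std::memset(&caps, 0, sizeof(caps));
  caps.eCodecType      = cudaVideoCodec_HEVC;
  caps.eChromaFormat   = cudaVideoChromaFormat_420;
  caps.nBitDepthMinus8 = 2;  // 10-bit

  if (cuvidGetDecoderCaps(&caps) != CUDA_SUCCESS || !caps.bIsSupported)
    return false;

  std::printf("coded size %ux%u .. %ux%u, max macroblocks %u\n",
              (unsigned)caps.nMinWidth, (unsigned)caps.nMinHeight,
              (unsigned)caps.nMaxWidth, (unsigned)caps.nMaxHeight,
              (unsigned)caps.nMaxMBCount);
  return true;
}
```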

farindk mentioned this pull request Sep 9, 2024
@farindk (Contributor) commented Sep 13, 2024

I could now test the NVIDIA decoder on a GT 1030 with h.265.

When parallel tile decoding is enabled, it stops somewhere in the middle with a "cannot get CUDA context" error. Probably the number of parallel decoders is limited.

Without parallel decoding it works, but I was surprised how slow it is: NVIDIA hardware decoding takes 9.0s versus 0.7s with the libde265 software decoder (both without parallel tile decoding). My guess is that the hardware setup time is what makes it slow.

AVC and JPEG also work, but they are also much slower than the software decoders. I could not test AV1.

What are your experiences?

@farindk (Contributor) commented Sep 13, 2024

The problem is that cuCtxCreate takes 0.065 secs and cuCtxDestroy another 0.035 secs. That makes 0.1 secs for each call to does_support_format or decode_image. If we call these two functions for every tile, this adds up.

The supported formats should be easy to cache. Caching a decoder context is not so easy; that might require a plugin function to release the cached decoder at the end of each image. Or maybe we should keep the cached decoder even longer, in case we are doing a batch conversion of many images; then we'd call the cache cleanup function after a short time delay.
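
One possible way to avoid the repeated cuCtxCreate/cuCtxDestroy cost, as a sketch only (the helper names are hypothetical, and this assumes the plugin can keep process-wide state): retain the device's primary context once and release it from an explicit cleanup hook.

```cpp
// Sketch: cache a CUDA context for the lifetime of the plugin instead of
// creating and destroying one per does_support_format()/decode_image() call.
// cuDevicePrimaryCtxRetain() reference-counts the device's primary context,
// so repeated calls after the first are cheap. Callers still make it current
// with cuCtxPushCurrent()/cuCtxPopCurrent() around the actual decode.
#include <cuda.h>
#include <mutex>

namespace {
  std::mutex g_ctx_mutex;
  CUdevice   g_device = 0;
  CUcontext  g_ctx = nullptr;   // cached primary context
}

// Hypothetical helper: returns the cached context, creating it on first use.
CUcontext nvdec_acquire_context()
{
  std::lock_guard<std::mutex> lock(g_ctx_mutex);
  if (!g_ctx) {
    if (cuInit(0) != CUDA_SUCCESS) return nullptr;
    if (cuDeviceGet(&g_device, 0) != CUDA_SUCCESS) return nullptr;
    if (cuDevicePrimaryCtxRetain(&g_ctx, g_device) != CUDA_SUCCESS)
      g_ctx = nullptr;
  }
  return g_ctx;
}

// Hypothetical cleanup hook, e.g. called at the end of an image or after a
// short idle delay when batch-converting many images.
void nvdec_release_cached_context()
{
  std::lock_guard<std::mutex> lock(g_ctx_mutex);
  if (g_ctx) {
    cuDevicePrimaryCtxRelease(g_device);
    g_ctx = nullptr;
  }
}
```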

@farindk (Contributor) commented Sep 13, 2024

I've measured the actual decoding time (excluding all initialization and conversions) for a tiled h.265 image.

  • nvdec: 0.42 sec
  • libde265: 0.55 sec
  • ffmpeg: 0.40 sec (I think this is the software codec)

That is what we can expect with perfect caching.

@Neoclassic commented Oct 20, 2024

Thanks @bradh, this is functionally working fine for me, as I have a requirement to decode HEIC images on the GPU.
But it is slow for HEIC images with tiles (the 6x8-tile HEICs that iPhones produce), mostly because CUDA needs to be initialized for each of the 48 grid tiles.

static int nvdec_does_support_format(enum heif_compression_format format)
Do we really need to check this every time? Can it be cached? When I hardcoded the method to return 120, the time went down to 8 seconds.
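
A minimal sketch of that kind of caching, assuming the expensive device probe can be factored into a helper; probe_nvdec_support() below is a hypothetical stand-in for whatever the plugin currently does on every call:

```cpp
// Sketch: remember the probe result per compression format so the CUDA device
// is only queried once per process, not once per tile.
#include <libheif/heif.h>
#include <map>
#include <mutex>

// Placeholder for the existing capability check that touches the GPU.
static int probe_nvdec_support(enum heif_compression_format format);

static int nvdec_does_support_format(enum heif_compression_format format)
{
  static std::mutex cache_mutex;
  static std::map<heif_compression_format, int> cache;

  std::lock_guard<std::mutex> lock(cache_mutex);
  auto it = cache.find(format);
  if (it == cache.end())
    it = cache.emplace(format, probe_nvdec_support(format)).first;
  return it->second;  // priority returned to libheif; 0 means "not supported"
}
```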

I also had to export CUDA_DEVICE_MAX_CONNECTIONS=2 (or higher); otherwise the time is around 45 seconds.

time ./examples/heif-dec ~/ffmpeg_test/testfiles/LiveOff.HEIC out.png
[istream] request_range 0 - 1024
[istream] request_range 24 - 3946
[istream] request_range 15119 - 17157
File contains 1 image
[istream] request_range 17157 - 24447
[istream] request_range 24447 - 46783
GPU in use: Tesla T4
[istream] request_range 78025 - 103598
GPU in use: Tesla T4
[istream] request_range 46783 - 78025
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 103598 - 126917
[istream] request_range 126917 - 146390
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 146390 - 162298
[istream] request_range 162298 - 174934
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 174934 - 178765
GPU in use: Tesla T4
[istream] request_range 178765 - 209539
[istream] request_range 209539 - 256166
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 256166 - 297666
GPU in use: Tesla T4
[istream] request_range 297666 - 333953
GPU in use: Tesla T4
[istream] request_range 333953 - 350529
[istream] request_range 362678 - 375496
GPU in use: Tesla T4
[istream] request_range 350529 - 362678
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 375496 - 380664
GPU in use: Tesla T4
[istream] request_range 380664 - 407373
GPU in use: Tesla T4
[istream] request_range 407373 - 435430
[istream] request_range 435430 - 475399
GPU in use: Tesla T4
[istream] request_range 475399 - 500341
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 500341 - 521156
GPU in use: Tesla T4
[istream] request_range 521156 - 534960
GPU in use: Tesla T4
[istream] request_range 534960 - 548606
[istream] request_range 548606 - 554956
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 554956 - 579136
GPU in use: Tesla T4
[istream] request_range 579136 - 612623
GPU in use: Tesla T4
[istream] request_range 612623 - 639585
GPU in use: Tesla T4
[istream] request_range 639585 - 672260
GPU in use: Tesla T4
[istream] request_range 672260 - 692049
GPU in use: Tesla T4
[istream] request_range 692049 - 707221
GPU in use: Tesla T4
[istream] request_range 707221 - 718533
GPU in use: Tesla T4
[istream] request_range 718533 - 736781
[istream] request_range 736781 - 764323
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 764323 - 787729
GPU in use: Tesla T4
[istream] request_range 787729 - 809625
GPU in use: Tesla T4
[istream] request_range 809625 - 832233
GPU in use: Tesla T4
[istream] request_range 832233 - 852712
GPU in use: Tesla T4
[istream] request_range 852712 - 872658
GPU in use: Tesla T4
[istream] request_range 872658 - 886348
GPU in use: Tesla T4
[istream] request_range 886348 - 904055
GPU in use: Tesla T4
[istream] request_range 904055 - 922677
GPU in use: Tesla T4
[istream] request_range 922677 - 930679
GPU in use: Tesla T4
[istream] request_range 930679 - 937882
[istream] request_range 937882 - 946358
GPU in use: Tesla T4
GPU in use: Tesla T4
[istream] request_range 946358 - 971564
GPU in use: Tesla T4
[istream] request_range 971564 - 990224
GPU in use: Tesla T4
[istream] request_range 990224 - 1003954
GPU in use: Tesla T4
Written to out.png

real 0m17.878s
user 0m5.555s
sys 0m25.650s

@bradh (Contributor, Author) commented Oct 20, 2024

> Mostly because CUDA needs to be initialized for each of the 48 grid tiles.

Yes. This can be optimised, as identified by Dirk last month. That work hasn't been done yet though.

@Neoclassic

@bradh @farindk Are there any other planned (or unplanned) optimizations here, especially with regard to grid-based images? There is a lot to decode there, as well as a lot of back-and-forth between GPU and CPU.

@bradh (Contributor, Author) commented Oct 21, 2024

> @bradh @farindk Are there any other planned (or unplanned) optimizations here, especially with regard to grid-based images? There is a lot to decode there, as well as a lot of back-and-forth between GPU and CPU.

Nothing specific.

There is no guarantee this will even be added to libheif, let alone when or in what form. Please be realistic in your expectations - right now there is no customer requirement for it, and no specific funding.
