testing dGPU - ARC DG2 - decoding errors - edge cases - 4:4:4 12bit #100

Open
bavdevc opened this issue Nov 16, 2022 · 6 comments

bavdevc commented Nov 16, 2022

Hello @rigaya

atm. I'm testing the Intel ARC dGPU (A380). Everything is working brilliantly on Windows with the current Windows beta driver (31.0.101.3793), but Linux is a bit troublesome so far (Intel devs: 6.x kernel driver not ready, backport-i915 has some errors, Intel media-driver not on par with Windows, etc.)

linux --check-features output differs from windows:

  • less features available for encoding (AV1)
  • decoding 4:4:4 missing

btw. could you test the 4:4:4 decode so far?

I tried (high bitrate):

  • HEVC 4:4:4 12 bit
    --> that works, but it looks like there are some bitrate limits, lossless HEVC produces some reproducible errors when bitrate gets too high
  • AV1 4:4:4 12 bit
    --> I could not get that working at all - did you?

==> everything else (low bitrate) is working fine, except VC1 decoding, that is painfully slow because of no hardware support in libvpl...and all those mem copy things

btw. if you need some samples/test material, I can provide you those - just tell me where to send those files/links

Kind regards

edit: we need a party in qsvenc - issue #100 now ;-)

Owner

rigaya commented Nov 17, 2022

Thank you for sharing the decode issues.

  • HEVC 4:4:4 12bit
    I've tested a file with 86Mbps HEVC 4:4:4 12bit encoded by x265, and it seemed fine. Could you give me an example of a bitrate that fails? I'd like to create a file close to that and test it.

  • AV1 4:4:4 12bit
    Not working either for me.

    I'm not sure, but it seems that AV1 4:4:4 or 12bit is not supported yet, and the Query function (MFXVideoDECODE_Query, which --check-features uses for checking) might be returning a false result, saying it supports AV1 4:4:4 12bit decode even though it actually does not.

    However, I'd like to keep it as-is, because I want --check-features to return the raw results of the Query functions. The result might change in a future driver release.
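As a rough illustration of the gap described above, reported support can be compared against what a real decode attempt showed. This is only a sketch: the dictionaries are hypothetical entries based on this thread, not actual --check-features output, and `find_mismatches` is a made-up helper name.

```python
def find_mismatches(reported: dict, observed: dict) -> list:
    """List formats that are reported as supported but failed in a real decode."""
    return sorted(
        fmt
        for fmt, supported in reported.items()
        if supported and observed.get(fmt) is False
    )


# Hypothetical entries based on this thread, not real tool output.
reported = {"AV1 4:4:4 12bit": True, "HEVC 4:4:4 12bit": True}
observed = {"AV1 4:4:4 12bit": False, "HEVC 4:4:4 12bit": True}

print(find_mismatches(reported, observed))
```

Formats with no observed result are left out on purpose, so an untested format is never flagged as a false positive.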

Author

bavdevc commented Nov 17, 2022

ok, I was testing 4K HDR P3 PQ 444 60fps material - perhaps that was too much for the hardware decoder - avsw working fine with all input files.

source file is ProRes 4444 XQ, working fine with avsw:
[image: plotbitrate_4k_hdr_prores_4444_xq_yuv444p12le]
lossless x265 yuv420p10le, working fine with avhw:
[image: plotbitrate_4k_hdr_X265_yuv420p10le]
lossless x265 yuv422p10le, working fine with avhw:
[image: plotbitrate_4k_hdr_X265_yuv422p10le]
lossless x265 yuv444p12le crashes the hw decoder, only avsw possible:
[image: plotbitrate_4k_hdr_X265_yuv444p12le]

but I think those are only edge cases for testing the hardware features - production workflow would not re-encode with libx265 or libaom-av1 444 12bit lossless before further processing

Author

bavdevc commented Nov 17, 2022

  • However, I'd like to keep it as-is, because I want --check-features to return the raw results of the Query functions. The result might change in a future driver release.

I think so, too - software stack is getting better and more complete with every version, it's still development in progress

btw. I'm really surprised this little dg2 card can handle 1,493,818 kbit/s input with ease
edit: I think the hardware limit is below 4,294,967,295 bit/s ;-) it smells like a uint32 counting bit/s. The last working frame in my sample is 1126:
[image: plotbitrate_4k_hdr_X265_yuv444p12le_1126]
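The uint32 guess can be checked with a little arithmetic. The sketch below assumes 1 kbit = 1000 bit and that the limit applies to the bitrate in bit/s, as suggested above; the function names are made up for illustration, and whether the decoder really stores the value this way is not confirmed.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295


def fits_uint32_bps(kbit_per_s: int) -> bool:
    """True if the bitrate, converted to bit/s, fits in an unsigned 32-bit integer."""
    return kbit_per_s * 1000 <= UINT32_MAX


def max_frame_bytes(fps: int) -> int:
    """Largest frame size in bytes that stays within UINT32_MAX bit/s at this frame rate."""
    return UINT32_MAX // (8 * fps)


print(fits_uint32_bps(1_493_818))  # the input bitrate mentioned above
print(max_frame_bytes(60))         # 60 fps material, as in the test files
```

Under these assumptions the 1,493,818 kbit/s input is well inside the range, and at 60 fps a single frame would have to approach 9 MB before the instantaneous rate crosses the uint32 boundary.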

Author

bavdevc commented Nov 18, 2022

just to really complete the decoder test, I also tested all (most combinations) of the other input formats (every format works with avsw, the following list is only for avhw/avqsv):

  • H264 8bit yuv420p - profile main - level 4.0
  • H264 8bit yuv420p - profile main - level 5.0
  • H264 8bit yuv420p - profile high - level 4.0
  • H264 8bit yuv420p - profile high - level 4.1
  • H264 8bit yuv420p - profile predictive 4:4:4 - level 5.1
    Failed to initialize decoder.
    : invalid video parameters.
  • H264 8bit yuv420p - profile predictive 4:4:4 - level 5.2
    Failed to initialize decoder.
    : invalid video parameters.
  • HEVC 10bit yuv420p10le
  • HEVC 10bit yuv422p10le
  • HEVC 12bit yuv444p12le
    going to insane bitrate/lossless:
    MFXDEC: DecodeFrameAsync error: device operation failure..,
    Break in task MFXDEC: device operation failure..
  • MPEG2 8bit yuv420p
  • VP9 10bit yuv420p10le
    MFXDEC: DecodeFrameAsync error: failed to allocate memory..
    Break in task MFXDEC: failed to allocate memory..
    that should be VP9 profile 2 - perhaps not all levels work
  • VP9 12bit yuv444p12le
    MFXDEC: DecodeFrameAsync error: failed to allocate memory..
    Break in task MFXDEC: failed to allocate memory..
    that should be VP9 profile 3 - perhaps not all levels work
  • AV1 10bit yuv420p10le
  • AV1 12bit yuv444p12le
    Failed to initialize decoder.
    : invalid video parameters.
    not implemented yet

btw. I think I'm done with decoder testing atm. - I'll keep those ffmpeg-generated test files to test them with all future driver/qsvencc releases - perhaps I'll automate that step with a little script for Windows/Linux
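A minimal sketch of what such a script could look like. The error strings are copied from the list above; the labels, function names, and the idea of matching on log text are assumptions for illustration, and the actual command line for each test file is left to the caller.

```python
import subprocess
from typing import Optional, Sequence

# Error strings as they appear in the results above; labels are my own shorthand.
KNOWN_ERRORS = {
    "Failed to initialize decoder": "init_failed",
    "device operation failure": "device_failure",
    "failed to allocate memory": "alloc_failed",
}


def classify_log(log: str) -> str:
    """Return the label of a known decoder error found in the log, or 'ok'."""
    for needle, label in KNOWN_ERRORS.items():
        if needle in log:
            return label
    return "ok"


def run_and_classify(cmd: Sequence[str], timeout: Optional[float] = None) -> str:
    """Run one decode command and classify its combined stdout/stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    label = classify_log(proc.stdout + proc.stderr)
    # A non-zero exit code without a known message is still a failure.
    if label == "ok" and proc.returncode != 0:
        return "unknown_error"
    return label
```

Calling `run_and_classify` once per test file and storing the labels next to the driver version would make it easy to spot changes between releases.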

Owner

rigaya commented Dec 2, 2022

I was able to reproduce the problem with an HEVC 12bit 4:4:4 file I created myself using x265 lossless; it runs into "device operation failure".

It seems like it might be a hardware limitation (or a driver issue?), as no problem was found on the application side. The bitrate of the input file was 4317Mbps, way too high...

Author

bavdevc commented Dec 3, 2022

thank you @rigaya for the confirmation - as you can see in my previous post, I could get everything to work with hardware decoding except VP9 decode (tested profile 2+3) - either it is just my test files that go too far, or there is still an error somewhere in the complete software stack. (btw. VP9 encoding works, slowly, but it works - decoding, no chance so far.)

btw. I would close this issue #100 in its current state and create a new one if something noteworthy changes for the better or worse in the future, if that is ok with you.

btw. one last technical question, perhaps you know the answer or can tell me where I can find some more info:
-> using the Windows driver and Dx11va I notice there are several threads for GPU tasks:
HWINFO64:
[image: hwinfo_gpu_engines]
Taskmanager:
[image: taskmanager_gpu_engines]

--> crop/resize and vpp-deinterlace use the 1st or the 2nd "Video processing" engine
--> vpp-yadif uses the "GPU compute" engine

==> but why do some movies use "Video decode 1" engine and some others use both "Video Decode" engines? even if the first one is not saturated at all?
