
[RFC] HwFramesContext config method support #1

Closed · wants to merge 1 commit

Conversation

@hbiyik commented Jan 1, 2024

This is a first attempt to provide a way for mpv and Kodi to work with the decoder.

With the above changes mpv works; however, there still seem to be copies involved and it is slower than expected.

I have some ideas about what is going on and will update once I have a proper understanding.

@nyanmisaka (Owner)

Can MPV and Kodi handle DRM_PRIME (NV15) natively?

@hbiyik (Author) commented Jan 1, 2024

I think this is more of a task for the EGL interface (Mesa). What both Kodi and mpv do is receive a DRM PRIME frame and import it into the EGL interface using attribs, so they should both support it as long as Mesa does. Current mesa-panfork does not; maybe panthor will.

On the other hand, Kodi can bypass EGL and render directly on DRM planes through the kernel, but that is always buggy, so I don't use it.
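
(For reference, a minimal sketch of the import path being described, using only the standard EGL_EXT_image_dma_buf_import attribs; this is not code from mpv or Kodi, and the fd/offset/pitch values are assumed to come from an AVDRMFrameDescriptor plane.)

#include <stdint.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Import one dma-buf plane as an EGLImage. This succeeds only if the driver
 * (Mesa) advertises the given fourcc -- which is exactly where NV15
 * currently fails on panfork. */
static EGLImage import_plane(EGLDisplay dpy, int fd, int width, int height,
                             uint32_t fourcc, int offset, int pitch)
{
    const EGLAttrib attribs[] = {
        EGL_WIDTH,                     width,
        EGL_HEIGHT,                    height,
        EGL_LINUX_DRM_FOURCC_EXT,      fourcc,   /* e.g. DRM_FORMAT_NV12 */
        EGL_DMA_BUF_PLANE0_FD_EXT,     fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, offset,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  pitch,
        EGL_NONE
    };
    return eglCreateImage(dpy, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT,
                          NULL, attribs);
}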

@nyanmisaka (Owner)

Then the following options are available:

  • Modify MPV and Kodi to allow them to use RGA filters.
  • Glue RGA and MPP_DEC together in FFmpeg.
  • Add proper RGA support in MPP libs, allowing MPP_DEC to output converted images, just like IEP de-interlacing.
  • Wait for panthor to support NV15.

@hbiyik (Author) commented Jan 1, 2024

Then the following options are available:

  • Modify MPV and Kodi to allow them to use RGA filters.

Not maintainable.

  • Glue RGA and MPP_DEC together in FFmpeg.

I already tried this; it works to a certain extent, but only with spaghetti code.

  • Add proper RGA support in MPP libs, allowing MPP_DEC to output converted images, just like IEP de-interlacing.

I doubt this will ever happen.

  • Wait for panthor to support NV15.

I think this is the only reasonable, future-proof way forward at the moment. Meanwhile, I can use the HwFramesContext config method to take advantage of NV12/NV16 DRM PRIME frames.

@hbiyik (Author) commented Jan 1, 2024

What I am trying to do in this PR:

When you define the HWAccel of the decoder with the AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX flag, the client is responsible for creating the device and providing its context. Since the RKMPP device type is newly invented, no client supports this.

However, at least for the decoder (not the RGA-related parts), the RKMPP device is really a DRM device with its own quirks. There are two ways to achieve compliance.

  1. Provide multiple HWDeviceConfigs:

.hwconfigs is actually an array, so one decoder can expose multiple hwaccel devices. It would therefore be possible to add another hwaccel entry here with .device_type = AV_HWDEVICE_TYPE_DRM. However, I think FFmpeg has some issues there, i.e.:

ff_get_format only checks the first matching HWAccel, not the whole array, and quits here. I think this is either a bug, or multiple hwaccel devices are simply not common practice, so I did not go down this path.

  2. Allow the client to use HWFramesCtx.

When the decoder claims that it supports the AV_CODEC_HW_CONFIG_METHOD_HW_FRAMES_CTX method, the client does not necessarily need to provide the device context; instead, the codec creates its own hwdevice context and the client receives DRM PRIME frames over hw_frames_ctx. This fits this decoder better, and it is what I am trying to implement.
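
(To make the intent concrete, here is a minimal client-side sketch using only public FFmpeg API; it is not code from this PR. It probes whether a decoder offers DRM PRIME output via the HW_FRAMES_CTX method and opts into it from get_format, without supplying any hw_device_ctx.)

#include <libavcodec/avcodec.h>

/* Does this decoder offer AV_PIX_FMT_DRM_PRIME through an internally
 * created hw_frames_ctx, i.e. without a client-provided device context? */
static int offers_drm_prime_frames_ctx(const AVCodec *codec)
{
    for (int i = 0;; i++) {
        const AVCodecHWConfig *cfg = avcodec_get_hw_config(codec, i);
        if (!cfg)
            return 0;
        if (cfg->pix_fmt == AV_PIX_FMT_DRM_PRIME &&
            (cfg->methods & AV_CODEC_HW_CONFIG_METHOD_HW_FRAMES_CTX))
            return 1;
    }
}

/* get_format callback: pick DRM PRIME if offered, otherwise fall back. */
static enum AVPixelFormat pick_drm_prime(AVCodecContext *avctx,
                                         const enum AVPixelFormat *fmts)
{
    for (const enum AVPixelFormat *p = fmts; *p != AV_PIX_FMT_NONE; p++)
        if (*p == AV_PIX_FMT_DRM_PRIME)
            return *p;   /* decoder will populate avctx->hw_frames_ctx itself */
    return fmts[0];
}

A client would then set avctx->get_format = pick_drm_prime and read DRM PRIME frames straight from the decoder.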

@nyanmisaka (Owner)

ff_get_format only checks the first matching HWAccel, not the whole array, and quits here. I think this is either a bug, or multiple hwaccel devices are simply not common practice, so I did not go down this path.

Generally speaking, whenever a new hardware acceleration is added in FFmpeg, a corresponding HW pix_fmt is also added, and frames_derive_to/from is used to import/export to other underlying HW pix_fmts. So it is a rare usage to add a hwaccel for rkmppdec while still borrowing AV_PIX_FMT_DRM_PRIME.
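
(For context, the pattern described above usually looks like the sketch below on the client side, built on the public av_hwframe_ctx_create_derived()/av_hwframe_map() helpers. The native format here is hypothetical — there is no AV_PIX_FMT_RKMPP upstream — and error handling and the exact mapping flags would need tuning.)

#include <libavutil/error.h>
#include <libavutil/frame.h>
#include <libavutil/hwcontext.h>

/* Export a frame held in a (hypothetical) native HW format to DRM PRIME by
 * deriving a DRM frames context and mapping the frame into it (zero copy). */
static int export_to_drm_prime(AVFrame *native, AVBufferRef *drm_device_ref,
                               AVFrame *out)
{
    AVBufferRef *derived = NULL;
    int ret = av_hwframe_ctx_create_derived(&derived, AV_PIX_FMT_DRM_PRIME,
                                            drm_device_ref,
                                            native->hw_frames_ctx, 0);
    if (ret < 0)
        return ret;

    out->format        = AV_PIX_FMT_DRM_PRIME;
    out->hw_frames_ctx = av_buffer_ref(derived);
    if (!out->hw_frames_ctx) {
        av_buffer_unref(&derived);
        return AVERROR(ENOMEM);
    }

    ret = av_hwframe_map(out, native, AV_HWFRAME_MAP_READ);
    av_buffer_unref(&derived);
    return ret;
}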

@nyanmisaka (Owner) commented Jan 1, 2024

Manually merged in 6584518

@nyanmisaka closed this Jan 1, 2024
@hbiyik (Author) commented Jan 1, 2024

Oops, please do not merge yet; this hardly works as of now.

@hbiyik (Author) commented Jan 1, 2024

One question about 49cc93e:

Is this a good idea? Those headers are exported to the system, and this change would force all other DRM-based software to be recompiled; it is basically an API change for the DRM interface.

@hbiyik (Author) commented Jan 1, 2024

Maybe use one layer with a specific format like FOURCC("MBUF"), assign it to some object index, and get the ptr from the MppBuffer? Still a hack, but a non-breaking one.
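
(Purely to illustrate the hack proposed here — hypothetical, not endorsed and not in the repo: an extra layer tagged with a private FOURCC carries the MppBuffer pointer, leaving AVDRMObjectDescriptor untouched.)

#include <stdint.h>
#include <libavutil/common.h>         /* MKTAG */
#include <libavutil/hwcontext_drm.h>

/* Smuggle the MppBuffer pointer through a spare layer entry. Assumes
 * desc->nb_layers < AV_DRM_MAX_PLANES and a 64-bit host, so that the
 * pointer fits into the ptrdiff_t offset field. */
static void attach_mpp_buffer(AVDRMFrameDescriptor *desc, void *mpp_buffer)
{
    AVDRMLayerDescriptor *extra = &desc->layers[desc->nb_layers++];

    extra->format    = MKTAG('M', 'B', 'U', 'F');   /* private marker, not a real DRM format */
    extra->nb_planes = 1;
    extra->planes[0].object_index = 0;
    extra->planes[0].offset = (ptrdiff_t)(intptr_t)mpp_buffer;
    extra->planes[0].pitch  = 0;
}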

@hbiyik (Author) commented Jan 1, 2024

Or add the new data to the end of the structure, so the old interface variables are not repositioned.

UPDATE:
Moving the new fields to the end of AVDRMObjectDescriptor will not help; the new fields would have to go at the end of AVDRMFrameDescriptor, because it holds multiple AVDRMObjectDescriptors inline, and an oversized AVDRMObjectDescriptor would shift the .nb_layers member that follows (and everything after it).

Even if ptr and opaque were members of AVDRMObjectDescriptor, that would only work when AVDRMObjectDescriptor is not used as an array. But that is unconventional; a cleaner way might be to use layers...
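
(For reference, the relevant upstream layout from libavutil/hwcontext_drm.h, abridged. Because objects[] is an inline array inside AVDRMFrameDescriptor, enlarging AVDRMObjectDescriptor shifts every field that follows it — which is exactly the breakage described above.)

typedef struct AVDRMObjectDescriptor {
    int      fd;                /* DRM PRIME fd for the object */
    size_t   size;              /* total size of the object */
    uint64_t format_modifier;   /* e.g. an AFBC modifier */
} AVDRMObjectDescriptor;

typedef struct AVDRMFrameDescriptor {
    int nb_objects;
    AVDRMObjectDescriptor objects[AV_DRM_MAX_PLANES];  /* inline array: growing the
                                                          element type moves everything below */
    int nb_layers;
    AVDRMLayerDescriptor layers[AV_DRM_MAX_PLANES];
} AVDRMFrameDescriptor;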

@nyanmisaka (Owner)

Yeah, it’s indeed a breaking change for upstreaming. So, either append a new array after layers to correspond to the objects, or create a new format AV_PIX_FMT_RKMPP instead.

@nyanmisaka (Owner) commented Jan 1, 2024

It would be nice if they could reserve some bits when defining AVDRMObjectDescriptor. It's not very intuitive to separate ptr and opaque from it.

@hbiyik (Author) commented Jan 1, 2024

With some code cleanup there will be no need for the MppBuffer object; the only thing necessary is to check whether the fd is mmapped or not (that is why you need ptr, I think).

I remember there was a way to do this without any flags, just with mmap, but I need to dig in a little bit.

In particular, the part below: does it really have to check desc->objects[i].ptr?

static int rkmpp_map_frame(AVHWFramesContext *hwfc,
                           AVFrame *dst, const AVFrame *src, int flags)
{
 ....
    av_assert0(desc->nb_objects <= AV_DRM_MAX_PLANES);
    for (i = 0; i < desc->nb_objects; i++) {
        if (desc->objects[i].ptr) {
            addr = desc->objects[i].ptr;
            map->unmap[i] = 0;

I think you are doing an optimization above to prevent unnecessary mmaps and munmaps.

In that case, maybe we can introduce a format modifier to flag whether the descriptor is mmapped or not.
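
(For comparison, the usual fallback when an object has no pre-mapped pointer is to mmap() the dma-buf fd directly, roughly as the generic DRM mapping code does; this is a sketch, not the exact code from this repo.)

#include <errno.h>
#include <sys/mman.h>
#include <libavutil/error.h>
#include <libavutil/hwcontext_drm.h>

/* Map object i of the frame descriptor; the caller must munmap() it later. */
static void *map_object(const AVDRMFrameDescriptor *desc, int i, int *err)
{
    void *addr = mmap(NULL, desc->objects[i].size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, desc->objects[i].fd, 0);
    if (addr == MAP_FAILED) {
        *err = AVERROR(errno);
        return NULL;
    }
    *err = 0;
    return addr;
}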

@hbiyik (Author) commented Jan 1, 2024

OK, I created a separate PR for this; I think it is the cleanest way, with no hacks involved.

@hbiyik (Author) commented Jan 2, 2024

This works pretty well. Here are the commands to get mpv working:

--vo=gpu-next is optional, but it seems to work rather better
--swapchain-depth is optional, but it helps to sync the buffers better

Simple DRM PRIME output:
mpv --swapchain-depth=8 --vo=gpu-next --hwdec=rkmpp pathtofile

Average HEVC performance:
8-bit: 8K@30fps
10-bit: not supported

AFBC -> non-AFBC (10-bit to 8-bit):
mpv --swapchain-depth=8 --vo=gpu-next --vd-lavc-o=buf_mode=ext,afbc=on --vf=scale_rkrga=format=nv12 --hwdec=rkmpp pathtofile

Average HEVC performance:
8-bit AFBC -> 8-bit: 8K@60fps
10-bit AFBC -> 8-bit: 8K@60fps

AFBC -> non-AFBC (HDR to HDR):
mpv --swapchain-depth=8 --vo=gpu-next --vd-lavc-o=buf_mode=ext,afbc=on --vf=scale_rkrga=format=p010 --hwdec=rkmpp pathtofile

Average HEVC performance:
10-bit AFBC -> 10-bit: 8K@30fps

mpv requires the patch below to play P010 DRM frames.

diff --git a/video/out/hwdec/hwdec_drmprime.c b/video/out/hwdec/hwdec_drmprime.c
index 290f11c..9c63ab4 100644
--- a/video/out/hwdec/hwdec_drmprime.c
+++ b/video/out/hwdec/hwdec_drmprime.c
@@ -119,6 +119,7 @@
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_NV12);
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_420P);
     MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_NV16));
+    MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_P010));
     MP_TARRAY_APPEND(p, p->formats, num_formats, 0); // terminate it
 
     p->hwctx.hw_imgfmt = IMGFMT_DRMPRIME;

Maybe it would be a good idea to add a format option like 8bit to scale_rkrga so that it auto-selects the correct 8-bit variant of a 10-bit format, i.e. NV15->NV12 and NV20->NV16.

@nyanmisaka (Owner)

Maybe it would be a good idea to add a format option like 8bit to scale_rkrga so that it auto-selects the correct 8-bit variant of a 10-bit format, i.e. NV15->NV12 and NV20->NV16.

The swscale format filter of FFmpeg can accept multiple pixel formats and choose the appropriate one, like this: format=nv12|nv16. So it is possible to modify the rkrga filter options and parse them manually.
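
(A minimal sketch of parsing such a pipe-separated list into the first acceptable pixel format, using only public libavutil helpers; the supported() predicate is an assumption, not the actual rkrga filter code.)

#include <libavutil/avstring.h>
#include <libavutil/mem.h>
#include <libavutil/pixdesc.h>
#include <libavutil/pixfmt.h>

/* Parse e.g. "nv12|nv16" and return the first format the caller accepts. */
static enum AVPixelFormat parse_format_list(const char *list,
                                            int (*supported)(enum AVPixelFormat))
{
    enum AVPixelFormat out = AV_PIX_FMT_NONE;
    char *saveptr = NULL;
    char *buf = av_strdup(list);

    if (!buf)
        return AV_PIX_FMT_NONE;

    for (char *tok = av_strtok(buf, "|", &saveptr); tok;
         tok = av_strtok(NULL, "|", &saveptr)) {
        enum AVPixelFormat fmt = av_get_pix_fmt(tok);
        if (fmt != AV_PIX_FMT_NONE && supported(fmt)) {
            out = fmt;
            break;
        }
    }

    av_free(buf);
    return out;
}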

@hbiyik (Author) commented Jan 2, 2024

fmt1|fmt2 looks cleaner.

Another thing: Mesa EGL requires 32-byte-aligned frames, but the decoder outputs 16-byte-aligned frames. This causes DRM PRIME frames not to be imported properly, and the screen outputs green or blue.

Is it possible to add an hstride option to the scale_rkrga filter, so that after the RGA chain the frame can be imported into EGL properly?

@nyanmisaka (Owner)

Is it possible to add an hstride option to the scale_rkrga filter, so that after the RGA chain the frame can be imported into EGL properly?

Sure.

layer->planes[i].offset =
    layer->planes[i-1].offset +
    layer->planes[i-1].pitch * (hwfc->height >> (i > 1 ? pixdesc->log2_chroma_h : 0));

Will this bypass the panfork issue?

layer->planes[i].offset =
    layer->planes[i-1].offset +
    layer->planes[i-1].pitch * (FFALIGN(hwfc->height, 64) >> (i > 1 ? pixdesc->log2_chroma_h : 0));

@hbiyik (Author) commented Jan 2, 2024

No, I just tested it. Having a different pitch value for the buffer only changes the buffer size and offsets; those initial values are overwritten by mpp, like here.

To my knowledge mpp does not support preconfigured strides/pitches. If it did, maybe setting the hstride of the MppFrame before decode_get_frame, by manually allocating the input MppFrame, would do the trick with the decoder alone. But that is just a theory; I was working around it by realigning to a 64-byte hstride with RGA in the next chain stage.
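
(The theory above would look roughly like the sketch below, assuming MPP honours a pre-set horizontal stride via MPP_DEC_SET_FRAME_INFO — which is exactly the open question. The names come from the MPP headers, but the behaviour is unverified.)

#include <rockchip/rk_mpi.h>
#include <rockchip/mpp_frame.h>

/* Unverified: ask the decoder to produce frames with a 64-byte-aligned
 * horizontal stride so the output suits RGA3/EGL without a realign pass.
 * Stride computation is simplified; real code must account for bit depth. */
static int preset_hor_stride(MppCtx ctx, MppApi *mpi, int width, int height)
{
    MppFrame frame = NULL;
    int ret = mpp_frame_init(&frame);
    if (ret)
        return ret;

    mpp_frame_set_width(frame, width);
    mpp_frame_set_height(frame, height);
    mpp_frame_set_hor_stride(frame, (width + 63) & ~63);  /* 64-byte alignment */

    ret = mpi->control(ctx, MPP_DEC_SET_FRAME_INFO, frame);
    mpp_frame_deinit(&frame);
    return ret;
}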

@hbiyik (Author) commented Jan 3, 2024

I think this is better handled by mpp. I asked them whether they can support a decoding flow like this, maybe through some deccfg:

rockchip-linux/mpp#507

@nyanmisaka (Owner)

I think this is better handled by mpp. I asked them whether they can support a decoding flow like this, maybe through some deccfg:

rockchip-linux/mpp#507

We really need this. For the H264 High10 use case, MPP is using hor_align=16 for both NV12 and NV15, while RGA3 requires hor_align=64 for NV15.

nyanmisaka/mpp@a986997

@hbiyik (Author) commented Jan 18, 2024

@nyanmisaka I tested this extensively; from my POV it is ready to be merged. There are no side effects that I could see so far.

@nyanmisaka (Owner)

Merged in d0a1484

@nyanmisaka closed this Jan 18, 2024
@hbiyik deleted the ffmpeg-rockchip-hwframes branch January 19, 2024 15:04
nyanmisaka pushed a commit that referenced this pull request Apr 17, 2024
In close_output(), a dummy frame is created with format NONE passed
to enc_open(), which isn't prepared for it. The NULL pointer
dereference happened at
av_pix_fmt_desc_get(enc_ctx->pix_fmt)->comp[0].depth.

When fgt.graph is NULL, skip fg_output_frame() since there is
nothing to output.

frame #0: 0x0000005555bc34a4 ffmpeg_g`enc_open(opaque=0xb400007efe2db690, frame=0xb400007efe2d9f70) at ffmpeg_enc.c:235:44
frame #1: 0x0000005555bef250 ffmpeg_g`enc_open(sch=0xb400007dde2d4090, enc=0xb400007e4e2daad0, frame=0xb400007efe2d9f70) at ffmpeg_sched.c:1462:11
frame #2: 0x0000005555bee094 ffmpeg_g`send_to_enc(sch=0xb400007dde2d4090, enc=0xb400007e4e2daad0, frame=0xb400007efe2d9f70) at ffmpeg_sched.c:1571:19
frame #3: 0x0000005555bee01c ffmpeg_g`sch_filter_send(sch=0xb400007dde2d4090, fg_idx=0, out_idx=0, frame=0xb400007efe2d9f70) at ffmpeg_sched.c:2154:12
frame #4: 0x0000005555bcf124 ffmpeg_g`close_output(ofp=0xb400007e4e2d85b0, fgt=0x0000007d1790eb08) at ffmpeg_filter.c:2225:15
frame #5: 0x0000005555bcb000 ffmpeg_g`fg_output_frame(ofp=0xb400007e4e2d85b0, fgt=0x0000007d1790eb08, frame=0x0000000000000000) at ffmpeg_filter.c:2317:16
frame #6: 0x0000005555bc7e48 ffmpeg_g`filter_thread(arg=0xb400007eae2ce7a0) at ffmpeg_filter.c:2836:15
frame #7: 0x0000005555bee568 ffmpeg_g`task_wrapper(arg=0xb400007d8e2db478) at ffmpeg_sched.c:2200:21

Signed-off-by: Zhao Zhili <[email protected]>