Video frame type #30

At TPAC we got a lot of questions about what type to use for unencoded video frames. Currently the explainer uses the ImageData type, but that might not be implementable efficiently.

How to efficiently process video frames is really a larger problem than what we want to solve in WebCodecs, but we're getting a lot of questions about it, so we should try to figure something out (or find a different group to work on it).

Comments
Yeah, the ImageData is just what was convenient to stick there at the time.
I'm not sure if this should be pushed into another bug, or if the discussion started here will do. But here goes my $0.02: ImageData is totally unsuitable for modern-day applications. The only way to access the content is via an 8-bit RGBA data buffer. Should the decoder be a GPU one, accessing that data would require a memory readback, which would kill performance. We need to be able to directly retrieve a decoded image such that it can be accessed via a handle (such as a surface ID), so that it is usable directly with WebGL, or be accessible via GPU methods only (such as a GL shader). Additionally, we need to know the format of that image. Most hardware decoders output NV12 (8-bit), P010, or P016 (10-bit and 12/16-bit respectively). Software decoders output YUV 4:2:0, etc.
Yes, the more I've looked into it, the more I have to agree. Unfortunately, the same may be true of ImageBitmap. Currently, I'm leaning toward defining a new VideoFrame type that has:

.format: an enum of "i420", "nv12", etc.

The same WebGL methods that work with ImageBitmap and HTMLVideoElement (such as texImage2D) would then just work with a VideoFrame passed in, and no readback would be required. If one really wanted to do a readback, we could support that with something like:
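(The snippet that originally followed did not survive the page extraction. As a rough sketch of the idea, with the `readback()` name and shape purely illustrative, not from any spec:)

```js
// Hypothetical sketch: the format is inspectable, WebGL upload needs no
// readback, and CPU access is an explicit, opt-in copy.
const frame = decodedFrames.shift(); // a decoded VideoFrame (illustrative)
console.log(frame.format);           // e.g. "i420" or "nv12"

// Upload straight to a texture, as with ImageBitmap or HTMLVideoElement.
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, frame);

// Only pay the GPU->CPU cost when explicitly requested.
const planes = await frame.readback(); // e.g. one ArrayBuffer per plane
```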
I'm not well versed in JS, so I have one question regarding ImageData, ImageBitmap, and VideoFrame data. How does WebCodecs take care of the line stride in picture data? The picture-data object needs to carry some offset information to access each line in each format. GPUs have APIs that transfer data from the CPU's domain into aligned memory in the GPU's domain for their streaming processors. I think WebCodecs should handle memory alignment for efficient video processing.
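(For concreteness, stride matters because each row of a plane may be padded out to an alignment boundary, so a consumer must step by stride rather than by width. A minimal sketch in plain JS, with illustrative names:)

```js
// Copy one plane of a stride-padded frame into a tightly packed buffer.
// `stride` is the allocated bytes per row (>= width, due to alignment);
// we read rows at stride offsets and write them back to back.
function packPlane(src, width, height, stride) {
  const dst = new Uint8Array(width * height);
  for (let y = 0; y < height; y++) {
    dst.set(src.subarray(y * stride, y * stride + width), y * width);
  }
  return dst;
}
```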
At this point I am leaning towards only supporting ImageBitmap in the first version of WebCodecs. This sidesteps plane and alignment questions (by not providing mappable buffers at all) while maintaining good performance for playback and transcode cases. It does require a readback to access pixel data, and conversion to RGB would be done as part of that process. For now, the best approach may be to ensure that future versions of WebCodecs can easily add new image representations.
It makes you wonder who this API is targeted at, then, and who has shown interest in implementing it. Any API that requires dealing with RGB and performing readbacks and conversions will not be used for video.
ImageBitmap provides a relatively efficient (GPU->GPU texture copy) path to shader access in WebGL, and a very efficient display path (ImageBitmapRenderingContext.transferFromImageBitmap()). Especially given that we don't have YUV data from all decoders (Android MediaCodec in particular), I don't think I want to start designing a new image primitive for the web for the first version of WebCodecs. I do think we should design to allow opting in to such a planar image primitive in the future, when it exists.
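(For reference, the efficient display path mentioned here looks roughly like this, assuming some decoder callback hands us an ImageBitmap:)

```js
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("bitmaprenderer");

// Display without any pixel readback: ownership of the bitmap's backing
// store is transferred to the canvas.
function onDecodedFrame(imageBitmap) {
  ctx.transferFromImageBitmap(imageBitmap);
}
```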
From my point of view, being able to provide an efficient way of manipulating the video frame is a must for WebCodecs, with funny hats or head tracking being the obvious use cases. Given typical encoder/decoder APIs, I think we should provide direct access to a planar image with stride, kind of like what is already available in Blink's native video frame object: https://cs.chromium.org/chromium/src/media/base/video_frame.h
Efficient manipulation of video frames implies GPU-only operation (hardware buffer or texture primitive). There exist platforms where uncompressed frames are stored in CPU memory, in which case it is convenient to offer that access, but in general it implies a very expensive GPU readback operation.
By efficient video manipulation I meant with as few memory copies/conversions as possible. For example, in the case of the funny hat:
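(The code block that originally followed was lost in extraction. A plausible reconstruction of that kind of Chrome-era pipeline, camera to canvas, pixel manipulation, then captureStream() into WebRTC, where the RTCPeerConnection `pc` is assumed to already exist:)

```js
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const video = document.createElement("video");
video.srcObject = stream;
await video.play();

const canvas = document.createElement("canvas");
canvas.width = 640;
canvas.height = 480;
const ctx = canvas.getContext("2d");

function processFrame() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
  // ... draw the funny hat into image.data (RGBA bytes) here ...
  ctx.putImageData(image, 0, 0);
  requestAnimationFrame(processFrame);
}
requestAnimationFrame(processFrame);

// Feed the manipulated frames to the WebRTC encoder.
const processed = canvas.captureStream();
pc.addTrack(processed.getVideoTracks()[0], processed);
```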
(Note that this code works on Chrome right now, except obviously the image manipulation.) The image bytes would already be in CPU memory in planar I420 format (in most cases), or would have to be converted to I420 for the WebRTC encoders (let's ignore VP9 mode 2 for now). It would be desirable to be able to expose the underlying image data (if in memory) and, if not, to be able to export it to an ImageBitmap. We would also need a way to create a VideoFrame from an ImageBitmap, ImageData, or raw YUV data.
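(A constructor along those lines might look like the following sketch. Nothing of the sort was specified at the time; this mirrors the general shape the eventual WebCodecs VideoFrame constructor took, so treat the details as illustrative:)

```js
// Wrap tightly packed I420 bytes in a VideoFrame.
const frame = new VideoFrame(yuvData /* BufferSource: Y, then U, then V */, {
  format: "I420",
  codedWidth: 640,
  codedHeight: 480,
  timestamp: 0, // microseconds
});

// Or wrap an already-decoded image source.
const frame2 = new VideoFrame(imageBitmap, { timestamp: 0 });
```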
My answer probably references #47 too. I also don't think this should be a feature within WebCodecs, or a substitute for ImageBitmap, because ImageBitmap is a really good piece of interoperability between the different APIs available for rendering and displaying. But plane-specific access from the start is a necessary feature, especially because if it is not done from the outset, you'll probably end up in the same state as Android and never implement it, or struggle to implement it afterwards. Likewise, one solution to match both the WebCodecs API (which could stream ImageBitmaps) and the ImageBitmap API would be the ability to get an ImageBitmap reference for a plane of the ImageBitmap, and to expose more metadata on it. That way, you could have:
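(The example that originally followed was lost; the proposal sketched something along these lines, where `getImageBitmapPlane` and the extra attributes are hypothetical illustrations of the idea, not a real API:)

```js
// bitmap: an ImageBitmap produced by the decoder.
const y  = bitmap.getImageBitmapPlane(0); // luma plane as its own ImageBitmap
const uv = bitmap.getImageBitmapPlane(1); // chroma plane(s), e.g. interleaved NV12
console.log(bitmap.format, bitmap.planeCount); // extra metadata on the bitmap

// Platforms that cannot extract planes (Android, say) would return null here
// and keep exposing the bitmap as a plain RGB image.
```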
And then you are still backward compatible regarding ImageBitmap, and you elegantly handle cases where you cannot extract the plane (Android, for example) by exposing RGB directly, letting the underlying graphics system handle the chroma conversion, without extending the Vulkan or OpenGL APIs. This is also in line with APIs like GBM.

To get back to #47, we probably don't care about colorspace within the ImageBitmap; it is information intended for the display system (so it can be private data here) and for the processing systems, which will probably generate code or use extensions for it, so it can be provided by WebCodecs or even the previous layers in a different object than the ImageBitmap itself. That's what we would expect here in VLC, at least, as the information comes from the demuxer and not the decoder, and it could evolve quickly whenever you want to add colorspace, mastering data, etc.
The spec now offers a VideoFrame interface with Plane interfaces for accessing the pixel data. An ImageBitmap can be generated from a VideoFrame for painting to a canvas. With this now defined, I'd like to close this issue and have new sub-issues filed for the remaining gaps. Some known issues/plans are described below; please file a new issue for anything I've neglected. We intend to add new features to this interface shortly.
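(In rough terms, usage looks like the sketch below. The attribute and method names follow my reading of the draft spec as described here, `readInto` included, and may have shifted since:)

```js
// Read each plane of a decoded frame into CPU memory, honoring stride.
for (const plane of frame.planes) {
  const bytes = new Uint8Array(plane.length); // rows * stride bytes
  await plane.readInto(bytes);
  // bytes now holds `plane.rows` rows of `plane.stride` bytes each.
}

// Or produce an ImageBitmap for painting to a canvas.
const bitmap = await createImageBitmap(frame);
```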
Planar access to GPU-backed frames is still a problem. In the short term we intend to at least make this transparent by having GPU-backed VideoFrames not initially offer any planar access, but provide a converter function that performs the copy to CPU memory when invoked. Down the road we would like GPU-backed frames to have some "buffer" type from WebGPU, such that inspection/manipulation of the pixels can happen without a GPU-to-CPU copy, using WebGPU APIs.