Video frame type #30

Closed
steveanton opened this issue Sep 19, 2019 · 13 comments

Labels
2020 Q1 Solve by end of Q1 2020

Comments

@steveanton
Contributor

At TPAC we got a lot of questions about what type to use for unencoded video frames. Currently the explainer uses the ImageData type, but that might not be efficiently implementable.

How to efficiently process video frames is really a larger problem than what we want to solve in WebCodecs, but we're getting a lot of questions about it, so we should try to figure something out (or find a different group to work on it).

@pthatcherg
Contributor

Yeah, ImageData is just what was convenient to stick there at the time.

@jyavenard
Member

I'm not sure whether this should be pushed into another bug or whether the discussion started here will do, but here goes my $0.02.

ImageData is totally unsuitable for modern-day applications. The only way to access the content is via an 8-bit RGB data buffer. Accessing that data when the decoder is a GPU one would require performing a memory readback, which would kill performance.

We need to be able to retrieve a decoded image directly, such that it can be accessed via a handle (such as a surface ID) and used directly with WebGL, or accessed via GPU-only methods (such as a GL shader).

Additionally, we need to know the format of that image. Most hardware decoders output NV12 (8-bit), P010, or P016 (10-bit and 12/16-bit respectively); software decoders output planar YUV 4:2:0, etc.
We also need to know the chroma subsampling: 4:2:0, 4:2:2, 4:4:4, etc.

@pthatcherg
Contributor

Yes, the more I've looked into it, the more I have to agree. Unfortunately, the same may also be true of ImageBitmap. Currently, I'm leaning toward defining a new VideoFrame type that has:

.format: enum of "i420", "nv12", etc.
.planes: int (convenience; could be inferred from format)
.onGpu: bool
.getPixelData(plane): returns raw pixel data of one plane; blows up if .onGpu

And then the same WebGL methods that work with ImageBitmap and HTMLVideoElement (such as texImage2D) would just work with a VideoFrame passed in, and no readback would be required.

If one really wanted to do a readback, we could support that with something like:
.readFromGpu(): returns a new VideoFrame (async) that has .onGpu == false.
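Roughly, usage might look like this (a sketch only; gl and process() are assumed to exist, and all VideoFrame members are the hypothetical ones above):

function handleFrame(frame) {
  if (frame.onGpu) {
    // GPU-backed: no CPU pixel access, but texture upload still works.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, frame);
  } else {
    // CPU-backed: read the raw bytes of each plane.
    for (let i = 0; i < frame.planes; i++) {
      process(frame.getPixelData(i));
    }
  }
}

// Explicit readback when CPU access to a GPU frame is really needed:
async function readBack(frame) {
  const cpuFrame = await frame.readFromGpu(); // cpuFrame.onGpu === false
  return cpuFrame.getPixelData(0);
}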

@ytakio

ytakio commented Dec 23, 2019

I'm not that good at JS, so I have one question related to ImageData, ImageBitmap, and VideoFrame data.

How does WebCodecs take care of the stride of lines in picture data?
Some HW acceleration, SIMD such as SSE, NEON, etc., expects picture data in which each line is aligned to the bus bandwidth. That may not always equal a multiple of the macroblock unit (e.g. 720x480 -> MB: 16x16, AVX: 256-bit/512-bit registers).

So the picture data object needs to carry some offset information to access each line in each format.

GPUs, meanwhile, have APIs to transfer memory from the CPU's domain into aligned memory in the GPU's domain for their streaming processors.

I think WebCodecs should handle memory alignment properly for effective video processing.
(Should I create a new issue? 😅)
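To make the stride concern concrete, here is a minimal sketch (the plane descriptor with data, width, height, stride, and offset is hypothetical) of copying one 8-bit plane whose rows are padded for alignment into a tightly packed buffer:

// data: ArrayBuffer holding the plane; stride >= width because rows may be padded.
function packPlane({ data, width, height, stride, offset = 0 }) {
  const packed = new Uint8Array(width * height);
  for (let row = 0; row < height; row++) {
    // Each source row starts stride bytes after the previous one.
    const src = new Uint8Array(data, offset + row * stride, width);
    packed.set(src, row * width);
  }
  return packed;
}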

@sandersdan
Contributor

At this point I am leaning towards only supporting ImageBitmap in the first version of WebCodecs. This sidesteps plane and alignment questions (by not providing mappable buffers at all) while maintaining good performance for playback and transcode cases.

This does require a readback to access pixel data, and conversion to RGB would be done as part of that process.

For now the best approach may be to ensure that future versions of WebCodecs can easily add new image representations.

@jyavenard
Member

It makes you wonder who this API is targeted at, then, and who has shown interest in implementing it.

Any API that requires dealing with RGB and performing readbacks and conversions will not be used for video.

@sandersdan
Contributor

ImageBitmap provides a relatively efficient (GPU->GPU texture copy) path to shader access in WebGL, and a very efficient display path (ImageBitmapRenderingContext.transferFromImageBitmap()). Especially given that we don't have YUV data from all decoders (Android MediaCodec in particular), I don't think I want to start designing a new image primitive for the web for the first version of WebCodecs.

I do think we should design to allow opting-in to such a planar image primitive in the future, when it exists.
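Both paths are usable today; a minimal sketch, assuming a decoded ImageBitmap bitmap, a WebGL context gl, and a canvas:

// Shader access: upload the ImageBitmap as a WebGL texture (GPU-to-GPU copy).
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);

// Display: hand the bitmap to the canvas without a copy; it is detached afterwards.
const ctx = canvas.getContext('bitmaprenderer');
ctx.transferFromImageBitmap(bitmap);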

@chcunningham chcunningham added the 2020 Q1 Solve by end of Q1 2020 label Feb 12, 2020
@murillo128

From my point of view, being able to provide an efficient way of manipulating video frames is a must for WebCodecs, funny hats or head tracking being the obvious use cases.

Given typical encoder/decoder APIs, I think we should provide direct access to a planar image with stride.

Kind of what is already available in Blink's native video frame object:

https://cs.chromium.org/chromium/src/media/base/video_frame.h

@sandersdan
Contributor

Efficient manipulation of video frames implies GPU-only operation (hardware buffer or texture primitive). There exist platforms where uncompressed frames are stored in CPU memory, in which case it is convenient to offer that access, but in general it implies a very expensive GPU readback operation.

@murillo128

By efficient video manipulation I meant with as little memory copying/conversion as possible.

For example in the case of funny hat:

//Get cam
const cam = await navigator.mediaDevices.getUserMedia({video: true, audio: false});
//Get video track reader
const reader = window.reader = new VideoTrackReader(cam.getVideoTracks()[0]);
//Create writer
const writer = new VideoTrackWriter({});
//Create transform stream
const transformer = window.transformer = new TransformStream({
  transform: (frame, controller) => {
    //paint something on the frame
    controller.enqueue(frame);
  }
});
reader.readable.pipeTo(transformer.writable);
transformer.readable.pipeTo(writer.writable);
//Send it
peerconnection.addTrack(writer.track);

(note that this code works in Chrome right now, except obviously for the image manipulation)

The image bytes would already be in CPU memory in I420 layout (in most cases), or would have to be converted to I420 for the WebRTC encoders (let's ignore VP9 profile 2 for now).

It would be desirable to be able to expose the underlying image data (when it is in memory) and, when it is not, to be able to export it to an ImageBitmap. We would also need a way to create a VideoFrame from an ImageBitmap, ImageData, or raw YUV data.
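Something like the following hypothetical constructors would cover those cases (nothing here is specced; it is just the shape being asked for, with made-up option names):

// Hypothetical: wrap an existing ImageBitmap in a VideoFrame.
const fromBitmap = new VideoFrame(imageBitmap, { timestamp: 0 });

// Hypothetical: build a VideoFrame from raw planar YUV data.
const fromYuv = new VideoFrame({
  format: 'i420',
  codedWidth: 640,
  codedHeight: 480,
  timestamp: 0,
  planes: [yBytes, uBytes, vBytes], // one Uint8Array per plane
});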

@alexandre-janniaux

I do think we should design to allow opting-in to such a planar image primitive in the future, when it exists.

My answer probably references #47 too.

I also don't think this should be a feature within WebCodecs or a substitute for ImageBitmap, because ImageBitmap is a really good piece of interoperability between the different APIs available for rendering and displaying.

But plane-specific access from the start is a necessary feature, especially given that if it is not done from the start, you'll probably end up in the same state as Android and never implement it, or struggle to implement it afterwards. Likewise, an onGpu flag probably provides little abstraction compared with opaque pictures in general.

One solution to match both the WebCodecs API (which could stream ImageBitmap) and the ImageBitmap API would be to be able to get an ImageBitmap reference for a plane of the ImageBitmap, and expose more metadata on it. That way, you could have:

// pic is an ImageBitmap with format NV12
gl.texImage2D(..., pic);   // performs the NV12 -> RGB conversion like before
const plane = pic.getPlane(0);
gl.texImage2D(..., plane); // no conversion; this is a GL_LUMINANCE texture

That way you stay backward compatible with ImageBitmap and elegantly handle cases where you cannot extract the plane (Android, for example) by exposing an RGB chroma directly, letting the underlying graphics system handle the chroma conversion, without extending the Vulkan or OpenGL APIs.

This is also in line with APIs like GBM, if you take a look at gbm_bo_get_plane_fd for example.

To get back to #47, we probably don't care about colorspace within the ImageBitmap. It is information intended for the display system (so it can be private data here) and for the processing systems, which will probably generate code or use extensions for it, so it can be provided by WebCodecs or even the previous layers in a different object than the ImageBitmap itself. That's what we would expect here in VLC at least, as the information comes from the demuxer and not the decoder, and it could evolve quickly whenever you want to add colorspace, mastering data, etc.

@chcunningham
Collaborator

The spec now offers a VideoFrame interface with Plane interfaces for accessing the pixel data. An ImageBitmap can be generated from a VideoFrame for painting to a canvas.

With this now defined, I'd like to close this issue and have new sub-issues filed for remaining gaps. Some known issues/plans are described below. Please file a new issue for anything I've neglected.

We intend to add new features to this interface shortly, including the following.

Planar access to GPU-backed frames is still a problem. In the short term we intend to at least make this transparent by having GPU-backed VideoFrames not initially offer any planar access, but provide a converter function that performs the copy to CPU memory when invoked.

Down the road we would like GPU-backed frames to have some "buffer" type from WebGPU, such that inspection/manipulation of the pixels can happen without a GPU-to-CPU copy, using WebGPU APIs.
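For reference, a rough sketch of that flow (assuming, per early drafts, that each Plane exposes length and stride attributes and a readInto() method):

async function inspectAndPaint(frame, ctx2d) {
  for (const plane of frame.planes) {
    const bytes = new Uint8Array(plane.length);
    plane.readInto(bytes); // copy this plane's pixels into CPU memory
    // ... inspect bytes; consecutive rows are plane.stride bytes apart ...
  }
  // Painting path: convert to an ImageBitmap and draw onto a 2D canvas.
  const bitmap = await createImageBitmap(frame);
  ctx2d.drawImage(bitmap, 0, 0);
}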
