What should the shape of the API be? #35

pthatcherg · 2019-12-18T16:51:12Z

A number of considerations all combine in somewhat complex ways, such as (re)initialization, buffering, failure recovery, and flushing. Obviously we'd like a good API for all of these, but there are tradeoffs between different options. This issue is for tracking the discussion of what we want the API to look like.

Here are some options. Note that it may be possible to have different options for encode and decode. For example, we could do a combination of B for encode and D for decode.

Option A: new Encoder/Decoder for each change

Every time you want to change something that requires (re)initialization, such as changing the codec or resolution, create a new Encoder/Decoder. Also reinit every time a flush is desired.

Pros:

Simple API
It's clear when (re)init fails, and recovering is straightforward
If we have buffered frames, it's clear which initialization applies to which frames.
Flushing is just closing the .writable

Cons:

Dealing with downstream changes to the pipeline may be difficult (because you have a new .readable that you need to pipe somewhere).
Dealing with upstream changes to the pipeline may be difficult (because you have a new .writable that you need to pipe into).
It may be much more efficient to have only one encoder/decoder around at a time, which is difficult to manage when the JS creates a new one for every resolution change and/or flush.
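A rough sketch of what Option A could look like from the app's side, using only standard Streams API calls. `createDecoder` is a hypothetical stand-in for a real codec (it just tags chunks with the config it was built with), and `decodeSegments` is an invented helper, not a proposed API.

```javascript
// Hypothetical stand-in for a real decoder: a TransformStream bound to one config.
function createDecoder(config) {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue({ codec: config.codec, data: chunk });
    },
  });
}

// Option A: each config change gets a brand-new decoder, so both ends are re-piped.
async function decodeSegments(segments, sink) {
  for (const { config, chunks } of segments) {
    const decoder = createDecoder(config); // new .readable and new .writable
    // Downstream has to be re-piped every time; preventClose keeps the shared sink open.
    const piped = decoder.readable.pipeTo(sink, { preventClose: true });
    const writer = decoder.writable.getWriter();
    for (const chunk of chunks) {
      await writer.write(chunk);
    }
    await writer.close(); // in Option A, closing the .writable is the flush
    await piped;
  }
}
```

The re-piping on both sides of `createDecoder` is the cost listed in the cons above; the upside is that each chunk is unambiguously tied to the config of the decoder it was written to.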

Option B: An Initialize() method for each change

If a change requires a reinitialization, call Initialize(), as many times as you want. The .writable and .readable are stable.

Pros:

It's clear when (re)init fails, and recovering is straightforward
.readable is the same across multiple initializations, which makes the downstream consumption easier (nothing to re-pipe).
.writable is the same across multiple initializations, which makes the upstream production easier (nothing to re-pipe).
Reinitialization can be efficient because the implementation can keep only one encoder/decoder around at a time.

Cons:

Flushing must be a separate method (since you can only call .close() on the .writable once)
If buffering is used, which initialization applies to which frames becomes unclear. If we don't buffer on the .writable, we can avoid this, but that means that we must require the JS to respect .ready and we must deal with what happens when it does not.
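To make the buffering con concrete, here is a toy model (not the Streams machinery and not a real encoder). `OutOfBandEncoder`, `initialize`, `write`, and `drain` are all hypothetical names; the only point is that an out-of-band call takes effect immediately while frames written earlier are still sitting in a buffer.

```javascript
// Toy model of Option B: config is set out-of-band, frames are buffered.
class OutOfBandEncoder {
  constructor(config) {
    this.config = config;
    this.buffered = [];
  }

  // Out-of-band (re)initialization: applies right away.
  initialize(config) {
    this.config = config;
  }

  // Frames are queued rather than encoded synchronously.
  write(frame) {
    this.buffered.push(frame);
  }

  // Encodes everything queued so far using whatever config is current *now*.
  drain() {
    return this.buffered
      .splice(0)
      .map((frame) => ({ frame, bitrate: this.config.bitrate }));
  }
}
```

If the app writes two frames, calls `initialize()` with a new bitrate, and only then does the buffer drain, both earlier frames are encoded with the new bitrate, which is probably not what the app meant.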

Option C: An Initialize() method that produces a new WritableStream

If a change requires a reinitialization, call Initialize(), as many times as you want. The .readable is stable, but not the .writable (if there is one).

Pros:

It's clear when (re)init fails, and recovering is straightforward
.readable is the same across multiple initializations, which makes the downstream consumption easier (nothing to re-pipe).
Reinitialization can be efficient because the implementation can keep only one encoder/decoder around at a time.
Flushing is just closing the WritableStream
If we have buffered frames, it's clear which initialization applies to which frames.

Cons:

Dealing with upstream changes to the pipeline may be difficult (because you have a new Writable that you need to pipe into).
It's not exactly a TransformStream any more
Transferring streams may be tricky
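A minimal sketch of the Option C shape, assuming a hypothetical `StableReadableDecoder` class whose `initialize()` hands back a fresh WritableStream while `.readable` stays the same object. The "decoding" is just tagging chunks with the codec name.

```javascript
// Sketch of Option C: stable .readable, new WritableStream per initialization.
class StableReadableDecoder {
  #controller;

  constructor() {
    // start() runs synchronously during construction, so the controller is captured here.
    this.readable = new ReadableStream({
      start: (controller) => {
        this.#controller = controller;
      },
    });
  }

  // Init failure surfaces right here as an exception, before any stream exists.
  initialize(config) {
    if (!config || !config.codec) {
      throw new TypeError("codec is required");
    }
    const out = this.#controller;
    return new WritableStream({
      write(chunk) {
        // Every chunk written to this stream belongs to this config.
        out.enqueue({ codec: config.codec, data: chunk });
      },
      close() {
        // Closing this WritableStream is the flush point for this config.
      },
    });
  }
}
```

Note that the object is no longer a `{ readable, writable }` pair, which is the "not exactly a TransformStream" con above.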

Option D: In-band parameters

To reinitialize, put new parameters on the chunk passed into the .writable. Init failure is conveyed via a write failure.

Pros:

For decode, the source of frames (such as a media container) is likely related to the decoding parameters desired, making this a convenient/natural fit.
Cleaner encoder/decoder API (no extra Initialize/Flush methods)
Transferring streams is likely less tricky
If we have buffered frames, it's clear which initialization applies to which frames.
Upstream and downstream piping is easy (since both .readable and .writable are stable)
Flushing is just closing the .writable

Cons:

(Re)init failure recovery isn't straightforward. You'll likely need to catch an exception on .pipeTo and probably want to use preventCancel.
For encode, the source of frames (such as a MediaStreamTrack tied to a VideoTrackReader) likely isn't related to the encoding parameters desired, making this an inconvenient fit.
More complex chunk types (not just EncodedVideoFrame and VideoFrame; more things need to be on there)
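For illustration, Option D might look roughly like this. The chunk shape (`{ config, data }`), the `SUPPORTED_CODECS` set, and the function names are assumptions made up for the sketch; only `TransformStream` and its `transform(chunk, controller)` callback are real API.

```javascript
// Hypothetical set of codecs this sketch pretends to support.
const SUPPORTED_CODECS = new Set(["vp8", "vp9"]);

// Builds a transformer whose configuration travels in-band on the chunks.
function makeInBandTransformer() {
  let current = null; // config currently in effect

  return {
    transform(chunk, controller) {
      if (chunk.config) {
        if (!SUPPORTED_CODECS.has(chunk.config.codec)) {
          // Throwing errors the stream, so the pending write() rejects.
          throw new TypeError(`unsupported codec: ${chunk.config.codec}`);
        }
        current = chunk.config; // (re)initialize at exactly this point in the stream
      }
      if (current === null) {
        throw new TypeError("no config received yet");
      }
      if (chunk.data !== undefined) {
        controller.enqueue({ codec: current.codec, data: chunk.data });
      }
    },
  };
}

// Both .readable and .writable stay stable across reconfigurations.
function createInBandDecoder() {
  return new TransformStream(makeInBandTransformer());
}
```

Because the config rides on the chunk, buffering cannot reorder it relative to the frames it applies to. The cost is visible too: a bad config surfaces as a stream error rather than as the return value of a call.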

Option E: Internal reinitialization

Instead of asking for an init, just give it what you want and have it (re)init when it needs to. There is a fine line between this and Option D. But consider resolution changes. Instead of specifying that the codec reinit with a new size, you just give it whatever frame comes from a MediaStreamTrack and it reinits based on that size. Similarly, an EncodedVideoFrame could simply express what codec it is and the decoder deals with whatever it is.

Pros:

API is easier to use, and simple
Transferring streams is likely less tricky
No problems with buffering and which settings apply to which frames
Upstream and downstream piping is easy (since both .readable and .writable are stable)
Flushing is just closing the .writable

Cons:

(Re)init failure recovery isn't straightforward. You'll likely need to catch an exception on .pipeTo and probably want to use preventCancel.
It's easy for performance issues to creep in, and for the API to become too automatic/implicit. For example, if a codec switch happens and the new codec is only available via software, not hardware, should we reinit internally from hardware to software? This leads to a higher-level, more automatic API with constraints that are difficult to specify and keep consistent across browsers, somewhat like getUserMedia, whereas this API was initially intended to be low-level and explicit about performance.
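A small sketch of the resolution-change case under Option E, assuming frames are plain objects with `width`, `height`, and `data` fields (a stand-in for real video frames). `makeAutoReinitEncoder` and `reinitCount` are hypothetical names; the counter exists only to make the implicit work visible.

```javascript
// Sketch of Option E: the encoder reinitializes itself when the frame size changes.
function makeAutoReinitEncoder() {
  let currentSize = null;
  let reinits = 0;

  return {
    // Exposed only so a caller can observe how often an implicit reinit happened.
    get reinitCount() {
      return reinits;
    },
    transform(frame, controller) {
      const size = `${frame.width}x${frame.height}`;
      if (size !== currentSize) {
        // Implicit (and potentially expensive) reinit; the caller never asked for it.
        currentSize = size;
        reinits += 1;
      }
      controller.enqueue({ size, data: frame.data });
    },
  };
}
```

The transformer can be passed to `new TransformStream(...)` as is. Nothing in the calling code hints that a reinit happened, which is the "too implicit" concern above.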

sandersdan · 2020-02-08T00:24:27Z

Based on our experience implementing VideoDecoder in Chromium, @chcunningham and I reviewed these options and we are implementing option D.

A (replacing streams every time): Replumbing is less elegant than in-band signaling in the cases we considered. Consider a potential app implementation of seeking:

demuxer.readFrom(time).then(({ config, readable }) => {
  // Each seek yields a new config, so the decoder's writable has to be re-plumbed.
  decoder.configure(config).then(writable => readable.pipeTo(writable));
});

B (out-of-band with same stream): Buffering is inherent in WritableStream, so we cannot reliably synchronize in-band and out-of-band signals without replacing streams.
C (out-of-band with new stream): Basically the same as A.
D (in-band): Using preventCancel and preventAbort, the whole pipeline does not need to be torn down on failure. Main downside is that streams contain multiple types of messages. Here the app implementation is much simpler because the configuration does not need to be separately plumbed:

demuxer.seek(time)

E (in-band with implicit configuration): Same as D with more complexity.

It still remains to be seen if preventAbort/preventCancel/signal solutions are intuitive enough to be required for first-time use of WebCodecs. It would be a bad outcome if apps always wrap decoders to provide a different interface.
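For a sense of what the preventAbort/preventCancel approach asks of app code, here is a minimal sketch. `createDecoder` is a hypothetical decoder that fails on a chunk flagged `bad`, and `runOnce` is an invented helper; the `pipeTo` options themselves are standard Streams API.

```javascript
// Hypothetical decoder: errors when it sees a chunk flagged as bad.
function createDecoder() {
  return new TransformStream({
    transform(chunk, controller) {
      if (chunk.bad) {
        throw new TypeError("init failed");
      }
      controller.enqueue(chunk.data);
    },
  });
}

// Runs source -> decoder -> sink once. Resolves to true if both pipes completed.
async function runOnce(source, sink) {
  const decoder = createDecoder();
  const results = await Promise.allSettled([
    // preventCancel: a decoder failure does not cancel the source.
    source.pipeTo(decoder.writable, { preventCancel: true }),
    // preventAbort: a decoder failure does not abort the sink.
    // preventClose: the sink stays open so a later run can keep using it.
    decoder.readable.pipeTo(sink, { preventAbort: true, preventClose: true }),
  ]);
  return results.every((r) => r.status === "fulfilled");
}
```

When `runOnce` resolves to `false`, only the decoder is dead; the app can call `runOnce(source, sink)` again with the same source and sink and a fresh decoder is slotted in. The chunk that triggered the failure is consumed and gone, so the app still has to decide what to resend.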

sandersdan · 2020-02-10T21:28:07Z

After thinking on this over the weekend, I think option A/C is worth considering further. If we provide a configure() that replaces the streams (and aborts the old ones), then clients can use preventAbort/preventCancel to polyfill any of the other options. This may be the safest path while we wait to see what happens with flush in streams.

chcunningham · 2020-06-04T20:16:36Z

Obsolete. See explainer updates (decouple from streams).

pthatcherg mentioned this issue Dec 18, 2019

Changing settings in frame chunks vs. separate chunks #20

Closed

chcunningham added the "2020 Q1" label (Solve by end of Q1 2020) Feb 12, 2020

chcunningham mentioned this issue Apr 10, 2020

Update explainer (decouple from streams) #48

Merged

chcunningham closed this as completed Jun 4, 2020
