Capture, receive, and RTP timestamp concept definitions & normative requirements for gUM/gDM #156

handellm · 2024-09-11T09:22:56Z

Partly addresses w3c/webcodecs#813 (review).

Defines capturetime in mediacapture-extensions.

jan-ivar

Why isn't the existing videoFrame.timestamp good enough for AV sync, especially when it's defined for both audio and video, and has a higher resolution to boot?

jan-ivar · 2024-09-12T13:29:42Z

index.html

+          Some video sources can supply information about when a video frame was captured.
+          This information is useful for example for AV sync and end-to-end delay measurement.


How does it help with AV sync if it's only on video frames?

Absolute audio capture timestamps are available via other web APIs, right?
To try to answer the overarching question: presentation timestamps alone are insufficient if you want to do accurate AV sync locally (think MediaRecorder) or on a remote receiver, especially when there are delays in the capture process or when VTGs are inserted in the capture pipeline to do video processing.
Is this answer enough, or do you want to see further changes to clarify this in the spec text, based on the latest uploaded revision?

jan-ivar · 2024-09-12T13:32:57Z

index.html

+              <dt><dfn><code>captureTime</code></dfn> of type <span
+                class="idlMemberType">DOMHighResTimeStamp, readonly</span></dt>
+              <dd>
+                <p>
+                  The capture time of the frame, defined as {{Performance.timeOrigin}} + {{Performance.now()}}.
+                  This is the user agent's best estimate of the instant the frame content was captured or
+                  generated.


What is the videoFrame.timestamp of these frames? Why do we need another one?

This is in milliseconds when the videoFrame.timestamp is in microseconds. That seems inconsistent.

Hi, @jan-ivar - I was a little trigger happy in uploading this PR, I will upload changes soon.

captureTime is absolute, which enables delay measurements in a video pipeline. Also, video capture OS APIs typically provide media sampling timestamps, which I've measured to sometimes be tens of milliseconds earlier than the time the frames become available in a capture sink. I've tried to update the non-normative text on what the timestamps are used for.

I don't know why VideoFrame.timestamp was specified in micros. DOMHighResTimeStamp with resolution around 0.1ms seems fine.
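
As a rough illustration of the delay measurement described in this comment, here is a minimal TypeScript sketch. It assumes the capture time is an absolute value in milliseconds on the same clock as performance.timeOrigin + performance.now(), as in the diff hunk above. The helper names pipelineDelayMs and microsToMillis are hypothetical and do not come from the PR.

```typescript
// Hypothetical helpers for illustration only; not part of any spec or browser API.

// Converts a microsecond value (the unit used by VideoFrame.timestamp)
// to milliseconds (the unit used by DOMHighResTimeStamp).
function microsToMillis(us: number): number {
  return us / 1000;
}

// Returns how long ago a frame was captured, in milliseconds.
// captureTimeMs: absolute capture time of the frame.
// nowMs: current absolute time on the same clock; in a browser this
//        could be performance.timeOrigin + performance.now().
function pipelineDelayMs(captureTimeMs: number, nowMs: number): number {
  return nowMs - captureTimeMs;
}
```

A presentation timestamp that starts at 0 cannot be used this way, because its origin is unknown to the application.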

jan-ivar · 2024-09-12T13:59:15Z

The presentation timestamp seems a bit under-specified, but I gather it starts at 0 at the start of capture, and follows the "media timeline" which I suppose in theory can get out of sync (what happens to it with track.enabled = false?). So I can see a need here.

I just think it would be good to write down the differences and why we need a video capture timestamp.

handellm · 2024-09-20T10:48:55Z

41da924 is the uploaded state prior to addressing Youenn's request to extend this with timestamp writing algorithms for local sources.

aboba

If you're going to cover remote tracks as well as local ones, then some clarifications are needed. But it might be simplest to focus on locally captured tracks.

youennf · 2024-10-07T11:41:59Z

index.html

+      </p>
+      <p>
+        Each video frame has a <dfn class="export">presentation timestamp</dfn>
+        which is relative to the first frame appearing on the track. The timestamp


Should we state that the first video frame of a {{MediaStreamTrack}} has a presentation timestamp of 0?
Or is it the first frame of the track's source?
Does this mean that the first video frame of a cloned track will have a timestamp of 0, or will cloned tracks have the same video frame timestamps?

> Should we state that the first video frame of a {{MediaStreamTrack}} has a presentation timestamp of 0?

This - updated with clarification that the first frame's timestamp is 0.

> Does this mean that the first video frame of a cloned track will have a timestamp of 0, or will cloned tracks have the same video frame timestamps?

Having it start with 0 for cloned tracks is consistent with the definition as written here. Trying to specify 0 for the first frame from a source could be interpreted to mean that the same timestamp sequence is replicated across all tabs, which we don't want for privacy reasons, right?

youennf · 2024-10-07T12:28:18Z

index.html

+        The user agent MUST set the [= capture timestamp =] of each video frame that is sourced from
+        {{MediaDevices/getUserMedia()}} and {{MediaDevices/getDisplayMedia()}} to its best estimate of the time that
+        the frame was captured, which MUST be in the past relative to the time the frame is appearing on the track.
+        This value MUST be monotonically increasing.


At TPAC, I think there was consensus on having camera tracks timestamp == capture timestamp.
It does not seem to be the case here though I would guess we keep the idea that the same clock is used.

Sorry can you elaborate on what you mean here?

As per https://jsfiddle.net/4yzmwnsL/, it seems both Chrome and Safari use the same frame.timestamp for cloned tracks, even though the second track was cloned 2 seconds after the first one. This seems to be in contradiction with this PR.

Instead, implementations seem to be aligned with the idea that capture timestamp and presentation timestamp are both set by the source, not by the track.

I am also not totally clear of the difference between these two timestamps for local sources.
It seems that the time the frame appears on the track would be the presentation timestamp, and would be slightly greater than the capture timestamp.
If that is the case, we should refer to both capture timestamp and presentation timestamp in that section of the PR.

Yeah that's right, I spoke to this on TPAC. Currently, Chrome emits absolute capture timestamps from gUM/gDM capture to the webcodecs timestamp slot in MSTP. We believed we could do this since the nature of the timestamp isn't really well specified in webcodecs. It was later found that there are web apps that depend on a 0-based timeline and we have heuristics to support both cases in MSTG. The purpose of this PR series is to clean this whole topic up and stop using 1 field for 2 things.

presentation timestamp is 0-based and increments by frame duration. We intend to change Chrome back to 0-based here after the PR series is landed.
capture timestamp is absolute & unobservable on MediaStreamTracks but there are behavior requirements stated for gDM/gUM in this PR. It becomes observable first in VideoFrameMetadata creation at the MSTP.

> I am also not totally clear of the difference between these two timestamps for local sources.

For gUM/gDM sources, assume frame sequence indices 0, 1, 2, i ... N. presentation timestamp[i + 1] - presentation timestamp[i] = capture timestamp[i + 1] - capture timestamp[i] = VideoFrame.duration[i] i.e. same clock rate, different origin.

Suggestions on how to make this clearer?
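
One way to picture "same clock rate, different origin" is as a check that the difference between the two timestamps is the same for every frame. The TypeScript sketch below assumes both timestamps have already been converted to milliseconds; the FrameTimes type and the hasConstantOffset function are hypothetical names used only for illustration.

```typescript
// Hypothetical type: one frame's two timestamps, both in milliseconds.
interface FrameTimes {
  presentationMs: number; // 0-based media timeline
  captureMs: number;      // absolute capture time
}

// Returns true when capture and presentation timestamps advance at the
// same rate, i.e. their difference is the same for every frame.
function hasConstantOffset(frames: FrameTimes[], toleranceMs = 0.001): boolean {
  if (frames.length === 0) return true;
  const offset = frames[0].captureMs - frames[0].presentationMs;
  return frames.every(
    (f) => Math.abs(f.captureMs - f.presentationMs - offset) <= toleranceMs
  );
}
```

If the check holds, the per-frame deltas of both sequences are equal, which is the relation stated in the comment above.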

> The purpose of this PR series is to clean this whole topic up and stop using 1 field for 2 things.

That is fine to me.

> presentation timestamp is 0-based and increments by frame duration.

This PR assumes that 0 is per track. Meaning that track and track.clone() would not have the same definition of 0. This seems unnatural to me. I would tend to go with 0 as the time of the first frame emitted by the source.
For VTG, 0 would refer to the time the first video frame is enqueued.

For this PR, that would mean changing "which is relative to the first frame appearing on the track." to "which is relative to the first frame emitted by the track's [[Source]]."

Or it could be 0 per track's sink, meaning that MediaStreamTrackProcessor would always provide its first frame to the web app with a timestamp of 0 (provided the web app reads the frames quickly enough).

> It was later found that there are web apps that depend on a 0-based timeline

Maybe these web apps (and the heuristics you mentioned) could tell us whether 0 should be per track, per track's sink or per track's source.

> For gUM/gDM sources, assume frame sequence indices 0, 1, 2, i ... N. presentation timestamp[i + 1] - presentation timestamp[i] = capture timestamp[i + 1] - capture timestamp[i] = VideoFrame.duration[i] i.e. same clock rate, different origin.

That makes sense to me.

> Suggestions on how to make this clearer?

I would add some wording stating that presentation timestamp and capture timestamp are using the same clock and have a constant offset.

> presentation timestamp is 0-based and increments by frame duration.

> This PR assumes that 0 is per track. Meaning that track and track.clone() would not have the same definition of 0. This seems unnatural to me. I would tend to go with 0 as the time of the first frame emitted by the source. For VTG, 0 would refer to the time the first video frame is enqueued.

> For this PR, that would mean changing "which is relative to the first frame appearing on the track." to "which is relative to the first frame emitted by the track's [[Source]]."

> Or it could be 0 per track's sink, meaning that MediaStreamTrackProcessor would always provide its first frame to the web app with a timestamp of 0 (provided the web app reads the frames quickly enough).

I think 0 per track source is what makes most sense. WDYT?

> The purpose of this PR series is to clean this whole topic up and stop using 1 field for 2 things.

> That is fine to me.

> presentation timestamp is 0-based and increments by frame duration.

> This PR assumes that 0 is per track. Meaning that track and track.clone() would not have the same definition of 0. This seems unnatural to me. I would tend to go with 0 as the time of the first frame emitted by the source. For VTG, 0 would refer to the time the first video frame is enqueued.

> For this PR, that would mean changing "which is relative to the first frame appearing on the track." to "which is relative to the first frame emitted by the track's [[Source]]."

> I think 0 per track source is what makes most sense. WDYT?

Done in a8fc811.

> Or it could be 0 per track's sink, meaning that MediaStreamTrackProcessor would always provide its first frame to the web app with a timestamp of 0 (provided the web app reads the frames quickly enough).

> It was later found that there are web apps that depend on a 0-based timeline

> Maybe these web apps (and the heuristics you mentioned) could tell us whether 0 should be per track, per track's sink or per track's source.

The usages we found would work great if MSTP exposes 0-based from creation.

> Suggestions on how to make this clearer?

> I would add some wording stating that presentation timestamp and capture timestamp are using the same clock and have a constant offset.

Done in a8fc811.

youennf · 2024-10-08T13:49:34Z

> I think 0 per track source is what makes most sense.

That seems indeed better compared to per track.

> The usages we found would work great if MSTP exposes 0-based from creation.

That is closer to per track's sink timestamps, but...
MSTP can be specified so that the presentation timestamp of the video frames it is exposing to web pages will be relative to the first video frame it is exposing.

If we do that change in MSTP, do we need presentation timestamp and capture timestamp to be different for local tracks?

handellm · 2024-10-08T14:42:16Z

> I think 0 per track source is what makes most sense.

> That seems indeed better compared to per track.

> The usages we found would work great if MSTP exposes 0-based from creation.

> That is closer to per track's sink timestamps, but... MSTP can be specified so that the presentation timestamp of the video frames it is exposing to web pages will be relative to the first video frame it is exposing.

I expressed myself sloppily; I meant what you wrote. So for this PR, we could prepare for that by adding an |offset| parameter to the "Initialize Video Frame Timestamps From Internal MediaStreamTrack Video Frame" algorithm, which is specified later from mediacapture-transform as "the [=presentation timestamp=] of the first frame". WDYT?

> If we do that change in MSTP, do we need presentation timestamp and capture timestamp to be different for local tracks?

In general yes, it's not possible to recreate any accurate capture timestamp given presentation timestamp alone since the origin is unknown.

youennf · 2024-10-08T15:55:40Z

> we could prepare for that by adding an |offset|

We could consider that MSTP is receiving a VideoFrame object and use https://w3c.github.io/webcodecs/#videoframe-initialize-frame-from-other-frame to create a new VideoFrame with an updated timestamp.

> If we do that change in MSTP, do we need presentation timestamp and capture timestamp to be different for local tracks?

> In general yes, it's not possible to recreate any accurate capture timestamp given presentation timestamp alone since the origin is unknown.

It seems that presentation timestamp will be exposed in a few places:

Via MSTP where it would be updated by MSTP before exposure to JS.
Via rvfc where it could also be updated in the same way (offset by the timestamp of the first video frame being rendered by the media element).

Based on this, the fact that the presentation timestamp of the first video frame of a track's source is 0 may not be observable to JavaScript, since the only presentation timestamps exposed to JS would be computed by each sink and would be relative to that sink.

The only requirement could be that presentation timestamps have the same clock as capture timestamps for local sources. presentation timestamp == capture timestamp for local sources seems then sufficient and easier to understand.

handellm · 2024-10-08T19:06:12Z

> Via MSTP where it would be updated by MSTP before exposure to JS.

> Via rvfc where it could also be updated in the same way (offset by the timestamp of the first video frame being rendered by the media element).

Yes, VideoFrame.timestamp for the former and mediaTime for the latter (they're currently both described to be "presentation timestamp" or "media presentation timestamp (PTS)").

> The only requirement could be that presentation timestamps have the same clock as capture timestamps for local sources. presentation timestamp == capture timestamp for local sources seems then sufficient and easier to understand.

I buy requiring the same clock rate, but I don't see the need to require equality for the offset.

Given that sinks exposing the presentation timestamp subtract an |offset| (the first presentation timestamp they use) to produce a 0-based observable sequence, it doesn't seem to me the actual origin of presentation timestamp in the UA matters and hence doesn't need to be strictly defined?

Wdyt we just add a note explaining this and that it's allowable to have the same clock (rate and offset) for presentation timestamp and capture timestamp as a special case?
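
The sink-side behaviour discussed here can be sketched as follows. This is a minimal TypeScript illustration under the assumption that a sink remembers the first presentation timestamp it sees and subtracts it from every later one; the ZeroBasedRebaser class is a hypothetical name and is not an algorithm from the PR or from mediacapture-transform.

```typescript
// Hypothetical sink-side helper: exposes a 0-based timeline regardless of
// which origin the user agent uses internally for presentation timestamps.
class ZeroBasedRebaser {
  private offset: number | undefined = undefined;

  // Returns the timestamp relative to the first timestamp this sink saw.
  rebase(presentationTimestamp: number): number {
    const offset = this.offset ?? presentationTimestamp;
    this.offset = offset;
    return presentationTimestamp - offset;
  }
}
```

Two sinks attached at different times each observe a sequence starting at 0, so the internal origin is not observable through them, while the spacing between frames is preserved.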

youennf · 2024-10-09T06:18:35Z

> it doesn't seem to me the actual origin of presentation timestamp in the UA matters and hence doesn't need to be strictly defined?

Agreed.

> Wdyt we just add a note explaining this and that it's allowable to have the same clock (rate and offset) for presentation timestamp and capture timestamp as a special case?

Right, this seems editorial.
I would tend to define presentation timestamp equal to capture timestamp for simplicity/readability, but either is fine really.

handellm · 2024-10-09T12:55:52Z

@dontcallmedom I can't parse what respec complains about, ideas?

dontcallmedom · 2024-10-09T13:02:44Z

looks like it was a temporary hiccup, re-running the job fixed it

youennf

LGTM once we say something of the [=presentation timestamp=] of local video capture, either same value as [=capture timestamp=] or with a fixed offset.
Plus the removal of a MUST we cannot really test.

youennf · 2024-10-10T14:10:28Z

e [=presentation timestamp=] of local video capture, either same value as [=capture timestamp=]

Only for local tracks of course.

jan-ivar · 2024-10-10T14:11:51Z

Thanks for all the activity! Let me have a look. I'd also love for @padenot to look this over.

henbos · 2024-10-10T14:16:14Z

From editor's meeting:

@handellm to address @youennf's comment
@jan-ivar to read this PR
@handellm to file issues on RVFC

padenot · 2024-10-10T14:53:57Z

All good from my perspective, with comments from @youennf addressed. Please remember to put this upstream and not here, thanks.

handellm · 2024-10-11T13:48:19Z

@padenot thanks - can you just clarify what you meant by "upstream and not here"? We had one interpretation on the editors meeting but reading again I'm not sure that was right.

happylinks · 2024-10-14T18:59:02Z

Just want to share my appreciation for adding capture timestamp. This is exactly what we are missing for accurate multi-stream recording in tella.tv.
Right now we start 2 MediaRecorders and basically hope for the best in terms of sync between screen and camera; there is no way to know when the first frame was actually captured.
Seems like this will fix that problem, so thanks for working on it!
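
To make the use case concrete, here is a minimal TypeScript sketch of how two recordings could be aligned once the capture time of each stream's first frame is known. It assumes both capture times are absolute values in milliseconds on the same clock; the function name startSkewMs and its parameters are hypothetical.

```typescript
// Hypothetical helper: how much later (in ms) the camera's first frame was
// captured compared to the screen's first frame. A negative result means
// the camera started first.
function startSkewMs(
  screenFirstCaptureMs: number,
  cameraFirstCaptureMs: number
): number {
  return cameraFirstCaptureMs - screenFirstCaptureMs;
}
```

An application could then delay or trim one of the recordings by this amount when compositing them.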

handellm · 2024-10-16T10:50:13Z

@padenot @jan-ivar - friendly ping!

jan-ivar

Thanks! I approve of overall direction. Happy to file followups if I find something.

handellm added 3 commits September 9, 2024 14:40

Define captureTime.

6f40605

Partly addresses w3c/webcodecs#813 (review).

Goldplate

ae946a9

Address comments from internal review.

2dc8783

jan-ivar requested changes Sep 12, 2024

alvestrand marked this pull request as draft September 19, 2024 14:40

Updates after offline review

41da924

handellm changed the title ~~Capture time~~ Capture, receive, and RTP timestamp concept definitions & normative requirements for gUM/gDM Sep 20, 2024

This was referenced Sep 20, 2024

Add WebRTC-specific interactions with capture/receive/RTP timestamps w3c/webrtc-extensions#224

Draft

Add interactions with capture/presentation/receive/RTP timestamps w3c/mediacapture-transform#112

Draft

Updates after TPAC joint Media/WebRTC session.

069793f

aboba reviewed Sep 27, 2024

aboba reviewed Sep 27, 2024

handellm added 2 commits September 27, 2024 10:31

Clarify nature and utility of timestamps.

88a2e13

Address reviewer comments.

15e2399

handellm requested a review from jan-ivar September 27, 2024 19:14

handellm marked this pull request as ready for review September 27, 2024 19:15

henbos approved these changes Oct 4, 2024

aboba reviewed Oct 4, 2024

aboba approved these changes Oct 4, 2024

dontcallmedom reviewed Oct 7, 2024

handellm and others added 4 commits October 7, 2024 12:12

Update index.html

958a9bd

Co-authored-by: Dominique Hazael-Massieux <[email protected]>

Update index.html

c79520a

Co-authored-by: Dominique Hazael-Massieux <[email protected]>

Update index.html

4d9e139

Co-authored-by: Dominique Hazael-Massieux <[email protected]>

Update index.html

584ace5

Co-authored-by: Dominique Hazael-Massieux <[email protected]>

youennf reviewed Oct 7, 2024

handellm and others added 3 commits October 7, 2024 15:45

Update index.html

8d59ae2

Co-authored-by: youennf <[email protected]>

Update index.html

f5d3490

Co-authored-by: youennf <[email protected]>

Adress reviewer comments.

d915bd5

Djuffin approved these changes Oct 7, 2024

Address reviewer comments.

a8fc811

handellm added 2 commits October 9, 2024 14:04

Clarify nature of presentation time, introduce offset in webcodecs video frame writing algorithm.

b5778b0

fix

012628d

youennf approved these changes Oct 9, 2024

Update index.html

794cd2d

Co-authored-by: youennf <[email protected]>

Add a note about presentation/capture timestamps being potentially the same for local capture

fcafe1c

guidou approved these changes Oct 31, 2024

jan-ivar approved these changes Oct 31, 2024

youennf merged commit 464ebe8 into w3c:main Oct 31, 2024
2 checks passed
