Add jitterBufferMaximumDelay attribute to RTCRtpReceiver #199
Well - jitterBufferTarget is a target, it's not an "add this much delay" API.
Under low latency streaming there is N38: "The application must be able to control the jitter buffer and rendering delay. This requirement is addressed by jitterBufferTarget, defined in [WebRTC-Extensions] Section 6." In my opinion it is only half addressed, as there is control over the minimum jitter buffer delay but nothing to limit the maximum delay. The use case I am interested in improving is when multiple participants join a meeting from their laptops in the same room; the remote audio playback then has to be synchronised to avoid echo from that room. The current AEC in WebRTC is not able to cancel out the playback signal from other laptops if it happens earlier than the reference playback signal. If the playbacks are synchronised to within +/- 10 ms, the AEC can handle this pretty well.
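For context, a minimal sketch of what is possible today under that requirement, assuming only the existing jitterBufferTarget attribute from [WebRTC-Extensions] (the type assertion is just to keep the sketch self-contained):

```ts
// Sketch: with today's API an application can only raise the lower bound of
// the jitter buffer delay; there is no way to cap it from the application.
function setMinimumJitterDelay(pc: RTCPeerConnection, targetMs: number): void {
  for (const receiver of pc.getReceivers()) {
    if (receiver.track.kind !== "audio") continue;
    // jitterBufferTarget (WebRTC-Extensions Section 6) is in milliseconds and
    // acts as a minimum target; the actual delay may still grow above it.
    (receiver as RTCRtpReceiver & { jitterBufferTarget: number | null })
      .jitterBufferTarget = targetMs;
  }
}
```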
So if the problem is that you have two laptops A and B in a room, and A plays out the signal at T+10 while B receives the signal at T+30, fixing the jitter buffer of both to min=max=20 ms will still give a 20 ms difference between the signals. I don't think just controlling the jitter buffer is going to give you that.
There are of course more details on how to achieve the sync. Controlling the amount of buffering in the jitter buffer is just one piece of the puzzle.
Given the discussion in https://webrtc-review.googlesource.com/c/src/+/339901 it looks like a separate attribute, maximumJitterBufferDelay, makes more sense than adding another mode. @alvestrand, do you agree?
Seems more sensible to me. I assume that if packets don't come in within the max jitter buffer delay, the jitter buffer will synthesize samples.
Contrary to popular belief, NetEq currently doesn't do this. It would be quite a substantial change and I'm not sure it is ideal from a quality perspective. I'm fine with having a maximum target delay, however (allowing the playout delay to be higher temporarily).
In practice it is implemented as a minimum target jitter buffer delay.
Also, for syncing streams, I'm not sure why a maximum delay is needed. For example, if you look at the audio video sync code in WebRTC, delay is only ever "added" (through setting the minimum delay) to the stream which is ahead.
NetEq would give out silence or comfort noise when the buffer runs empty, right? Or what kind of synthesising is meant here?
I haven't looked at how the audio-video sync is implemented, but isn't it that delay is only ever added to video when audio is lagging? And delay for video can be applied quickly and precisely. If NetEq respected the jitterBufferTarget delay more precisely, it would almost work for syncing multiple audio streams. Being able to force a maximum delay helps to avoid the cases where NetEq leaves the delay above jitterBufferTarget.
Sure, what I meant was that packets are not discarded, in favor of synthesising samples, if they are later than the maximum delay.
No, both audio and video can get delayed to sync.
If one stream is behind, why not just increase the delay on the other streams?
But that should be an easy change, and it can be done when the user sets a maximum delay, indicating that losing some delayed audio is OK. In the scenario of syncing multiple audio playbacks it is also not desirable to play out old audio, which may have already been heard from other playbacks.
This has been tried and works to some extent if you can predict the reason why the actual playout delay is higher than the target. Otherwise there would be a loop adjusting the delay towards infinity. Imagine A finds that its delay is bigger than B's by 40 ms and tells B to increase its delay by 40 ms. Then B, instead of reaching the target precisely, also leaves its playout delay at +40 ms, and now A needs to adjust again. With some complex logic it may work in the direction of increasing delay, but once jitterBufferTarget is set, the delay never starts to come down when network conditions improve. Getting the delay back to lower values is even more complex.
Hm... if B is behind A by 40 ms and B has a minimum target of 40 ms, then you would first remove that minimum target before adding delay to A? It doesn't make sense to simultaneously delay both streams if you want to achieve the lowest possible latency.
Removing the delay constraint isn't really an option. With just A and B the example is of course very simplistic. There can be N streams to synchronise, and once the constraint is removed the delay can change too quickly in an unknown direction. It is really hard to maintain sync in such conditions. Also, figuring out which buffers require a constraint and which do not at a given moment doesn't sound like something that could work reliably. My vision of how it can work is that all buffers are forced to a specific delay (setting min and max) at the same moment, according to the worst jitterBufferMinimumDelay monitored from the WebRTC stats. Very simple.
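A rough sketch of that scheme, under the assumption that the proposed jitterBufferMaximumDelay attribute exists (it does not today; the code only illustrates the intent):

```ts
// Poll getStats() on all audio receivers, take the worst (largest) average
// jitterBufferMinimumDelay, and pin every jitter buffer to that value.
type TunableReceiver = RTCRtpReceiver & {
  jitterBufferTarget: number | null;         // WebRTC-Extensions, milliseconds
  jitterBufferMaximumDelay?: number | null;  // proposed in this issue, milliseconds
};

async function syncJitterBuffers(pc: RTCPeerConnection): Promise<void> {
  const receivers = pc.getReceivers().filter(r => r.track.kind === "audio");
  let worstMinDelayMs = 0;

  for (const receiver of receivers) {
    const report = await receiver.getStats();
    report.forEach(stats => {
      // jitterBufferMinimumDelay and jitterBufferEmittedCount are cumulative
      // counters, so their ratio is the average minimum delay in seconds.
      if (stats.type === "inbound-rtp" && stats.kind === "audio" &&
          stats.jitterBufferEmittedCount > 0) {
        const avgMinDelayMs =
          (stats.jitterBufferMinimumDelay / stats.jitterBufferEmittedCount) * 1000;
        worstMinDelayMs = Math.max(worstMinDelayMs, avgMinDelayMs);
      }
    });
  }

  // Force all buffers to the same delay at the same moment: pin the lower
  // bound with jitterBufferTarget and (if available) the upper bound with the
  // proposed jitterBufferMaximumDelay.
  for (const receiver of receivers as TunableReceiver[]) {
    receiver.jitterBufferTarget = worstMinDelayMs;
    if ("jitterBufferMaximumDelay" in receiver) {
      receiver.jitterBufferMaximumDelay = worstMinDelayMs;
    }
  }
}
```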
Remove it iteratively in steps until the right delay is achieved? Could you explain in more detail why removing a delay constraint is not possible?
In the example above the streams are already out of sync?
Network latency / jitter and audio device latency will be different for different devices and can change over time, so the delay needs to be adjusted continuously and independently (as in different streams will have different delay settings to achieve sync) anyway?
I would need to experiment with it again to be more specific, but it is not a new idea and it was ruled out in the early stages of feature development. Basically, if you let one or more jitter buffers change delay freely, it first takes time to measure the playout delay, then to communicate that to the other jitter buffers, and then it takes time until the target is reached. During that time the playout is out of sync and the AEC fails to cancel out echo. The feature with synchronised playbacks is publicly available in a product, and it is possible to play with it if there is interest. I am pretty sure that the results that can be achieved using only jitterBufferTarget with the current NetEq behaviour have hit their limits, and that is why there is an effort to find ways to improve it.
Yes. Correct.
Would need to write a proper explainer eventually, but for now I can reiterate some key points.
Yes, a proper explainer would be good. It sounds like you want to add a separate operating mode for NetEq, which I'm hesitant to add (for this potentially narrow use case) unless there is a lot of interest. It also sounds like a potential footgun for developers, where setting a low delay would result in terrible quality. I could imagine other use cases for a maximum delay API that would like NetEq to operate with as low a delay as possible, or under an upper bound, but not throw away packets in the process, which would not fit your description.
Yes, from my perspective there is no preference between a separate operating mode, a maxDelay attribute, or something else that would allow forcing the jitter buffer to operate at a specific delay as precisely as possible. That is why I have proposed different options, to find out what suits best. First there was the option to change the behaviour of the existing jitterBufferTarget to follow the target more closely. That indeed can break the expectations of some users who are after the quality of not losing a word. Then mode vs. maxDelay. If a mode is added, it could also be called something like a 'low latency mode', which would disable delay jumps and the behaviour of keeping the delay higher after an underrun. There could be an option for users to prefer latency vs. quality (risk of losing some words). If I am not mistaken, even the whole audio pipeline in WebRTC is built with low latency in mind, and eliminating the risk of underruns comes second.
If that is about changing the behaviour of the current jitterBufferTarget then I agree. But if there is a separate max delay attribute then I don't see how the potential impact can be misunderstood. There is no need to play with the max delay if you don't understand the consequences. Even today developers can set the maximum number of packets the jitter buffer can hold to a low value. Would the possibility to do it dynamically during a call be more dangerous? If there is good documentation in place it should be fine, I think.
Does there have to be many different web apps interested, or would a single one with millions of potential users count too? It is possible to add a usage counter.
Yes, an explainer would help. The use case, as I understand it, is multiple speakers playing out the same source in the same (physical) room, with different network delays, and the desire to keep them in sync well enough to allow echo cancellation to cancel all the speakers at once. One alternative design would be to timestamp the audio packets and have a local variable "timestamp to playout offset" that would hard-lock the playout of a given timestamp to the local system clock (choice of this value, and adjustment for things like clock drift, are left as an exercise for the reader); samples that arrived too late to make their scheduled playout time would be replaced with silence.
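A minimal sketch of that alternative, with invented names (the shared clock, the choice of playoutOffsetMs, and clock-drift handling are all assumed away, as in the comment above):

```ts
// Playout of a frame is hard-locked to captureTimestampMs + playoutOffsetMs on
// the local clock; frames that miss their slot are replaced with silence.
interface TimestampedFrame {
  captureTimestampMs: number; // sender-side capture time on a shared clock
  samples: Float32Array;      // decoded PCM for this frame
}

function scheduleFrame(
  frame: TimestampedFrame,
  playoutOffsetMs: number,    // the "timestamp to playout offset" variable
  nowMs: number               // local clock, e.g. performance.now()
): { playAtMs: number; samples: Float32Array } {
  const playAtMs = frame.captureTimestampMs + playoutOffsetMs;
  if (playAtMs < nowMs) {
    // Too late for its scheduled slot: substitute silence of the same length.
    return { playAtMs: nowMs, samples: new Float32Array(frame.samples.length) };
  }
  return { playAtMs, samples: frame.samples };
}
```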
Is that how Google Meet is doing it? https://workspaceupdates.googleblog.com/2024/05/google-meet-adaptive-audio.html?m=1 |
Background
There is a missing possibility to force the jitter buffer to operate with a specific delay. The existing attribute jitterBufferTarget enables setting a minimum delay, but the upper bound is not controllable from the application layer, meaning that the jitter buffer can increase the delay according to network conditions. There are use cases where it is unavoidable to have full control over the jitter buffer delay. For example, one scenario is where several jitter buffers need to be synchronised, and maintaining good synchronisation is much more important than potentially losing some packets due to too little buffering.
Proposed API
Details
The jitterBufferMaximumDelay attribute would allow the application to control the maximum amount of buffering in the jitter buffer.
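A sketch of how such an attribute could surface, mirroring jitterBufferTarget; the exact name, type, and placement are assumptions of this sketch, not settled spec text:

```ts
declare global {
  interface RTCRtpReceiver {
    // Existing (WebRTC-Extensions Section 6): minimum target, in milliseconds.
    jitterBufferTarget: DOMHighResTimeStamp | null;
    // Proposed here: upper bound on jitter buffer delay, in milliseconds.
    jitterBufferMaximumDelay: DOMHighResTimeStamp | null;
  }
}

// Hypothetical usage: pin an audio receiver's jitter buffer to ~60 ms.
export function pinJitterBufferDelay(receiver: RTCRtpReceiver, delayMs: number): void {
  receiver.jitterBufferTarget = delayMs;
  receiver.jitterBufferMaximumDelay = delayMs;
}
```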