Target Bitrate As Reported by Chrome is slow to adjust when ConfigureTWCCSender is Being Used #2830
Comments
Given that the only commits between v3.2.24 and v3.2.25 are RTX related (v3.2.24...v3.2.25), I'm guessing it has something to do with RTX. This would make sense: when RTX is negotiated, libwebrtc will attempt to probe with RTX packets instead of padding packets. I think what might be going on is that if RTX gets negotiated by default (I'm not sure whether this is the case now or not), and somehow RTX packets are not consumed, then RTX packets will stop being fed through the interceptor and libwebrtc won't get TWCC feedback for RTX probes (see 5da7278). If the general shape of my theory is correct, that would definitely explain the behavior you are seeing with libwebrtc's bandwidth estimation.
@kcaffrey - what you write sounds correct, although I'm not sure the issue is RTX related. Note that I have also seen similar things to @aankur (slow BWE on the Chrome side), but I saw this a while back, before I added the RTX code. From memory, it has sometimes been slow and sometimes not, with no obvious cause. @aankur - just to exclude this, can you try it without RTX? That is, instead of registering the default codecs, add just H264 (for example) without adding RTX, then see if the issue still occurs. You may need to run the experiment 10 times or so to rule out coincidence.
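For reference, a minimal sketch of registering just H264 without RTX using Pion's MediaEngine (the payload type, fmtp line, feedback list, and helper name are illustrative, not taken from this thread):

```go
package example

import "github.com/pion/webrtc/v3"

// newH264OnlyAPI builds a Pion API whose MediaEngine registers only H264,
// with no RTX codec, for A/B testing against RegisterDefaultCodecs.
func newH264OnlyAPI() (*webrtc.API, error) {
	m := &webrtc.MediaEngine{}
	if err := m.RegisterCodec(webrtc.RTPCodecParameters{
		RTPCodecCapability: webrtc.RTPCodecCapability{
			MimeType:     webrtc.MimeTypeH264,
			ClockRate:    90000,
			SDPFmtpLine:  "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f",
			RTCPFeedback: []webrtc.RTCPFeedback{{Type: "nack"}, {Type: "nack", Parameter: "pli"}, {Type: "transport-cc"}},
		},
		PayloadType: 102, // illustrative payload type
	}, webrtc.RTPCodecTypeVideo); err != nil {
		return nil, err
	}
	return webrtc.NewAPI(webrtc.WithMediaEngine(m)), nil
}
```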
One other cause I've seen (which you may have run into, @adriancable) is that if you are using simulcast, the first packet is dropped by Pion (see #2777). If libwebrtc sees that the first packet it sends is lost, it enters a mode where the bandwidth estimate tends to not grow fast (for reasons I'm not 100% sure about). In our own internal testing, we saw that much of the time libwebrtc happened to send an audio packet first (meaning the issue was not hit), but if it happened to send a simulcast video packet first, then the problem would occur. So to further rule out possible causes, you can try making sure simulcast is not enabled (if it was on the client).
@kcaffrey - also to add, it isn't really possible 'not to consume' RTX packets. In the implementation that landed, track.Read() returns both RTX and non-RTX packets (since most consumers don't need to know the difference), with the attributes being set to indicate whether a packet is RTX (if the consumer does need to know).
@aankur - are you able to get a WebRTC packet trace and post it here for us to look at? If you go to chrome://webrtc-internals there's now an option to 'create diagnostic packet recordings'. After turning this on you'll need to restart the stream (so we can see everything from the start) and then locate the right log file and attach it here.
@kcaffrey / @aankur - there's one other thing to bear in mind if we are looking at differences with and without RTX. A recent change in libwebrtc (which may or may not have landed in Chrome M126) starts sending BWE probes before the tracks start, on SSRC 0, if RTX is enabled. Previously Pion didn't handle these, so it looked to the sender like those packets were lost, which would make the initial BWE hit the floor until it caught up via subsequent probes. This would have a similar effect to what @aankur reported, but it was fixed in #2816. (I did test this change, and it seems to do the right thing, but it's possible there's some edge case it's not handling right.) I'm pointing this out because even if @aankur finds that the issue happens only with RTX enabled, that does not necessarily mean the issue is with Pion's RTX handling, even if the issue is on the Pion side. I'll be able to see whether this is what's happening from the Chrome WebRTC packet trace.
Hi,
Without RTX: packet dump
With RTX and TWCC: packet dump
@aankur - in your original post, for Pion v3, you mentioned v3.2.49, then said it happens in v3.2.25 but not v3.2.24, then above you mention v3.2.22. That's a lot of versions! Can you just tell us two: the last version that works and the first version that doesn't? It does look like the 'with RTX' case is the one that's problematic, in which case the difference between v3.2.24 and v3.2.25 is explained by the fact that v3.2.24 does not support RTX. It isn't clear to me whether this is a Pion issue vs. a difference in how Chrome does BWE in the RTX vs. non-RTX cases (they are quite different: in the non-RTX case you get padding probes on the main track, while in the RTX case you don't get padding probes, you instead get periodic retransmissions on the RTX track which also serve as BWE probes). It is quite possible that the different number (and nature) of the probes you get from Chrome in the two cases makes Chrome's BWE take longer or shorter. Can you clarify the topology here? You are sending video from the Chrome side and receiving on the Pion side, correct? In that case, I would look at what happens between Chrome and Chrome in the RTX and non-RTX cases, i.e. take Pion out of the equation and see whether it's a 'natural' RTX vs. non-RTX effect.
Hi, I am sending data from Chrome to Pion (video only, not simulcast).
@aankur - please try sending video from Chrome to Chrome in the RTX vs. non-RTX case, and then look at the targetBitrate graph for the sending Chrome in both cases. My guess is that what you are seeing is the difference in how Chrome (actually libwebrtc) does bandwidth estimation in the RTX vs. non-RTX case, and that it isn't a Pion issue. v3.2.24 and v3.2.25 differ only in that the former doesn't support RTX. There are no other changes (as I understand it), so this is consistent with this being a Chrome-side difference in how BWE is done in the RTX vs. non-RTX case.
OK, so it looks like: when RTX is enabled, you get a slow ramp on the sender side if the receiver is Pion, but not if the receiver is Chrome. When RTX is disabled, everything works similarly whether the receiver is Pion or Chrome. (The Pion version is something of a red herring: 3.2.24 does not support RTX, so whether the sender offers RTX or not does not matter, and you are always operating with RTX disabled, which is why you always get the fast ramp with 3.2.24.) Sorry if this duplicates a little what you sent previously, but can you send 4 Chrome WebRTC packet logs (all from the sender side, all capturing from the very beginning of the stream):
... along with the targetBitrate graph screenshots for each of the 4 cases. I'll then see if I can spot any difference between #1 vs. #2, and #3 vs. #4. Thank you so much!
@aankur - when you save the Chrome WebRTC logs you will need to save them in V1 format so my tools can digest them properly. To do that you will need to run Chrome with this command line parameter:
Nice sleuthing! I bet @cnderrauber and @boks1971 will know the answer quickly. Tell me if there is anything I can do to help. I hope this isn't because we are handling NACKs across different routines (timing issues). If it is, this is still all fixable. We should update
I can reproduce this with v3 master: targetBitrate/availableOutgoingBitrate takes about 10 seconds to reach 1.5 Mbps at 480p with the default codecs (which have RTX enabled). If I comment out the RTX codec in the codec registration, it reaches the target almost immediately.
There is no difference in the LiveKit SFU (RTX vs. no RTX), so I'm wondering whether it relates to how TWCC processes the packets; the LiveKit SFU doesn't use the interceptor (it uses Pion's TWCC recorder directly).
@cnderrauber - it must be due to some difference between Pion and Chrome in how TWCC packets for the RTX case are generated. I had a quick look previously at the TWCC packets generated for RTX, and eyeballing them they looked OK, but I haven't compared Chrome with Pion. Are you able to get two Chrome WebRTC logs from the video sender side (with RTX enabled), one where the receiver is Pion and one where the receiver is Chrome, running Chrome with the command-line parameter mentioned above? Also, if you can, please post the targetBitrate graph screenshot from Chrome in both cases. Thanks!
Hi, Single Video, No Simulcast -> Chrome to Chrome/Chrome to Pion
@cnderrauber - I have noticed something odd. Here is a log of some TWCC packets being sent from a Pion receiver. SSRC 2859612492 is the main video track, SSRC 631513261 is the RTX track. So in this log we are getting 2 main track TWCCs, followed by 3 RTX TWCCs, followed by 2 more main track TWCCs.
I thought the sanity check for TWCC (assuming the packets are received in order, which they are) is that the base sequence number for each TWCC must equal the base sequence number of the previous TWCC plus the status count of the previous TWCC. This fails between adjacent RTX TWCC packets. I am not surprised this messes up BWE. I have no idea why it's happening, though.
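As a rough illustration of that sanity check, assuming the feedbacks have been parsed with pion/rtcp (the helper itself is hypothetical):

```go
package twccdebug

import (
	"fmt"

	"github.com/pion/rtcp"
)

// checkContinuity reports feedbacks whose base sequence number does not equal
// the previous feedback's base sequence number plus its packet status count.
func checkContinuity(feedbacks []*rtcp.TransportLayerCC) {
	for i := 1; i < len(feedbacks); i++ {
		prev, cur := feedbacks[i-1], feedbacks[i]
		expected := prev.BaseSequenceNumber + prev.PacketStatusCount // uint16 wrap-around is fine
		if cur.BaseSequenceNumber != expected {
			fmt.Printf("feedback %d: base %d, expected %d\n", i, cur.BaseSequenceNumber, expected)
		}
	}
}
```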
@cnderrauber Would you be OK putting RTX behind a flag until this is fixed? If you have time to fix it, that would be even better, but I understand if you don't. The default experience of Pion is broken for users currently. I should have time soon to address this. Releasing a new version of DTLS today.
@adriancable I looked at the TWCC logs you posted, and it all looks rather normal to me. What it indicates is that the TWCC interceptor is "losing" some packets (which I suspect are the RTX packets not making their way through the TWCC interceptor, somehow).
One note is that TWCC is transport-wide, so there is one contiguous sequence of TWCCs across all SSRCs. With that in mind, the sequence you pasted has the following base sequence numbers and status counts:
With the exception of the two entries that I added an asterisk to, each BSN equals the prior BSN plus the prior status count (4750 = 4713 + 37, 4751 = 4750 + 1, etc.). The explanation for the two noted entries is that packets denoted as lost in the prior feedback arrived late. For example, in the third TWCC feedback (BSN 4751, count 29), there is a run of 14 "receive small deltas", followed by 2 missing packets, then a received large delta and 4 small deltas. The first missing packet would have been SN=4765. In the following feedback, both 4765 and 4766 are denoted as "receive small delta" (and are therefore late arrivals). This behavior of having the base sequence number be smaller than the prior BSN plus count is expected and desirable, as seen in libwebrtc's unit tests (also, there is no other way to report that a previously missing packet was received). Note, however, that not all the missing packets were marked as late arrivals. In the first feedback, for example, 4715, 4716, and 4717 were all marked as missing and never marked as arrived in a later feedback. There are many other such permanently missing packets, which I highly suspect were RTX. Since Chrome sees so many missing packets in TWCC feedback, the bandwidth estimate grows slowly due to the loss-based estimator being used.
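For anyone reproducing this analysis, a sketch of expanding a parsed feedback into per-sequence-number status symbols, so that packets reported missing and never later reported as received can be found; the helper is illustrative and assumes pion/rtcp's chunk types:

```go
package twccdebug

import "github.com/pion/rtcp"

// packetStatuses maps each transport-wide sequence number covered by a TWCC
// feedback to its status symbol (e.g. rtcp.TypeTCCPacketNotReceived).
// Comparing maps from consecutive feedbacks shows which "missing" packets
// were later reported as received and which stayed missing permanently.
func packetStatuses(fb *rtcp.TransportLayerCC) map[uint16]uint16 {
	statuses := make(map[uint16]uint16, fb.PacketStatusCount)
	sn, count := fb.BaseSequenceNumber, uint16(0)
	for _, chunk := range fb.PacketChunks {
		switch c := chunk.(type) {
		case *rtcp.RunLengthChunk:
			for i := uint16(0); i < c.RunLength && count < fb.PacketStatusCount; i++ {
				statuses[sn] = c.PacketStatusSymbol
				sn++
				count++
			}
		case *rtcp.StatusVectorChunk:
			for _, symbol := range c.SymbolList {
				if count >= fb.PacketStatusCount {
					break
				}
				statuses[sn] = symbol
				sn++
				count++
			}
		}
	}
	return statuses
}
```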
Pulled on some threads this afternoon but was occupied by other things. It should relate to how RTX is read in the RTPReceiver; I will have a PR tomorrow.
@adriancable The RTX read loop can be blocked by RemoteTrack.Read, since the repair stream uses an unbuffered channel, and the interceptor uses the timestamp at which the packet was read from the buffer. That explains the out-of-order feedback @kcaffrey has seen.
@kcaffrey - the log would be consistent if there were lots of out of order packets. But that isn't the case. For example:
This looks like 4767 was received (a long time) before 4766. But looking at the tcpdump on the receiver side, this isn't the case: 4767 was actually received a few ms after 4766. @cnderrauber - I think you are on to something. The current logic in trackRemote.Read() returns an RTX packet if one is waiting in the repairStreamChannel, otherwise it blocks on reading the non-repair track. So the failure mode in this logic is that if there's no RTX packet waiting and we block on the non-repair track, and then some RTX packets come in, we won't process them until the next non-repair packet comes in (which unblocks the loop). This could definitely mess up the timestamps and hence the BWE. I don't think it's an issue that the repairStreamChannel is unbuffered, but in trackRemote.Read() we probably want instead to block the loop until either an RTX or a non-repair packet comes in. If you have a strategy in mind for a PR, that would be amazing.
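To illustrate the "block until either" idea in isolation (this is only a sketch with made-up channel names, not Pion's actual readRTP and not the modification proposed in the next comment):

```go
package rtxsketch

// readEither blocks until a packet is available on either the media or the
// repair (RTX) channel, so RTX packets are consumed as soon as they arrive
// instead of queuing behind the next media packet.
func readEither(media, repair <-chan []byte) (pkt []byte, isRTX bool) {
	select {
	case pkt = <-media:
		return pkt, false
	case pkt = <-repair:
		return pkt, true
	}
}
```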
@aankur - are you able to test this modification? In rtpreceiver.go:
In track_remote.go:
I'm currently OOO so this is tested by my eyeballs only.
Hi,
@aankur - that's disappointing. Maybe there's something not right with my fix, or maybe the cause of your issue is elsewhere. @cnderrauber - I'll wait for your tests/PR, since I may not have understood your concern correctly and fixed the wrong issue (although I think what I fixed in my code above should still be fixed, probably in a less haphazard way).
@aankur - one other thing you could try is flipping round the order of the case statements in readRTP, to give priority to the RTX track, i.e.
I'd be interested to see if this makes a difference.
Fixes #2830. TrackRemote.Read could block in readRTP when the buffer is empty; RTX packets that arrive before the next media RTP packet are then read after it, which causes out-of-order feedback and messes up the remote peer's bandwidth estimation.
@adriancable The problem is that the read blocks on an empty buffer waiting for the next video RTP packet, so RTX packets that arrive before it are only read after it, which causes out-of-order feedback. #2833 works in my test; @aankur, can you confirm it works for you too?
Hi,
@cnderrauber - this is great, thank you so much! However, I still think the RTX code needs modifying along the lines of #2830 (comment). As it currently stands, if track.Read() is called when there is no RTX packet waiting, it will block on reading from the non-repair track. At that point, if RTX packets come in, they will just queue up until the next non-repair RTP packet comes in, which unblocks the read. This may not be terrible in practice, because there shouldn't be long gaps between RTP packets on the non-repair track, so the RTX packets would not be blocked for long. But I think we should fix it, because the point of RTX is to get missing packets to the receiver as quickly as possible. I think my modification achieves that, but it would be great to have some extra eyeballs before I make the PR.
@adriancable wait for
@cnderrauber - I agree. I think we are good to close this!
Your environment.
What did you do?
Used the Broadcast Example, created a new RegisterDefaultInterceptors without webrtc.ConfigureTWCCSender
replaced
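The replaced code isn't included here; as a rough sketch (assuming Pion v3's interceptor helpers), a registration that skips the TWCC sender could look something like this, with the resulting registry passed via webrtc.WithInterceptorRegistry:

```go
package example

import (
	"github.com/pion/interceptor"
	"github.com/pion/webrtc/v3"
)

// registerInterceptorsWithoutTWCC mirrors the spirit of
// webrtc.RegisterDefaultInterceptors but deliberately omits
// webrtc.ConfigureTWCCSender; the exact set of helpers may differ by version.
func registerInterceptorsWithoutTWCC(m *webrtc.MediaEngine, i *interceptor.Registry) error {
	if err := webrtc.ConfigureNack(m, i); err != nil {
		return err
	}
	if err := webrtc.ConfigureRTCPReports(i); err != nil {
		return err
	}
	// webrtc.ConfigureTWCCSender(m, i) intentionally omitted for this test.
	return nil
}
```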
What did you expect?
Bandwidth Ramp-up should be fast with webrtc.ConfigureTWCCSender
What happened?
Looks like this was introduced in v3.2.25.
When using webrtc.ConfigureTWCCSender, the ramp-up was slow, followed a see-saw pattern, and got stuck at 1 Mbps.
When not using it, the ramp-up was still slow but went up to 2.5 Mbps; please see the target bitrate as reported on the graph.
With ConfigureTWCCSender: slow, see-saw, and stuck at 1 Mbps
Without ConfigureTWCCSender: slow but goes up to 2.5 Mbps
With ConfigureTWCCSender in v3.2.24