Adding new candidates and invalidating all previous ones flaps instantly to disconnected #486
Comments
Notice that Disconnected doesn't mean it's over. The ICE spec has a Failed state. We can always come back from Disconnected, which means the app shouldn't do anything drastic in that state. The question is whether, in the scenario you're testing, an unreasonable amount of time passes before coming back from Disconnected? |
I see, we currently treat it as drastic so that is probably the bug here. Let me test with delaying that for a bit. |
Some preliminary testing suggests that this works quite well! Invalidating the old candidate and delaying acting on |
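A minimal application-side sketch of "delaying acting on Disconnected", assuming the app observes ICE state changes and polls a timer. The names `IceState`, `Action` and `DisconnectGrace` are hypothetical and not part of str0m's API; the grace period is whatever the application chooses.

```rust
use std::time::{Duration, Instant};

/// Hypothetical application-side mirror of the ICE states discussed here;
/// this is not str0m's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IceState {
    Checking,
    Connected,
    Disconnected,
}

/// What the application should do when it polls the grace timer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Nothing drastic; keep the session alive.
    Keep,
    /// The grace period elapsed without recovery; tear the session down.
    TearDown,
}

/// Delays acting on `Disconnected` until a grace period has elapsed.
pub struct DisconnectGrace {
    grace: Duration,
    disconnected_since: Option<Instant>,
}

impl DisconnectGrace {
    pub fn new(grace: Duration) -> Self {
        Self { grace, disconnected_since: None }
    }

    /// Record a state change reported by the ICE agent.
    pub fn on_state(&mut self, state: IceState, now: Instant) {
        match state {
            // Keep the earliest timestamp if `Disconnected` is reported repeatedly.
            IceState::Disconnected => {
                self.disconnected_since.get_or_insert(now);
            }
            // Recovery cancels the pending teardown.
            IceState::Connected => self.disconnected_since = None,
            // Checking alone neither starts nor cancels the timer.
            IceState::Checking => {}
        }
    }

    /// Decide whether the grace period has run out at time `now`.
    pub fn poll(&self, now: Instant) -> Action {
        match self.disconnected_since {
            Some(since) if now.saturating_duration_since(since) >= self.grace => Action::TearDown,
            _ => Action::Keep,
        }
    }
}
```

The app would call `on_state` for every state change and `poll` from its own timer loop; only `Action::TearDown` triggers the drastic path.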
Wihoo! |
One question though, why doesn't it immediately switch to I've given it new candidates, shouldn't it switch to |
I think it's because we nominated, and that nomination goes away. If I recall correctly Checking is maybe only seen before anything is connected. |
Looking closer at the state machine, there seems to never be a transition to Once we are in |
I'm wondering if this is a spec thing. I don't think I ended up here by accident. |
I've grepped the spec for But outside of that, this looks like a bug to me. From str0m's behaviour we are in |
I think I got it from here: https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/iceConnectionState#disconnected |
Not that we need to follow how browsers do it, but I figured that for someone using str0m, it's nice if the states correspond as much as possible to what they see in the browser. |
I see. This is somewhat of an edge case though :) If you invalidate all current candidates and add new ones, you are kind of going back to the start, as if the agent just got created. Technically none of the checks failed, as referred to by the linked text. I am currently testing with a fork. If this is successful, would you mind a patch for this? |
Now I can't reproduce the disconnect anymore. Not sure what I was testing but the state change is now always |
Would a patch help you guys? I'm sort of thinking I'd like to preserve the (close to browser) behavior if I can. But I'm not feeling very strongly about it. |
I am still evaluating! :)
I can understand the sentiment! If I had to find an argument for it, I'd probably put this under "we are a WebRTC implementation with an unusual API and this is part of it". Looking at the actual behaviour with regard to the protocol, we are in the checking state, so it doesn't affect compatibility with anything; it is more of an API decision towards the application. |
Related discussion #416 Notice Peter Thatcher agrees with not treating |
I am not disagreeing, just want to defer this complexity because it comes with other implications :) It lengthens the overall timeout before we can reset all the state because the connection is actually broken. The layer above ICE in our app doesn't know about "temporary connection failures" so it can't act on it. While that could be refactored, it doesn't seem to have a lot of value right now. It seems useful to entirely avoid going into any error state if we can, be it temporary or permanent. In this case, I would even argue that it is somewhat of a premature decision by the state machine. Yes, I've invalidated the last nominated candidate but I've also given you new ones and it is checking them. Why should that output Maybe we are just hairsplitting the terminology here? It seems impossible to capture this accurately in a single state. Connected / Disconnected is one. Checking / Completed is orthogonal to that, isn't it?
|
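To illustrate the point that Connected / Disconnected and Checking / Completed are orthogonal, here is a small sketch that models them as two separate axes. All type and function names (`Connectivity`, `Progress`, `IceStatus`, `status`) are hypothetical and only serve the illustration; str0m does not expose its state this way.

```rust
/// Whether a nominated pair is currently usable (hypothetical axis 1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Connectivity {
    Connected,
    Disconnected,
}

/// Whether candidate pairs are still being tested (hypothetical axis 2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Progress {
    Checking,
    Completed,
}

/// Both axes reported together instead of being folded into one state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IceStatus {
    pub connectivity: Connectivity,
    pub progress: Progress,
}

impl IceStatus {
    /// No usable pair right now, but checks are still running: the roaming case.
    pub fn is_recovering(&self) -> bool {
        self.connectivity == Connectivity::Disconnected && self.progress == Progress::Checking
    }

    /// No usable pair and nothing left to test.
    pub fn is_dead_end(&self) -> bool {
        self.connectivity == Connectivity::Disconnected && self.progress == Progress::Completed
    }
}

/// Derive the two-axis status from two simple facts about an agent.
pub fn status(has_nominated_pair: bool, pending_checks: usize) -> IceStatus {
    IceStatus {
        connectivity: if has_nominated_pair {
            Connectivity::Connected
        } else {
            Connectivity::Disconnected
        },
        progress: if pending_checks > 0 {
            Progress::Checking
        } else {
            Progress::Completed
        },
    }
}
```

With this split, invalidating all candidates and adding new ones maps to "disconnected, still checking", which an application can tell apart from "disconnected, nothing left to try".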
I've now implemented a timer and I am now more convinced that even though a timer might be correct in the general case, it is a workaround for this particular scenario. When switching networks (i.e. invalidating current candidates and adding new ones), connectivity is likely going to be restored. How long will that take? 5 seconds? 10? 15? It is a guessing game. Whilst it may be appealing to have a generic handling of " It seems to be a much more robust solution to have str0m test the new candidates and tell me when those fail. But for that, I need to first know that it is even testing new candidates (i.e. emit
|
When roaming networks, we learn about new interfaces and retire old ones. It appears that, if str0m is told about all of this information at once (i.e. `invalidate` all current candidates and add new ones), it instantly flaps to `Disconnected` instead of trying the new candidate pairs.

I think this is due to `evaluate_state` being called at the very top of `handle_timeout`:

str0m/src/ice/agent.rs
Line 851 in aeda389

Is this on purpose? I would have expected str0m to first start testing all newly added candidates before concluding that we are `Disconnected`.