Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent issue establishing DTLS connectivty #2113

Closed
bryphe-coder opened this issue Feb 3, 2022 · 7 comments · Fixed by #2114
Closed

Intermittent issue establishing DTLS connectivty #2113

bryphe-coder opened this issue Feb 3, 2022 · 7 comments · Fixed by #2114

Comments

@bryphe-coder
Copy link
Contributor

Hi, first off, thanks for all the work you do in maintaining this library!

Your environment.

  • Version: Release or SHA
  • Browser: No browser, just go - test case with both pion client <-> pion server
  • Other Information - see below (towards the end) for full failure log and success log

What did you do?

We've set up a test case that creates a client and server and connects them with pion. Most of the time, the test passes. However, we've found ~10% of the time, there will be a failure. (cc @kylecarbs)

When it fails, it seems there is an issue establishing a DTLS connection. Here's an excerpt of the log where a failure takes place:

    t.go:56: 2022-02-02 23:39:00.907 [DEBUG]	(server)	<conn.go:222>	dtls transport state updated	{"state": "connecting"}
dtls TRACE: 23:39:00.908347 handshaker.go:166: [handshake:client] Flight 1: Preparing
dtls TRACE: 23:39:00.908413 handshaker.go:166: [handshake:client] Flight 1: Sending
dtls TRACE: 23:39:00.908475 conn.go:382: [handshake:client] -> ClientHello (epoch: 0, seq: 0)
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
    t.go:56: 2022-02-02 23:39:00.908 [DEBUG]	(server)	<conn.go:148>	ice connection state updated	{"state": "connected"}
ice TRACE: 23:39:00.909089 selection.go:130: inbound STUN (SuccessResponse) from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.909168 selection.go:140: Found valid candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.4:5231 <-> udp4 host 1.2.3.5:5578 (remote, prio 2130706431)
ice TRACE: 23:39:00.909221 agent.go:568: Set selected candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.4:5231 <-> udp4 host 1.2.3.5:5578 (remote, prio 2130706431)
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Connected
    t.go:56: 2022-02-02 23:39:00.909 [DEBUG]	(client)	<conn.go:226>	selected candidate pair changed	{"local": {"foundation": "422338508", "priority": 2130706431, "address": "1.2.3.4", "protocol": 1, "port": 5231, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}, "remote": {"foundation": "608990844", "priority": 2130706431, "address": "1.2.3.5", "protocol": 1, "port": 5578, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}}
dtls TRACE: 23:39:00.909762 handshaker.go:166: [handshake:client] Flight 1: Waiting
pc WARNING: 2022/02/02 23:39:00 Failed to start manager: ICE connection not started
pc WARNING: 2022/02/02 23:39:00 Failed to start SCTP: DTLS not established
pc WARNING: 2022/02/02 23:39:00 undeclaredMediaProcessor failed to open SrtpSession: the DTLS transport has not started yet
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
    t.go:56: 2022-02-02 23:39:00.910 [DEBUG]	(client)	<conn.go:148>	ice connection state updated	{"state": "connected"}
pc WARNING: 2022/02/02 23:39:00 undeclaredMediaProcessor failed to open SrtcpSession: the DTLS transport has not started yet
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: checking
    t.go:56: 2022-02-02 23:39:00.910 [DEBUG]	(client)	<conn.go:148>	ice connection state updated	{"state": "checking"}
    t.go:56: 2022-02-02 23:39:30.908 [DEBUG]	(server)	<conn.go:222>	dtls transport state updated	{"state": "failed"}
pc INFO: 2022/02/02 23:39:30 peer connection state changed: failed
pc WARNING: 2022/02/02 23:39:30 Failed to start manager: handshake error: context deadline exceeded
    t.go:56: 2022-02-02 23:39:30.909 [DEBUG]	(server)	<conn.go:186>	rtc connection updated	{"state": "failed"}
pc WARNING: 2022/02/02 23:39:30 Failed to start SCTP: DTLS not established

Essentially it seems that we get to the point of Flight 1: Waiting on the client, and then just Failed to start manager: ICE connection not started and Failed to start SCTP: DTLS not established.

We also see handshake error: context deadline exceed - but it's not clear to me if this is due to the previous failures, or is actually the root cause.

Unfortunately, our code is in a private repro, so I can't provide a runnable repro at the moment - but I'll see what I can do to minimize it.

Just a few questions:

  1. Have you seen issues like this before?
  2. Are there ideas we can try to troubleshoot this further (additional logging, shim in an alternate DTLS implementation, etc).

It's not clear to me if this is a bug in pion or a bug in the way we're using it - so appreciate any tips 😄

What did you expect?

Excerpt from successful run
    t.go:56: 2022-02-03 01:49:32.651 [DEBUG]    (server)        <conn.go:222>   dtls transport state updated    {"state": "connecting"}
pc INFO: 2022/02/03 01:49:32 ICE connection state changed: connected
    t.go:56: 2022-02-03 01:49:32.651 [DEBUG]    (server)        <conn.go:148>   ice connection state updated    {"state": "connected"}
    t.go:56: 2022-02-03 01:49:32.651 [DEBUG]    (client)        <conn.go:148>   ice connection state updated    {"state": "connected"}
dtls TRACE: 01:49:32.652197 handshaker.go:166: [handshake:client] Flight 1: Preparing
    t.go:56: 2022-02-03 01:49:32.651 [DEBUG]    (client)        <conn.go:222>   dtls transport state updated    {"state": "connecting"}
    t.go:56: 2022-02-03 01:49:32.651 [DEBUG]    (server)        <conn.go:226>   selected candidate pair changed {"local": {"foundation": "608990844", "priority": 2130706431, "address": "1.2.3.5", "protocol": 1, "port": 5887, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}, "remote": {"foundation": "422338508", "priority": 2130706431, "address": "1.2.3.4", "protocol": 1, "port": 5081, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}}
dtls TRACE: 01:49:32.652387 handshaker.go:166: [handshake:client] Flight 1: Sending
    t.go:56: 2022-02-03 01:49:32.651 [DEBUG]    (client)        <conn.go:226>   selected candidate pair changed {"local": {"foundation": "422338508", "priority": 2130706431, "address": "1.2.3.4", "protocol": 1, "port": 5081, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}, "remote": {"foundation": "3674517152", "priority": 1862270975, "address": "1.2.3.5", "protocol": 1, "port": 5887, "type": 3, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}}
dtls TRACE: 01:49:32.652587 conn.go:382: [handshake:client] -> ClientHello (epoch: 0, seq: 0)
dtls TRACE: 01:49:32.652733 handshaker.go:166: [handshake:server] Flight 0: Preparing
dtls TRACE: 01:49:32.652819 handshaker.go:166: [handshake:server] Flight 0: Sending
dtls TRACE: 01:49:32.652870 handshaker.go:166: [handshake:server] Flight 0: Waiting
dtls TRACE: 01:49:32.653020 handshaker.go:166: [handshake:client] Flight 1: Waiting
dtls INFO: 2022/02/03 01:49:32 [vendor time]
dtls TRACE: 01:49:32.654543 handshaker.go:283: [handshake:server] Flight 0 -> Flight 2
dtls TRACE: 01:49:32.654572 handshaker.go:166: [handshake:server] Flight 2: Preparing
dtls TRACE: 01:49:32.654606 handshaker.go:166: [handshake:server] Flight 2: Sending
dtls TRACE: 01:49:32.654669 conn.go:382: [handshake:server] -> HelloVerifyRequest (epoch: 0, seq: 0)
dtls TRACE: 01:49:32.654882 handshaker.go:166: [handshake:server] Flight 2: Waiting
dtls INFO: 2022/02/03 01:49:32 [vendor time]
dtls TRACE: 01:49:32.655164 handshaker.go:283: [handshake:client] Flight 1 -> Flight 3
dtls TRACE: 01:49:32.655193 handshaker.go:166: [handshake:client] Flight 3: Preparing
dtls TRACE: 01:49:32.655272 handshaker.go:166: [handshake:client] Flight 3: Sending
dtls TRACE: 01:49:32.655346 conn.go:382: [handshake:client] -> ClientHello (epoch: 0, seq: 1)
dtls TRACE: 01:49:32.655644 handshaker.go:166: [handshake:client] Flight 3: Waiting
dtls INFO: 2022/02/03 01:49:32 [vendor time]
dtls TRACE: 01:49:32.655787 handshaker.go:283: [handshake:server] Flight 2 -> Flight 4
dtls TRACE: 01:49:32.655812 handshaker.go:166: [handshake:server] Flight 4: Preparing
dtls TRACE: 01:49:32.655993 handshaker.go:166: [handshake:server] Flight 4: Sending
dtls TRACE: 01:49:32.656043 conn.go:382: [handshake:server] -> ServerHello (epoch: 0, seq: 1)
dtls TRACE: 01:49:32.656123 conn.go:382: [handshake:server] -> TypeCertificate (epoch: 0, seq: 2)
dtls TRACE: 01:49:32.656206 conn.go:382: [handshake:server] -> ServerKeyExchange (epoch: 0, seq: 3)
dtls TRACE: 01:49:32.656271 conn.go:382: [handshake:server] -> CertificateRequest (epoch: 0, seq: 4)
dtls TRACE: 01:49:32.656319 conn.go:382: [handshake:server] -> ServerHelloDone (epoch: 0, seq: 5)
dtls TRACE: 01:49:32.656559 handshaker.go:166: [handshake:server] Flight 4: Waiting
dtls TRACE: 01:49:32.656908 flight3handler.go:86: [handshake] use cipher suite: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
dtls INFO: 2022/02/03 01:49:32 [vendor time]
dtls TRACE: 01:49:32.659273 handshaker.go:283: [handshake:client] Flight 3 -> Flight 5
dtls TRACE: 01:49:32.659311 handshaker.go:166: [handshake:client] Flight 5: Preparing
dtls TRACE: 01:49:32.660117 handshaker.go:234: [handshake:client] -> changeCipherSpec (epoch: 1)
dtls TRACE: 01:49:32.660159 handshaker.go:166: [handshake:client] Flight 5: Sending
dtls TRACE: 01:49:32.660197 conn.go:382: [handshake:client] -> TypeCertificate (epoch: 0, seq: 2)
dtls TRACE: 01:49:32.660265 conn.go:382: [handshake:client] -> ClientKeyExchange (epoch: 0, seq: 3)
dtls TRACE: 01:49:32.660308 conn.go:382: [handshake:client] -> CertificateVerify (epoch: 0, seq: 4)
dtls TRACE: 01:49:32.660351 conn.go:382: [handshake:client] -> Finished (epoch: 1, seq: 5)
dtls TRACE: 01:49:32.660671 handshaker.go:166: [handshake:client] Flight 5: Waiting
dtls DEBUG: 01:49:32.660814 conn.go:731: CipherSuite not initialized, queuing packet
dtls DEBUG: 01:49:32.660899 conn.go:653: received packet of next epoch, queuing packet
dtls TRACE: 01:49:32.662465 conn.go:737: server: <- ChangeCipherSpec (epoch: 1)
dtls INFO: 2022/02/03 01:49:32 [vendor time]
dtls TRACE: 01:49:32.662591 handshaker.go:283: [handshake:server] Flight 4 -> Flight 6
dtls TRACE: 01:49:32.662625 handshaker.go:166: [handshake:server] Flight 6: Preparing
dtls TRACE: 01:49:32.662743 handshaker.go:234: [handshake:server] -> changeCipherSpec (epoch: 1)
dtls TRACE: 01:49:32.662784 handshaker.go:166: [handshake:server] Flight 6: Sending
dtls TRACE: 01:49:32.662818 conn.go:382: [handshake:server] -> Finished (epoch: 1, seq: 6)
dtls TRACE: 01:49:32.663141 handshaker.go:166: [handshake:server] Flight 6: Finished
dtls TRACE: 01:49:32.663197 conn.go:211: Handshake Completed
dtls TRACE: 01:49:32.663203 conn.go:737: client: <- ChangeCipherSpec (epoch: 1)
dtls INFO: 2022/02/03 01:49:32 [vendor time]
dtls TRACE: 01:49:32.663517 handshaker.go:283: [handshake:client] Flight 5 -> Flight 5
dtls TRACE: 01:49:32.663561 handshaker.go:166: [handshake:client] Flight 5: Finished
dtls TRACE: 01:49:32.663615 conn.go:211: Handshake Completed
    t.go:56: 2022-02-03 01:49:32.663 [DEBUG]    (client)        <conn.go:222>   dtls transport state updated 

What happened?

Full Failure Log (Verbose!)
ice TRACE: 23:39:00.877572 agent.go:596: pinging all candidates
    t.go:56: 2022-02-02 23:39:00.877 [DEBUG]	(client)	<conn.go:459>	creating data channel	{"label": "ping", "opts": "\u0026{ID:2 Negotiated:true Protocol: Unordered:false OpenOnDisconnect:true}"}
    t.go:56: 2022-02-02 23:39:00.878 [DEBUG]	(client)	<conn.go:267>	negotiating
ice WARNING: 2022/02/02 23:39:00 vnet is enabled
ice WARNING: 2022/02/02 23:39:00 vnet does not support mDNS yet
ice TRACE: 23:39:00.879754 agent.go:568: Set selected candidate pair: 
pc INFO: 2022/02/02 23:39:00 signaling state changed to have-local-offer
    t.go:56: 2022-02-02 23:39:00.880 [DEBUG]	(client)	<conn.go:165>	ice gathering state updated	{"state": "gathering"}
    t.go:56: 2022-02-02 23:39:00.880 [DEBUG]	(client)	<conn.go:459>	creating data channel	{"label": "echo", "opts": "\u0026{ID:1 Negotiated:true Protocol: Unordered:false OpenOnDisconnect:true}"}
    t.go:56: 2022-02-02 23:39:00.880 [DEBUG]	(client)	<conn.go:291>	sending offer
    t.go:56: 2022-02-02 23:39:00.881 [DEBUG]	(client)	<conn.go:297>	sent offer
    t.go:56: 2022-02-02 23:39:00.881 [DEBUG]	(client)	<conn.go:301>	awaiting remote description...
ice TRACE: 23:39:00.882255 selection.go:59: Nominatable pair found, nominating (udp4 host 1.2.3.4:5258, udp4 prflx 1.2.3.5:5767 related :0)
    t.go:56: 2022-02-02 23:39:00.881 [DEBUG]	(client)	<conn.go:218>	signaling state updated	{"state": "have-local-offer"}
ice TRACE: 23:39:00.882819 selection.go:87: ping STUN (nominate candidate pair) from udp4 host 1.2.3.4:5258 to udp4 prflx 1.2.3.5:5767 related :0
ice TRACE: 23:39:00.883360 agent.go:955: ping STUN from udp4 host 1.2.3.4:5258 to udp4 prflx 1.2.3.5:5767 related :0
    t.go:56: 2022-02-02 23:39:00.884 [DEBUG]	(server)	<conn.go:459>	creating data channel	{"label": "ping", "opts": "\u0026{ID:1 Negotiated:true Protocol: Unordered:false OpenOnDisconnect:true}"}
ice TRACE: 23:39:00.886107 agent.go:1106: inbound STUN (Request) from 1.2.3.4:5258 to udp4 host 1.2.3.5:5767
ice TRACE: 23:39:00.886293 agent.go:568: Set selected candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.5:5767 <-> udp4 host 1.2.3.4:5258 (remote, prio 2130706431)
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Connected
    t.go:56: 2022-02-02 23:39:00.886 [DEBUG]	(server)	<conn.go:459>	creating data channel	{"label": "echo", "opts": "\u0026{ID:2 Negotiated:true Protocol: Unordered:false OpenOnDisconnect:true}"}
    t.go:56: 2022-02-02 23:39:00.887 [DEBUG]	(client)	<conn.go:459>	creating data channel	{"label": "hello", "opts": "\u0026{ID:0 Negotiated:false Protocol: Unordered:false OpenOnDisconnect:false}"}
    t.go:56: 2022-02-02 23:39:00.889 [DEBUG]	(server)	<conn.go:267>	negotiating
    t.go:56: 2022-02-02 23:39:00.890 [DEBUG]	(server)	<conn.go:301>	awaiting remote description...
    t.go:56: 2022-02-02 23:39:00.890 [DEBUG]	(server)	<conn.go:307>	setting remote description
pc INFO: 2022/02/02 23:39:00 signaling state changed to have-remote-offer
ice WARNING: 2022/02/02 23:39:00 vnet is enabled
ice WARNING: 2022/02/02 23:39:00 vnet does not support mDNS yet
    t.go:56: 2022-02-02 23:39:00.891 [DEBUG]	(server)	<conn.go:218>	signaling state updated	{"state": "have-remote-offer"}
ice TRACE: 23:39:00.891860 agent.go:568: Set selected candidate pair: 
pc INFO: 2022/02/02 23:39:00 signaling state changed to stable
    t.go:56: 2022-02-02 23:39:00.892 [DEBUG]	(server)	<conn.go:165>	ice gathering state updated	{"state": "gathering"}
    t.go:56: 2022-02-02 23:39:00.892 [DEBUG]	(server)	<conn.go:218>	signaling state updated	{"state": "stable"}
    t.go:56: 2022-02-02 23:39:00.892 [DEBUG]	(server)	<conn.go:330>	sending answer
    t.go:56: 2022-02-02 23:39:00.892 [DEBUG]	(server)	<conn.go:336>	sent answer
ice TRACE: 23:39:00.894165 selection.go:130: inbound STUN (SuccessResponse) from udp4 prflx 1.2.3.5:5767 related :0 to udp4 host 1.2.3.4:5258
ice TRACE: 23:39:00.894252 selection.go:140: Found valid candidate pair: prio 7998392936314175488 (local, prio 2130706431) udp4 host 1.2.3.4:5258 <-> udp4 prflx 1.2.3.5:5767 related :0 (remote, prio 1862270975)
ice TRACE: 23:39:00.894493 agent.go:568: Set selected candidate pair: prio 7998392936314175488 (local, prio 2130706431) udp4 host 1.2.3.4:5258 <-> udp4 prflx 1.2.3.5:5767 related :0 (remote, prio 1862270975)
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Connected
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
ice DEBUG: 23:39:00.896017 agent.go:454: Started agent: isControlling? false, remoteUfrag: "OWvtSjrSpVqjdngV", remotePwd: "dfnYYGnqyTxObdtrKOIDJpXGFVlTtDlE"
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Checking
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: checking
ice TRACE: 23:39:00.896566 agent.go:596: pinging all candidates
ice WARNING: 2022/02/02 23:39:00 pingAllCandidates called with no candidate pairs. Connection is not possible yet.
    t.go:56: 2022-02-02 23:39:00.893 [DEBUG]	(client)	<conn.go:307>	setting remote description
    t.go:56: 2022-02-02 23:39:00.893 [DEBUG]	(client)	<conn.go:165>	ice gathering state updated	{"state": "complete"}
    t.go:56: 2022-02-02 23:39:00.902 [DEBUG]	(client)	<conn.go:218>	signaling state updated	{"state": "stable"}
    t.go:56: 2022-02-02 23:39:00.893 [DEBUG]	(client)	<conn.go:236>	sending local candidate	{"candidate": "candidate:422338508 1 udp 2130706431 1.2.3.4 5231 typ host"}
dtls TRACE: 23:39:00.903170 handshaker.go:166: [handshake:client] Flight 1: Preparing
ice DEBUG: 23:39:00.903217 agent.go:454: Started agent: isControlling? true, remoteUfrag: "ENLEMllmrNzTZFti", remotePwd: "OZifIohfKWDUXRaXvybJcgGSgkLoRpFa"
dtls TRACE: 23:39:00.903240 handshaker.go:166: [handshake:client] Flight 1: Sending
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Checking
dtls TRACE: 23:39:00.903301 conn.go:382: [handshake:client] -> ClientHello (epoch: 0, seq: 0)
    t.go:56: 2022-02-02 23:39:00.896 [DEBUG]	(server)	<conn.go:148>	ice connection state updated	{"state": "checking"}
    t.go:56: 2022-02-02 23:39:00.896 [DEBUG]	(server)	<conn.go:236>	sending local candidate	{"candidate": "candidate:608990844 1 udp 2130706431 1.2.3.5 5578 typ host"}
    t.go:56: 2022-02-02 23:39:00.896 [DEBUG]	(server)	<conn.go:165>	ice gathering state updated	{"state": "complete"}
    t.go:56: 2022-02-02 23:39:00.904 [DEBUG]	(server)	<conn.go:350>	accepting candidate	{"candidate": "candidate:422338508 1 udp 2130706431 1.2.3.4 5231 typ host"}
    t.go:56: 2022-02-02 23:39:00.904 [DEBUG]	(client)	<conn.go:350>	accepting candidate	{"candidate": "candidate:608990844 1 udp 2130706431 1.2.3.5 5578 typ host"}
ice TRACE: 23:39:00.904846 agent.go:596: pinging all candidates
ice TRACE: 23:39:00.904977 agent.go:955: ping STUN from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.905177 agent.go:1106: inbound STUN (Request) from 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.905449 selection.go:221: inbound STUN (SuccessResponse) from udp4 host 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.905531 selection.go:231: Found valid candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.5:5578 <-> udp4 host 1.2.3.4:5231 (remote, prio 2130706431)
ice TRACE: 23:39:00.905588 agent.go:596: pinging all candidates
ice TRACE: 23:39:00.905697 agent.go:955: ping STUN from udp4 host 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.905838 agent.go:1106: inbound STUN (Request) from 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.906019 agent.go:955: ping STUN from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.906209 selection.go:130: inbound STUN (SuccessResponse) from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.906279 selection.go:140: Found valid candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.4:5231 <-> udp4 host 1.2.3.5:5578 (remote, prio 2130706431)
ice TRACE: 23:39:00.906366 agent.go:1106: inbound STUN (Request) from 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.906503 selection.go:106: The candidate (udp4 host 1.2.3.4:5231, udp4 host 1.2.3.5:5578) is the best candidate available, marking it as nominated
ice TRACE: 23:39:00.906598 selection.go:87: ping STUN (nominate candidate pair) from udp4 host 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.906637 agent.go:955: ping STUN from udp4 host 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.906832 selection.go:221: inbound STUN (SuccessResponse) from udp4 host 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.906899 selection.go:231: Found valid candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.5:5578 <-> udp4 host 1.2.3.4:5231 (remote, prio 2130706431)
ice TRACE: 23:39:00.906999 agent.go:1106: inbound STUN (Request) from 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
ice TRACE: 23:39:00.907067 agent.go:568: Set selected candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.5:5578 <-> udp4 host 1.2.3.4:5231 (remote, prio 2130706431)
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Connected
    t.go:56: 2022-02-02 23:39:00.907 [DEBUG]	(server)	<conn.go:226>	selected candidate pair changed	{"local": {"foundation": "608990844", "priority": 2130706431, "address": "1.2.3.5", "protocol": 1, "port": 5578, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}, "remote": {"foundation": "422338508", "priority": 2130706431, "address": "1.2.3.4", "protocol": 1, "port": 5231, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}}
    t.go:56: 2022-02-02 23:39:00.907 [DEBUG]	(server)	<conn.go:222>	dtls transport state updated	{"state": "connecting"}
dtls TRACE: 23:39:00.908347 handshaker.go:166: [handshake:client] Flight 1: Preparing
dtls TRACE: 23:39:00.908413 handshaker.go:166: [handshake:client] Flight 1: Sending
dtls TRACE: 23:39:00.908475 conn.go:382: [handshake:client] -> ClientHello (epoch: 0, seq: 0)
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
    t.go:56: 2022-02-02 23:39:00.908 [DEBUG]	(server)	<conn.go:148>	ice connection state updated	{"state": "connected"}
ice TRACE: 23:39:00.909089 selection.go:130: inbound STUN (SuccessResponse) from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:00.909168 selection.go:140: Found valid candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.4:5231 <-> udp4 host 1.2.3.5:5578 (remote, prio 2130706431)
ice TRACE: 23:39:00.909221 agent.go:568: Set selected candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.4:5231 <-> udp4 host 1.2.3.5:5578 (remote, prio 2130706431)
ice INFO: 2022/02/02 23:39:00 Setting new connection state: Connected
    t.go:56: 2022-02-02 23:39:00.909 [DEBUG]	(client)	<conn.go:226>	selected candidate pair changed	{"local": {"foundation": "422338508", "priority": 2130706431, "address": "1.2.3.4", "protocol": 1, "port": 5231, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}, "remote": {"foundation": "608990844", "priority": 2130706431, "address": "1.2.3.5", "protocol": 1, "port": 5578, "type": 1, "component": 1, "relatedAddress": "", "relatedPort": 0, "tcpType": ""}}
dtls TRACE: 23:39:00.909762 handshaker.go:166: [handshake:client] Flight 1: Waiting
pc WARNING: 2022/02/02 23:39:00 Failed to start manager: ICE connection not started
pc WARNING: 2022/02/02 23:39:00 Failed to start SCTP: DTLS not established
pc WARNING: 2022/02/02 23:39:00 undeclaredMediaProcessor failed to open SrtpSession: the DTLS transport has not started yet
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
    t.go:56: 2022-02-02 23:39:00.910 [DEBUG]	(client)	<conn.go:148>	ice connection state updated	{"state": "connected"}
pc WARNING: 2022/02/02 23:39:00 undeclaredMediaProcessor failed to open SrtcpSession: the DTLS transport has not started yet
pc INFO: 2022/02/02 23:39:00 ICE connection state changed: checking
    t.go:56: 2022-02-02 23:39:00.910 [DEBUG]	(client)	<conn.go:148>	ice connection state updated	{"state": "checking"}
    t.go:56: 2022-02-02 23:39:30.908 [DEBUG]	(server)	<conn.go:222>	dtls transport state updated	{"state": "failed"}
pc INFO: 2022/02/02 23:39:30 peer connection state changed: failed
pc WARNING: 2022/02/02 23:39:30 Failed to start manager: handshake error: context deadline exceeded
    t.go:56: 2022-02-02 23:39:30.909 [DEBUG]	(server)	<conn.go:186>	rtc connection updated	{"state": "failed"}
pc WARNING: 2022/02/02 23:39:30 Failed to start SCTP: DTLS not established
pc WARNING: 2022/02/02 23:39:30 undeclaredMediaProcessor failed to open SrtcpSession: the DTLS transport has not started yet
pc WARNING: 2022/02/02 23:39:30 undeclaredMediaProcessor failed to open SrtpSession: the DTLS transport has not started yet
    t.go:56: 2022-02-02 23:39:30.909 [DEBUG]	(server)	<channel.go:268>	datachannel closing with error	{"id": 1, "label": "ping"} ...
        "error": connection has failed:
                     github.com/<org>/<repo>/g/peer.init
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:26
    t.go:56: 2022-02-02 23:39:30.909 [DEBUG]	(server)	<channel.go:268>	datachannel closing with error	{"id": 2, "label": "echo"} ...
        "error": connection has failed:
                     github.com/<org>/<repo>/g/peer.init
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:26
ice TRACE: 23:39:30.909781 selection.go:51: checking keepalive
ice TRACE: 23:39:30.910470 agent.go:955: ping STUN from udp4 host 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
    t.go:56: 2022-02-02 23:39:30.910 [DEBUG]	(server)	<conn.go:550>	closing conn with error ...
        "error": read ping echo channel:
                     github.com/<org>/<repo>/g/peer.(*Conn).pingEchoChannel.func1.1
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:413
                   - connection has failed:
                     github.com/<org>/<repo>/g/peer.init
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:26
ice TRACE: 23:39:30.910654 agent.go:1106: inbound STUN (Request) from 1.2.3.4:5231 to udp4 host 1.2.3.5:5578
    t.go:56: 2022-02-02 23:39:30.910 [DEBUG]	(server)	<conn.go:222>	dtls transport state updated	{"state": "closed"}
ice TRACE: 23:39:30.910857 agent.go:955: ping STUN from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice WARNING: 2022/02/02 23:39:30 taskLoop failed: the agent is closed
ice TRACE: 23:39:30.911108 selection.go:130: inbound STUN (SuccessResponse) from udp4 host 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
ice TRACE: 23:39:30.911180 selection.go:140: Found valid candidate pair: prio 9151314440652587007 (local, prio 2130706431) udp4 host 1.2.3.4:5231 <-> udp4 host 1.2.3.5:5578 (remote, prio 2130706431)
ice INFO: 2022/02/02 23:39:30 Setting new connection state: Closed
ice TRACE: 23:39:30.911269 agent.go:1106: inbound STUN (Request) from 1.2.3.5:5578 to udp4 host 1.2.3.4:5231
pc INFO: 2022/02/02 23:39:30 ICE connection state changed: closed
pc INFO: 2022/02/02 23:39:30 peer connection state changed: closed
    t.go:56: 2022-02-02 23:39:30.911 [DEBUG]	(server)	<conn.go:148>	ice connection state updated	{"state": "closed"}
    t.go:56: 2022-02-02 23:39:30.911 [DEBUG]	(server)	<conn.go:568>	waiting for rtc connection close...
    t.go:56: 2022-02-02 23:39:30.911 [DEBUG]	(server)	<conn.go:186>	rtc connection updated	{"state": "closed"}
    t.go:56: 2022-02-02 23:39:30.911 [DEBUG]	(server)	<conn.go:572>	waiting for ice connection close...
    t.go:56: 2022-02-02 23:39:30.911 [DEBUG]	(server)	<conn.go:581>	closed
    conn_test.go:115: 
        	Error Trace:	conn_test.go:115
        	Error:      	Received unexpected error:
        	            	read ping echo channel:
        	            	    github.com/<org>/<repo>/g/peer.(*Conn).pingEchoChannel.func1.1
        	            	        /home/runner/work/<org>/<repo>/g/peer/conn.go:413
        	            	  - connection has failed:
        	            	    github.com/<org>/<repo>/g/peer.init
        	            	        /home/runner/work/<org>/<repo>/g/peer/conn.go:26
        	Test:       	TestConn/Accept
    t.go:56: 2022-02-02 23:39:30.912 [DEBUG]	(client)	<conn.go:550>	closing conn with error	{"error": null}
    t.go:56: 2022-02-02 23:39:30.912 [DEBUG]	(client)	<channel.go:268>	datachannel closing with error	{"id": 2, "label": "ping"} ...
        "error": connection was closed:
                     github.com/<org>/<repo>/g/peer.init
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:29
                   - EOF
    t.go:56: 2022-02-02 23:39:30.912 [DEBUG]	(client)	<conn.go:222>	dtls transport state updated	{"state": "closed"}
ice INFO: 2022/02/02 23:39:30 Setting new connection state: Closed
pc INFO: 2022/02/02 23:39:30 peer connection state changed: closed
    t.go:56: 2022-02-02 23:39:30.913 [DEBUG]	(client)	<conn.go:568>	waiting for rtc connection close...
    t.go:56: 2022-02-02 23:39:30.913 [DEBUG]	(client)	<conn.go:186>	rtc connection updated	{"state": "closed"}
pc INFO: 2022/02/02 23:39:30 ICE connection state changed: closed
    t.go:56: 2022-02-02 23:39:30.913 [DEBUG]	(client)	<conn.go:572>	waiting for ice connection close...
    t.go:56: 2022-02-02 23:39:30.913 [DEBUG]	(client)	<channel.go:268>	datachannel closing with error	{"id": 1, "label": "echo"} ...
        "error": connection was closed:
                     github.com/<org>/<repo>/g/peer.init
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:29
                   - EOF
    t.go:56: 2022-02-02 23:39:30.913 [DEBUG]	(client)	<conn.go:148>	ice connection state updated	{"state": "closed"}
    t.go:56: 2022-02-02 23:39:30.914 [DEBUG]	(client)	<channel.go:268>	datachannel closing with error	{"id": null, "label": "hello"} ...
        "error": connection was closed:
                     github.com/<org>/<repo>/g/peer.init
                         /home/runner/work/<org>/<repo>/g/peer/conn.go:29
                   - EOF
    t.go:56: 2022-02-02 23:39:30.914 [DEBUG]	(client)	<conn.go:581>	closed

Thanks again for your help & work on this library!

@Sean-Der
Copy link
Member

Sean-Der commented Feb 3, 2022

Hey @bryphe-coder

10% is really high, that's no good! I have seen two DTLS issues come up lately. If you are running a ICE Lite Offerer this fixed things where DTLS would never connect. I would expect things to never work though.

this fixes connections with very high RTT, it doesn't sound like this either.

Would you be able to capture a pcap from both sides?

@boks1971
Copy link
Contributor

boks1971 commented Feb 3, 2022

@bryphe-coder If you can also grab the SDP from both sides in addition to pcap Sean is requesting, that would be great. Will be good to cross check that the SDP setup line is not causing a deadlock.

@kylecarbs
Copy link
Member

kylecarbs commented Feb 3, 2022

I added logging for the SDP (ctrl+f "offer" and "answer). It's difficult for us to grab a pcap, because we can only get this to occur in our CI where it's really slow to run compared to local.

I put the full log output in a gist.

@bryphe-coder and I will try to grab a pcap too!

@Sean-Der the second RTT issue sounds close, because this only occurs when connections are exchanged extremely slowly.

@kylecarbs
Copy link
Member

Based on consistent failed output from multiple runs:

2022-02-02T23:39:37.1020751Z dtls TRACE: 23:39:00.909762 handshaker.go:166: [handshake:client] Flight 1: Waiting
2022-02-02T23:39:37.1021128Z pc WARNING: 2022/02/02 23:39:00 Failed to start manager: ICE connection not started
2022-02-02T23:39:37.1021486Z pc WARNING: 2022/02/02 23:39:00 Failed to start SCTP: DTLS not established
2022-02-02T23:39:37.1021929Z pc WARNING: 2022/02/02 23:39:00 undeclaredMediaProcessor failed to open SrtpSession: the DTLS transport has not started yet
2022-02-02T23:39:37.1022426Z pc INFO: 2022/02/02 23:39:00 ICE connection state changed: connected
2022-02-02T23:39:37.1023152Z     t.go:56: 2022-02-02 23:39:00.910 [DEBUG]	(client)	<conn.go:148>	ice connection state updated	{"state": "connected"}

This occurs because the ICE connection is still in a new state:

pc.log.Warnf("Failed to start manager: %s", err)

Updating the ICE state occurs async:

webrtc/icetransport.go

Lines 101 to 106 in 8796a5c

if err := agent.OnConnectionStateChange(func(iceState ice.ConnectionState) {
state := newICETransportStateFromICE(iceState)
t.setState(state)
t.onConnectionStateChange(state)
}); err != nil {

I'm not sure how the timing works here, but could we remove the check for New ICE state?

webrtc/dtlstransport.go

Lines 284 to 286 in 8796a5c

if err := t.ensureICEConn(); err != nil {
return DTLSRole(0), nil, err
}

webrtc/dtlstransport.go

Lines 448 to 454 in 8796a5c

func (t *DTLSTransport) ensureICEConn() error {
if t.iceTransport == nil || t.iceTransport.State() == ICETransportStateNew {
return errICEConnectionNotStarted
}
return nil
}

@Sean-Der
Copy link
Member

Sean-Der commented Feb 3, 2022

I am in support of that change @kylecarbs! It looks like the check was added in 736e0bb and not for a specific reason.

@Sean-Der
Copy link
Member

Sean-Der commented Feb 3, 2022

I ran this locally and all the tests passed.

The change also makes sense to me from a ORTC stand point. I should be able to start a new DTLSTransport over an existing ICETransport.

Want to open a PR with this commit @bryphe-coder would love to have you in the AUTHORS.txt and get you involved with Pion :)

diff --git a/dtlstransport.go b/dtlstransport.go
index de078d6..109513b 100644
--- a/dtlstransport.go
+++ b/dtlstransport.go
@@ -446,7 +446,7 @@ func (t *DTLSTransport) validateFingerPrint(remoteCert *x509.Certificate) error
 }

 func (t *DTLSTransport) ensureICEConn() error {
-       if t.iceTransport == nil || t.iceTransport.State() == ICETransportStateNew {
+       if t.iceTransport == nil {
                return errICEConnectionNotStarted
        }

@bryphe-coder
Copy link
Contributor Author

bryphe-coder commented Feb 3, 2022

More than happy to push up the change! Although @Sean-Der and @kylecarbs , you're the ones that deserve the credit 😄

Really appreciate the help!

Sean-Der pushed a commit to bryphe-coder/webrtc that referenced this issue Feb 3, 2022
DTLS should be able to be negotiated over an already established
ICETransport

Resolves pion#2113
Sean-Der pushed a commit to bryphe-coder/webrtc that referenced this issue Feb 3, 2022
DTLS should be able to be negotiated over an already established
ICETransport

Resolves pion#2113
Sean-Der pushed a commit that referenced this issue Feb 3, 2022
DTLS should be able to be negotiated over an already established
ICETransport

Resolves #2113
boks1971 pushed a commit that referenced this issue Feb 9, 2022
DTLS should be able to be negotiated over an already established
ICETransport

Resolves #2113
daonb pushed a commit to daonb/webrtc that referenced this issue Nov 20, 2022
DTLS should be able to be negotiated over an already established
ICETransport

Resolves pion#2113
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants