Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DTLS client role in long delay connections #424

Merged
merged 1 commit into from
Feb 5, 2022

Conversation

boks1971
Copy link
Contributor

@boks1971 boks1971 commented Feb 2, 2022

Fixes pion/webrtc#2089

When a retranmission from the remote side arrives after
the handshake is complete, the finish routine
puts it back into retransmit loop.

With Chrome, this fails after 15 seconds.
Firefox does not error out though.

@codecov
Copy link

codecov bot commented Feb 2, 2022

Codecov Report

Merging #424 (ad3a0b2) into master (17f86a3) will decrease coverage by 0.09%.
The diff coverage is 100.00%.

❗ Current head ad3a0b2 differs from pull request most recent head b04ff13. Consider uploading reports for the commit b04ff13 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #424      +/-   ##
==========================================
- Coverage   75.98%   75.88%   -0.10%     
==========================================
  Files          91       91              
  Lines        5371     5374       +3     
==========================================
- Hits         4081     4078       -3     
- Misses        975      980       +5     
- Partials      315      316       +1     
Flag Coverage Δ
go 75.91% <100.00%> (-0.10%) ⬇️
wasm 66.92% <100.00%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
handshaker.go 75.95% <100.00%> (+0.40%) ⬆️
pkg/crypto/ciphersuite/gcm.go 60.31% <0.00%> (-3.18%) ⬇️
conn.go 81.20% <0.00%> (-0.85%) ⬇️
internal/net/dpipe/dpipe.go 96.66% <0.00%> (+2.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17f86a3...b04ff13. Read the comment docs.

Copy link
Member

@at-wat at-wat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to RFC6347, last flight should be retransmitted even after entering the FINISHED state.
https://datatracker.ietf.org/doc/html/rfc6347#page-22
I think transition to the SENDING state was wrong, but retransmission without loop is still needed.

@boks1971
Copy link
Contributor Author

boks1971 commented Feb 2, 2022

According to RFC6347, last flight should be retransmitted even after entering the FINISHED state. https://datatracker.ietf.org/doc/html/rfc6347#page-22 I think transition to the SENDING state was wrong, but retransmission without loop is still needed.

Thank you so much @at-wat .

In the text below the diagram, there is this text

In addition, for at least twice the default MSL defined for [TCP],
   when in the FINISHED state, the node that transmits the last flight
   (the server in an ordinary handshake or the client in a resumed
   handshake) MUST respond to a retransmit of the peer's last flight



Rescorla & Modadugu          Standards Track                   [Page 23]

RFC 6347                          DTLS                      January 2012


   with a retransmit of the last flight.  This avoids deadlock
   conditions if the last flight gets lost.  This requirement applies to
   DTLS 1.0 as well, and though not explicit in [DTLS1], it was always
   required for the state machine to function correctly.  To see why
   this is necessary, consider what happens in an ordinary handshake if
   the server's Finished message is lost: the server believes the
   handshake is complete but it actually is not.  As the client is
   waiting for the Finished message, the client's retransmit timer will
   fire and it will retransmit the client's Finished message.  This will
   cause the server to respond with its own Finished message, completing
   the handshake.  The same logic applies on the server side for the
   resumed handshake.

This change affects the DTLS Client role path.

The server role path is unaffected. It is only the server that is expected to retransmit the last one from the text above. Am I reading it right?

@at-wat
Copy link
Member

at-wat commented Feb 2, 2022

@boks1971 Your explanation seems right!

It would be better to add test cases that fail on master branch and pass on this branch.
(Maybe based on TestHandshaker?)

@boks1971
Copy link
Contributor Author

boks1971 commented Feb 3, 2022

@boks1971 Your explanation seems right!

It would be better to add test cases that fail on master branch and pass on this branch. (Maybe based on TestHandshaker?)

Will take a look @at-wat . Just started with this repo yesterday. So, it might take me a while to get a handle of how tests are written :-)

Fixes pion/webrtc#2089

When a retranmission from the remote side arrives after
the handshake is complete, the `finish` routine
puts it back into retransmit loop.

With Chrome, this fails after 15 seconds.
Firefox does not error out though.

Testing:
---------
- Tested with Firefox and Chrome with long delay
(500 ms up and down) in network link conditioner.
- Tested the above with no introduced delays too.
- Added test for slow server.
@boks1971
Copy link
Contributor Author

boks1971 commented Feb 3, 2022

@at-wat I have added a couple of tests in TestHandshaker like you suggested to test this path. Ensured that it was failing without the fix and it passes after the fix. Please review when you get a chance.

@at-wat at-wat self-requested a review February 4, 2022 12:22
@boks1971
Copy link
Contributor Author

boks1971 commented Feb 4, 2022

Thank you for looking at this @at-wat . Can you approve this if you are happy with the change and tests?

Copy link
Member

@at-wat at-wat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

It might be nice if we can avoid increasing the test duration, but the test stability would have priority.

@boks1971
Copy link
Contributor Author

boks1971 commented Feb 5, 2022

Thank you @at-wat . I increased the timeout as a precaution. It was not strictly needed. Just wanted to prioritise test stability like you mentioned just in case CI is having some trouble.

@boks1971 boks1971 merged commit 748c25c into master Feb 5, 2022
@boks1971 boks1971 deleted the raja_dtls_long_delay branch February 5, 2022 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Chrome browser DTLS fails after 15 seconds in long delay connections.
2 participants