Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUIC] Fix rooted connection when hanshake failed #87328

Merged
merged 7 commits into from
Jun 14, 2023

Conversation

ManickaP
Copy link
Member

@ManickaP ManickaP commented Jun 9, 2023

Fixes #87291

Added tests cover the exact same scenario as happened in ASP. Tested before and after the change to confirm the fix. Also tested with ASP test by replacing System.Net.Quic.dll in shared runtime by the tests - it passes after the change.

The fix is in QuicConnection.cs:501, which will complete and thus unroot anything stored in _connectedTcs. Note that event SHUTDOWN_COMPLETE happens for all msquic objects and is awaited by DisposeAsync which is called in all failed connection attempts (client and server).

I also checked the code for other task sources that could cause a similar issue and made sure they do get always completed in SHUTDOWN_COMPLETE.

Minor stuff sneaked in:
While at it, fixed and cleaned up logging - the logs contained bogus event names and I had this stashed elsewhere.
Also fixed some small typos and misnamed aliases which I also had stashed elsewhere.

cc: @amcasey

@ghost
Copy link

ghost commented Jun 9, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #87291

Added tests cover the exact same scenario as happened ASP. Tested before after change to confirm the fix. Also tested with ASP test by replacing System.Net.Quic.dll in shared runtime by the tests - it passed after the change.

The fix is in QuicConnection.cs:501, which will complete and thus unroot anything stored in _connectedTcs. Note that event SHUTDOWN_COMPLETE happens for all msquic objects and is awaited by DisposeAsync which is called in all failed connection attempts (client and server).

I also checked the code for other task sources that could cause a similar issue and made sure they do get always completed in SHUTDOWN_COMPLETE.

Minor stuff sneaked in:
While at it, fixed and cleaned up logging - the logs contained bogus event names and I had this stashed elsewhere.
Also fixed some small typos and misnamed aliases which I also had stashed elsewhere.

Author: ManickaP
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

Copy link
Member

@BrennanConroy BrennanConroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Debug.Assert(_connectedTcs.IsCompleted); in DisposeAsync()

@wfurt wfurt requested a review from rzikm June 11, 2023 21:19
}

_acceptQueue.Writer.TryComplete(ExceptionDispatchInfo.SetCurrentStackTrace(ThrowHelper.GetOperationAbortedException()));
Exception exception = ExceptionDispatchInfo.SetCurrentStackTrace(ThrowHelper.GetOperationAbortedException());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will allocate exception and stack in all cases right?I would hope we should be able to figure out the connection state and do it only when needed. I'm ok if we do it separately as improvement to unblock asp.net.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation was there before the change, I'm just re-using the same exception in another place now.
As this happens once per a connection, I'm not overly concerned about the cost. But it would be a nice improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment. Where was the exception allocated before for HandleEventShutdownComplete? And I agree that this is not critical since that is once per connection. But walking the stack may note be cheap as allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_acceptQueue.Writer.TryComplete(ExceptionDispatchInfo.SetCurrentStackTrace(ThrowHelper.GetOperationAbortedException()));

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish you left the tracing refactor as separate change. It seems big enough for separate change.

Copy link
Member

@rzikm rzikm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo existing comments

@ManickaP
Copy link
Member Author

The newly added tests were consistently failing on mono, but they didn't trigger the newly added asserts, so there shouldn't be any leaks. As I was not able to debug mono in a reasonable time, I'm excluding them for now.

@marek-safar who could help me with analysis of why is the object still alive (held by WeakReference)? Or is this expected difference between CLR and mono? Is this even worth investigating?

@ManickaP
Copy link
Member Author

nit: Debug.Assert(_connectedTcs.IsCompleted); in DisposeAsync()

Asserts added.

@dotnet dotnet deleted a comment from azure-pipelines bot Jun 13, 2023
@dotnet dotnet deleted a comment from azure-pipelines bot Jun 13, 2023
@ManickaP
Copy link
Member Author

ManickaP commented Jun 13, 2023

outerloop failures: #87359, #63861

@@ -234,6 +235,8 @@ internal unsafe QuicConnection(QUIC_HANDLE* handle, QUIC_NEW_CONNECTION_INFO* in

private async ValueTask FinishConnectAsync(QuicClientConnectionOptions options, CancellationToken cancellationToken = default)
{
ObjectDisposedException.ThrowIf(_disposed == 1, this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I still feel this is dead code and should not be here. But that can be addresses separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was bad merge. I merged some files manually from an older branch and must have missed it. I removed it and looked for other places as well.

@marek-safar
Copy link
Contributor

@marek-safar who could help me with analysis of why is the object still alive (held by WeakReference)? Or is this expected difference between CLR and mono? Is this even worth investigating?

This looks like conservative scanning difference. You can ignore it or rewrite the code to move the weakreferenced code into another method to allow Mono GC collecting but as this is generating async state machine it's probably not worth the effort.

@ManickaP ManickaP merged commit dc8114a into dotnet:main Jun 14, 2023
@ManickaP ManickaP deleted the mapichov/quic-rooted-connection branch June 14, 2023 11:34
@karelz karelz added this to the 8.0.0 milestone Jul 3, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QuicConnection not being released after exception
6 participants