close quic streams on Dispose #54798

Merged
merged 7 commits into dotnet:main from quicDispose on Jul 7, 2021

Conversation

wfurt
Member

@wfurt wfurt commented Jun 27, 2021

This change has multiple parts:

Primarily, this changes Dispose behavior to flush all pending data via graceful shutdown (contributes to #756).
This can block if the connection is up but the receiver stops reading and we exceed the send window.
We may solve this via a shutdown API that takes a CancellationToken.
This behavior is essential for the failing stream conformance tests.

On the reading side we could wait for a graceful shutdown or reset, but that can also take a long time and could lead to DoS. I looked at NetworkStream, where we call shutdown for both directions, so I decided to abort the read side. Ideally the application (like H/3) knows when to stop reading/writing so the peer can gracefully close the stream before Dispose, i.e. Dispose is just a safety measure to avoid an internal reset with code 0.
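
To illustrate the asymmetry, here is a rough C# sketch; StreamDisposeSketch and its members are illustrative stand-ins, not the actual MsQuicStream internals. The write side gets a graceful shutdown while the read side is aborted.

using System.Threading;

// Hedged sketch only: illustrative names, not the real MsQuicStream internals.
internal sealed class StreamDisposeSketch
{
    private int _disposed;
    private bool _canRead = true;
    private bool _canWrite = true;

    public void Dispose()
    {
        if (Interlocked.Exchange(ref _disposed, 1) != 0)
        {
            return; // run the shutdown logic only once
        }

        if (_canWrite)
        {
            StartGracefulWriteShutdown(); // flush buffered data, then send FIN for the write side
        }

        if (_canRead)
        {
            AbortRead(0xffffffff); // do not wait for the peer; abort the read side instead
        }
    }

    private void StartGracefulWriteShutdown() { _canWrite = false; /* call into the native shutdown here */ }

    private void AbortRead(long errorCode) { _canRead = false; /* call into the native abort here */ }
}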

With this in place we cannot safely clean up all the handles in the Dispose method. To deal with this I added a simple atomic state so the cleanup can alternatively be deferred to HandleEventShutdownComplete(). There is probably an opportunity to simplify the logic further, but I wanted more testing and experimentation, and I wanted to get the change above out first.
The tricky part is that the state can change during Dispose when calling shutdown/reset, so the simple state lets the cleanup happen in either place.
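
As a minimal sketch of that two-sided cleanup, using the same ShutdownDone values that show up in the diffs later in this PR (CleanupHandles is an illustrative stand-in for the real native cleanup): whichever of Dispose and the shutdown-complete event runs second performs the release.

using System.Threading;

// Minimal sketch: 0 = neither side has run, 1 = Dispose ran first, 2 = the final event ran first.
internal sealed class ShutdownStateSketch
{
    private int _shutdownDone;

    public void Dispose()
    {
        // If the final event already ran (value was 2), Dispose is last and releases the handles.
        if (Interlocked.Exchange(ref _shutdownDone, 1) == 2)
        {
            CleanupHandles();
        }
        // Otherwise cleanup is deferred to HandleEventShutdownComplete.
    }

    public void HandleEventShutdownComplete()
    {
        // If Dispose already ran (value was 1), this event is last and releases the handles.
        if (Interlocked.Exchange(ref _shutdownDone, 2) == 1)
        {
            CleanupHandles();
        }
    }

    private static void CleanupHandles()
    {
        // Release the native stream handle, GC handles, etc.
    }
}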

To make debugging easier, I set the safe handle to IntPtr.Zero during release, so if it is used twice or afterwards I get a nice SafeHandle exception instead of possibly corrupting memory and getting a native crash.
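
A minimal sketch of that pattern with a hypothetical SafeHandle subclass (SampleQuicHandle is not the actual MsQuic handle type): once the handle field is zeroed, later use of the handle fails fast in managed code (IsInvalid / SafeHandle checks) instead of touching freed native memory.

using System;
using System.Runtime.InteropServices;

// Hedged sketch: hypothetical handle type, not the actual MsQuic safe handle.
internal sealed class SampleQuicHandle : SafeHandle
{
    public SampleQuicHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        // NativeClose(handle); // close the native object first (omitted here)
        SetHandle(IntPtr.Zero);  // forget the pointer so any later use trips the SafeHandle checks
        return true;
    }
}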

The stream conformance tests expect stream.CanRead and CanWrite to reflect the current state of the stream, so they are no longer readonly.
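
A hedged sketch of the shape this implies (illustrative names, not the exact PR code): the properties are computed from mutable flags instead of readonly fields.

// Hedged sketch, not the actual MsQuicStream code.
internal sealed class StreamStateSketch
{
    private volatile bool _canRead = true;
    private volatile bool _canWrite = true;

    public bool CanRead => _canRead;
    public bool CanWrite => _canWrite;

    public void OnReadAborted() => _canRead = false;   // read direction gone
    public void OnWriteShutdown() => _canWrite = false; // write direction gone
}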

With this change the WaitForAvailable*Stream tests started to fail. It turned out they made the incorrect assumption that the stream limit increases when the stream is disposed locally. Big thanks to @nibanks, who helped me debug the fact that the stream needs to be released on the other side as well to get predictable behavior. I suspect some H/3 tests may have a similar issue, but I did not dig deeper.

With this, all the conformance tests pass except for parallel IO.
I'm still running the tests in a loop and looking for odd failures.
This may need more adjustments, but I wanted to get it out for feedback.

@wfurt wfurt requested review from stephentoub and a team June 27, 2021 22:00
@wfurt wfurt self-assigned this Jun 27, 2021
@ghost

ghost commented Jun 27, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Author: wfurt
Assignees: wfurt
Labels: area-System.Net.Quic
Milestone: -

Member

@ManickaP ManickaP left a comment

I'm not fully on board with the _canRead/_canWrite change, nor with ShutdownDone. To me, it's another state to check at every step. I'd rather combine those into their respective state fields.

I also ran the tests; H/3 still intermittently crashes, and some cookie tests were failing for me. Those are being handled by Geoff AFAIK.

@@ -70,7 +70,7 @@ internal sealed class State
         SingleWriter = true,
     });

-    public void RemoveStream(MsQuicStream stream)
+    public void RemoveStream(MsQuicStream? stream)
Member

Why are we even passing in the stream? Doesn't seem to be used at all.

Member Author

I was going back and forth on this one. When debugging it is very nice to have the stream here so one can emit traces from a single location. The same basically applies to AddStream.

Contributor

I'm trying to understand why we need this at all... It seems like we are deferring Dispose of the connection until all streams in use on it are completed. Is that correct? Why do we want to do this? Can we just get rid of this tracking completely?

Member Author

Yes, that is the intent. We saw crashes without it, but it can perhaps be done in a different way; if so, I would do it in a separate PR. The reason this was touched is that HandleEventShutdownComplete does not have access to the stream itself, so it would pass null.
I'm OK with removing the parameter completely since it is not used in the product, or perhaps using it in Debug builds only, for example to dump the stream info if we are on the path to an assert.

Contributor

I agree, we should consider that for a separate PR.

Mostly I want to make sure we are making an intentional decision about behavior here.

Member Author

When chatting with @nibanks, he said we would not get HandleEventShutdownComplete with pending streams. So if we delay cleanup in Dispose and we can rely on that, we could perhaps remove the logic. It would be nice to have stress tests up at that time so we can verify the impact.

@@ -650,6 +650,14 @@ private async Task FlushAcceptQueue()
     _state.AcceptQueue.Writer.TryComplete();
     await foreach (MsQuicStream item in _state.AcceptQueue.Reader.ReadAllAsync().ConfigureAwait(false))
     {
+        if (item.CanRead)
+        {
+            item.AbortRead(0xffffffff);
Member

This hard-coded error code is probably not something we want here, is it? All the error codes should come from the user app, since none are officially defined by the QUIC protocol.

Contributor

Agreed. If necessary we should hard code the appropriate H3 error code here, for now. But ideally this would not be H3 specific.

Member Author

Do you have a suggestion for a value and name, @geoffkizer? I agree this should preferably not be H3 specific, so I picked MaxValue instead.
A well-behaved app would probably shut down the listener and accept and dispose all the pending streams.
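
One possible shape, sketched with a made-up constant name (not from the PR), just to avoid repeating the bare literal:

// Hypothetical constant name; the value matches the 0xffffffff literal used in the diff above.
private const long DefaultStreamAbortErrorCode = 0xffffffff;

// ...then in FlushAcceptQueue():
item.AbortRead(DefaultStreamAbortErrorCode);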

@@ -252,7 +252,7 @@ private static uint HandleEventShutdownComplete(State state, ref ConnectionEvent
     state.ShutdownTcs.SetResult(MsQuicStatusCodes.Success);

     // Stop accepting new streams.
-    state.AcceptQueue.Writer.Complete();
+    state.AcceptQueue.Writer.TryComplete();
Contributor

It seems like there are a bunch of places now where we either call Writer.Complete or Writer.TryComplete. Are all of these necessary? I would assume we only have to do this in a limited number of places, e.g. HandleEventShutdownInitiatedByTransport and HandleEventShutdownInitiatedByPeer, and maybe one other for when we initiate shutdown ourselves.

I always get nervous when we do stuff like call TryComplete instead of Complete, because it seems like we don't have a clean handling of exactly when the Complete operation needs to happen. I'd prefer to see the TryComplete calls changed to either Complete or to assert that it's already completed.

Member Author

In this case I was getting an exception when the connection was already disposed, which triggered a flush of events and ShutdownComplete. I could wrap it with if (_disposed == 0), but changing it to Try looks better to me, since this only means we will not put more items into the queue.
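
For reference, the semantic difference under discussion, shown as a small standalone System.Threading.Channels example rather than the PR code:

using System.Threading.Channels;

var channel = Channel.CreateUnbounded<int>();

channel.Writer.Complete();                     // first completion succeeds
bool completed = channel.Writer.TryComplete(); // returns false instead of throwing
// channel.Writer.Complete();                  // a second Complete() would throw InvalidOperationException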

Member Author

I looked at the cases, @geoffkizer, and it is not that we don't know whether the Complete operation should happen; we just don't know whether it already happened. For example, the shutdown may be initiated by the peer, or somebody may Dispose the connection, but then we get this final event and it will throw if we try to complete the writer again. Unless I missed something, I did not find an API on the Writer to tell whether it was already completed.
One option I can think of is making it nullable and setting it to null after completion, or we can add yet another state. Let me know what you want me to do.

Contributor

I think it's fine for this PR, but we should revisit the logic here moving forward.

}

// We already got final event.
releaseHandles = Interlocked.Exchange(ref _state.ShutdownDone, 1) == 2;
Contributor

If we already got a final event, then do we actually need to call shutdown/abort below?

Member Author

The way it currently works is that ShutdownComplete will not release everything if Dispose was not called; see the note below. We could change that, but then for example Shutdown and/or Reset could throw if the handle is gone.

@@ -824,6 +899,18 @@ private static uint HandleEventShutdownComplete(State state, ref StreamEvent evt
         state.ShutdownCompletionSource.SetResult();
     }

+    // Dispose was called before complete event.
+    bool releaseHandles = Interlocked.Exchange(ref state.ShutdownDone, 2) == 1;
Contributor

Isn't it the case that we always get this ShutdownComplete event? So then can't we always release the handles here?

Member Author

I was thinking about that. However, in some cases ShutdownComplete can arrive without us asking for it, e.g. before Dispose is called. With that, an attempt to shutdown/reset would throw, so we would need careful synchronization before (and perhaps during) every operation.

Contributor

It seems like the handle is only needed for event callbacks, right? I don't think other code should be accessing it. So I don't think there's any real synchronization needed here, is there?

Member Author

I'm not sure I understand the comment. The cleanup does more than just the handle, and we also pass the handle into functions like Shutdown().

@geoffkizer
Contributor

Related: #55044

@wfurt
Member Author

wfurt commented Jul 2, 2021

I addressed most of the feedback and left some questions/comments. It would be great if you could take another look, @geoffkizer @ManickaP.

@ManickaP
Member

ManickaP commented Jul 2, 2021

I did a run of the current code and got neither HTTP nor QUIC errors. 🥳

Member

@ManickaP ManickaP left a comment

There are still some unanswered questions.

@ManickaP
Member

ManickaP commented Jul 6, 2021

@wfurt please let me know when you want me to re-review. I see some open discussion with @geoffkizer, so I'm not sure whether you plan to make more changes here.

@wfurt
Member Author

wfurt commented Jul 7, 2021

contributes to #49157

@wfurt wfurt merged commit 0d1e9ca into dotnet:main Jul 7, 2021
@wfurt wfurt deleted the quicDispose branch July 7, 2021 19:25
@karelz karelz added this to the 6.0.0 milestone Jul 15, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 14, 2021