
core,netty: client sends rst stream when server half-closes #4222

Merged
ericgribkoff merged 4 commits into grpc:master from server_close_bug on Mar 22, 2018

Conversation

ericgribkoff (Contributor)

This addresses #4202. I tried to keep the changes minimal, to allow for easily backporting this to earlier releases if necessary.

Currently, if a server half-closes a stream with a Netty client, both server and client will hold onto the stream resources until (if ever) the client calls halfClose(). The same problem does not occur with an OkHttp client, as OkHttpClientTransport.TransportState#onEndOfStream triggers deletion of the stream reference on the client and sends a RST_STREAM to the server, which lets the Netty server drop the stream resources as well.

This PR changes the Netty client behavior to be similar to that of OkHttp clients: when end of stream is received from the server, the client sends a RST_STREAM if it has not already half-closed its side of the stream. OkHttp currently waits to do this until the deframer closes, which does not seem to be necessary: anything the client sends at this point will be ignored by the server, so we might as well close the stream as early as possible. This PR doesn't change the OkHttp behavior; that can be in a follow-up PR.
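As a rough sketch of the new client-side rule (simplified, illustrative types rather than the real grpc-java/Netty classes; the actual change is in the diff hunks later in this conversation):

```java
// Illustrative sketch only: simplified stand-ins for the real transport classes.
final class EndOfStreamRule {
  /** Hypothetical writer abstraction; the real code goes through Netty's write queue. */
  interface Http2Writer {
    void writeRstStream(int streamId, long errorCode);
  }

  private static final long CANCEL = 0x8; // HTTP/2 CANCEL error code

  private final Http2Writer writer;
  private boolean outboundClosed; // set once the client half-closes its side

  EndOfStreamRule(Http2Writer writer) {
    this.writer = writer;
  }

  void clientHalfClosed() {
    outboundClosed = true;
  }

  /** Called when END_STREAM is seen from the server (on headers or data). */
  void serverEndOfStreamReceived(int streamId) {
    if (!outboundClosed) {
      // The server will ignore anything else the client sends, so reset the stream
      // now and let both sides release their per-stream resources.
      writer.writeRstStream(streamId, CANCEL);
      outboundClosed = true;
    }
  }
}
```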

Sending the reset stream releases the stream resources in Netty due to triggering Http2ConnectionHandler#processRstStreamWriteResult, which ends up invoking DefaultHttp2Connection.ActiveStreams#deactivate, which removes the stream from the list of active streams. Without sending the reset stream, all of the client's streams remain active and a reference is retained to each.
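As a toy illustration of that resource-release point (a simplified model, not Netty's actual DefaultHttp2Connection): the connection keeps a reference to every active stream, and only deactivation, which the RST_STREAM write triggers, lets that reference go.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of active-stream tracking; not Netty's actual implementation.
final class ActiveStreamsModel {
  private final Map<Integer, Object> activeStreams = new HashMap<>();

  void streamOpened(int streamId, Object stream) {
    activeStreams.put(streamId, stream);
  }

  // Rough analogue of processRstStreamWriteResult() triggering ActiveStreams#deactivate():
  // once the reset is written, the stream drops out of the active set.
  void rstStreamWritten(int streamId) {
    activeStreams.remove(streamId);
  }

  int retainedStreamCount() {
    // Without the reset, every half-closed stream stays referenced here.
    return activeStreams.size();
  }
}
```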

We also almost certainly want to close the framer, if it isn't already closed, when the end of stream is seen from the server (again, this is left to a follow-up PR). Currently, the framer is only ever closed if the client invokes halfClose() on the stream. For both Netty and OkHttp, the client may continue to invoke ClientCall#sendMessage() even after the server half-closes. Each such message goes all the way through the framer, and is even reported as outbound bytes to the stats tracers, before ultimately being stopped at the transport layer rather than going out on the now-dead stream: for Netty this happens in DefaultHttp2ConnectionEncoder, and for OkHttp in OutboundFlowController. It would be much cleaner and more efficient to close the framer as soon as we know the stream was closed by the server.
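A sketch of what that follow-up could look like, again with illustrative names rather than the real framer API: closing the framer as soon as the server's end of stream is seen would turn later sendMessage() calls into cheap no-ops.

```java
// Illustrative only; the real framer and stats-tracer plumbing is more involved.
final class EarlyFramerClose {
  /** Hypothetical framer abstraction, not grpc-java's actual framer interface. */
  interface Framer {
    void writePayload(byte[] message);
    void close();
    boolean isClosed();
  }

  private final Framer framer;

  EarlyFramerClose(Framer framer) {
    this.framer = framer;
  }

  /** Proposed follow-up: close the framer as soon as the server's end of stream is seen. */
  void serverEndOfStreamReceived() {
    if (!framer.isClosed()) {
      framer.close();
    }
  }

  void sendMessage(byte[] message) {
    if (framer.isClosed()) {
      // Skip framing and stats work for a stream the server has already closed.
      return;
    }
    framer.writePayload(message);
  }
}
```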

There is likely still follow-up work to do on the server side to drop stream resources even when the client does not send the reset stream; this PR only changes client-side behavior.


CancelClientStreamCommand(NettyClientStream.TransportState stream, Status reason) {
  this.stream = Preconditions.checkNotNull(stream, "stream");
  Preconditions.checkNotNull(reason, "reason");
  Preconditions.checkArgument(!reason.isOk(), "Should not cancel with OK status");
  Preconditions.checkArgument(
Contributor

I'm a little uncomfortable with using null here. I think a sentinel value like Status.CANCELED.withDescription("early trailers") would be a safer approach. Anything else that accesses the reason may expect it to be non-null. Also, most of gRPC keeps the status non-null.

Contributor Author

The point of null here is to be able to send RST_STREAM to the server without invoking transportReportStatus (again) in NettyClientHandler#cancelStream. This field is accessed in exactly one place (NettyClientHandler#cancelStream, which should be the only thing parsing these commands anyways?), and it checks for null to decide whether to report the status to the transport. I could accomplish the same by adding a RstStreamCommand (and had this in an earlier draft of this PR), but it would be functionally almost identical to CancelClientStreamCommand so seemed like unnecessary redundancy.
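Roughly, the convention looks like this (a sketch with simplified, made-up types, not the actual NettyClientHandler code; the real transportReportStatus takes more arguments than shown):

```java
import io.grpc.Status;

// Sketch of the null-reason convention: a null reason means "the server's status was
// already reported from its trailers, just reset the stream"; a non-null reason means
// "report this status to the transport, then reset".
final class CancelHandlingSketch {
  interface StreamState {
    int id();
    void transportReportStatus(Status status);
  }

  interface Http2Writer {
    void writeRstStream(int streamId, long errorCode);
  }

  private static final long CANCEL = 0x8; // HTTP/2 CANCEL error code

  void cancelStream(StreamState stream, Status reason, Http2Writer writer) {
    if (reason != null) {
      stream.transportReportStatus(reason);
    }
    writer.writeRstStream(stream.id(), CANCEL);
  }
}
```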

carl-mastrangelo (Contributor) left a comment

LGTM

@ejona86 added the TODO:release blocker label (Issue/PR is important enough to delay the release; removed after release issues are resolved) on Mar 21, 2018
ejona86 (Member) commented on Mar 21, 2018

> We also almost certainly want to close the framer, if it isn't already closed, when the end of stream is seen from the server

This is actually the intended behavior. Doing something smarter needs synchronization and only really helps clients that aren't paying attention. This case is more common for cancel(), and we don't bother there. Yes, we waste some cycles, but I don't think there's an actual problem.

@@ -293,6 +298,9 @@ public void deframeFailed(Throwable cause) {

   void transportHeadersReceived(Http2Headers headers, boolean endOfStream) {
     if (endOfStream) {
+      if (!isOutboundClosed()) {
+        handler.getWriteQueue().enqueue(new CancelClientStreamCommand(this, null), true);
Member

If you do this after the transportTrailersReceived() call, then there should be no need to avoid the second call to transportReportStatus; we call transportReportStatus lots of times on failures and that's okay. It gets too complicated trying to avoid it.

The Status used here should probably make it obvious that it shouldn't be seen though. Maybe: Status.INTERNAL.withDescription("Cancelled stream after server closed. The server's status should be reported instead of this one")

Contributor Author

Done

Member

Shoot. I am wrong. The second call to transportReportStatus() will have stopDelivery=true. The call from transportTrailersReceived() will have stopDelivery=false. Let's put it back to the way you had it.

Contributor Author

Done

@@ -280,6 +280,11 @@ public void runOnTransportThread(final Runnable r) {
     }
   }

+  @Override
+  public void deframerClosed(boolean hasPartialMessageIgnored) {
Member

Revert?

Contributor Author

Done

@@ -175,8 +174,8 @@ public final void deliverFrame(

   @Override
   public final void halfClose() {
-    if (!outboundClosed) {
-      outboundClosed = true;
+    if (!transportState().isOutboundClosed()) {
Member

I'd rather this conditional remain like it was (non-volatile). As it is written here, it seems like some other code path could setOutboundClosed(). However, it actually appears to be an unnecessary condition (halfClose is only called once), so let's instead leave it as you have it and do a follow-up PR to remove the condition.

@ericgribkoff merged commit 5337728 into grpc:master on Mar 22, 2018
@ericgribkoff deleted the server_close_bug branch on March 22, 2018 at 19:38
@ejona86 removed the TODO:release blocker label on Mar 27, 2018
lock bot locked as resolved and limited the conversation to collaborators on Jan 18, 2019