Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Is there any graceful way to prevent running out of http2 stream id in Dubbo3.x #14098

Open
4 tasks done
PaulTan94 opened this issue Apr 17, 2024 · 13 comments · May be fixed by #14209
Open
4 tasks done

[Bug] Is there any graceful way to prevent running out of http2 stream id in Dubbo3.x #14098

PaulTan94 opened this issue Apr 17, 2024 · 13 comments · May be fixed by #14209
Labels
component/need-triage Need maintainers to triage type/need-triage Need maintainers to triage

Comments

@PaulTan94
Copy link

Pre-check

  • I am sure that all the content I provide is in English.

Search before asking

  • I had searched in the issues and found no similar issues.

Apache Dubbo Component

Java SDK (apache/dubbo)

Dubbo Version

Dubbo 3.x

Steps to reproduce this issue

Based on the discussion in #13731, I think it is an improper way to make a reconnection triggered by GoAwayFrame(stream id exhausted) cause there will be 100% request failure before the new connection being active and it is unacceptable for production scenario. To make sure the connection is consistently available, is it possible to make a new connection before the stream id is about to be exhausted?

What you expected to happen

  1. Make a new connection before the stream id is about to be exhausted.

Anything else

No response

Are you willing to submit a pull request to fix on your own?

  • Yes I am willing to submit a pull request on my own!

Code of Conduct

@PaulTan94 PaulTan94 added component/need-triage Need maintainers to triage type/need-triage Need maintainers to triage labels Apr 17, 2024
@AlbumenJ
Copy link
Member

AlbumenJ commented May 8, 2024

@EarthChen @icodening PTAL

BTW, I also want to know if there is a good way

@icodening icodening linked a pull request May 18, 2024 that will close this issue
8 tasks
@icodening
Copy link
Contributor

I think should be reconnect when throw Http2NoMoreStreamIdsException, and the message will be suspend until stream is create success.

@oxsean
Copy link
Collaborator

oxsean commented May 18, 2024

Is it possible to proactively reconnect instead of waiting for an exception to occur?

@FoghostCn
Copy link
Contributor

FoghostCn commented May 19, 2024

According to https://datatracker.ietf.org/doc/rfc9113/ 5.1.1

Stream identifiers cannot be reused. Long-lived connections can result in an endpoint exhausting the available range of stream identifiers. A client that is unable to establish a new stream identifier can establish a new connection for new streams. A server that is unable to establish a new stream identifier can send a GOAWAY frame so that the client is forced to open a new connection for new streams.

May be Http2NoMoreStreamIdsException is not required, we should reconnect automatically

@Chenjp
Copy link
Contributor

Chenjp commented Jun 6, 2024

When stream exhaust occur, Server send GOAWAY frame to prevent client-endpoint to open a new stream with higher stream id.

GOAWAY is a part of http/2 error-handling. We may handle it as HTTP/2 transportation layer error.

Currently TripleGoAwayHandler#channelRead, NettyConnectionHandler#onGoAway, NettyConnectionClient#onGoaway implements close-and-reconnect on connection layer.

Suggest to raise RpcException with code NETWORK_EXCEPTION to trigger non-biz RPC call retry mechansim.

@Chenjp
Copy link
Contributor

Chenjp commented Jun 21, 2024

@PaulTan94 @oxsean @icodening @AlbumenJ I have an idea to handle GoAway gracefully, but need a testcase to cover. It's difficult to reproduce the scenario. Can you help?

@PaulTan94
Copy link
Author

@PaulTan94 @oxsean @icodening @AlbumenJ I have an idea to handle GoAway gracefully, but need a testcase to cover. It's difficult to reproduce the scenario. Can you help?

Hi, there is a tricky way to control the stream id step in Netty

  1. Copy the io.netty.handler.codec.http2.DefaultHttp2Connection.java file.
  2. Change the default step(+2) in 'incrementAndGetNextStreamId' and 'incrementExpectedStreamId' method.
  3. Put the changed java file under your project's classpath, ClassLoader will premarily load the class under current project's classpath.

@PaulTan94
Copy link
Author

PaulTan94 commented Jun 23, 2024

According to https://datatracker.ietf.org/doc/rfc9113/ 5.1.1

Stream identifiers cannot be reused. Long-lived connections can result in an endpoint exhausting the available range of stream identifiers. A client that is unable to establish a new stream identifier can establish a new connection for new streams. A server that is unable to establish a new stream identifier can send a GOAWAY frame so that the client is forced to open a new connection for new streams.

May be Http2NoMoreStreamIdsException is not required, we should reconnect automatically

The way mentioned in RFC9113 is mostly for Web scenario, right? For RPC scenario, it is much more proper to start a reconnection proactively instead of waiting for Http2NoMoreStreamIdsException.

@PaulTan94
Copy link
Author

I hava an idea of implementing a handler in Netty handler pipeline, which will monitor the latest stream id and trigger a channel switching (without traffic loss) once the stream id is about to reach the limit. I am not sure that is the right way, but anyway I will post a PR later.

@icodening
Copy link
Contributor

@PaulTan94 @oxsean @icodening @AlbumenJ I have an idea to handle GoAway gracefully, but need a testcase to cover. It's difficult to reproduce the scenario. Can you help?

Hi, there is a tricky way to control the stream id step in Netty

  1. Copy the io.netty.handler.codec.http2.DefaultHttp2Connection.java file.
  2. Change the default step(+2) in 'incrementAndGetNextStreamId' and 'incrementExpectedStreamId' method.
  3. Put the changed java file under your project's classpath, ClassLoader will premarily load the class under current project's classpath.

It is not recommended to do this as the order of class loading cannot be guaranteed

@Chenjp
Copy link
Contributor

Chenjp commented Jul 3, 2024

GoAway failure (caused by stream id exhaustion) occurs in one of billions probability, we need have a trade-off between fix-cost with introduce complexity and one-of-billion impact.
I think NO NEED TO FIX IT.
If you expect a 100% reliable system, it's better to deploy multiple service provider nodes and enable retry time configuration for service consumer.

@Chenjp
Copy link
Contributor

Chenjp commented Aug 5, 2024

No news is good news. @AlbumenJ @PaulTan94 PTAL

@oxsean
Copy link
Collaborator

oxsean commented Aug 5, 2024

I think it's necessary to reconnecting, and this shouldn't cost much @Chenjp

I think should be reconnect when throw Http2NoMoreStreamIdsException, and the message will be suspend until stream is create success.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/need-triage Need maintainers to triage type/need-triage Need maintainers to triage
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

6 participants