Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🩹 Add http2 transport configuration to work around http2 upstream bug on tcp drop #29

Merged
merged 3 commits into from
Sep 26, 2023

Conversation

serbrech
Copy link
Member

@serbrech serbrech commented Sep 26, 2023

TL;DR:

When a TCP connection is dropped, there is a bug in the http2 implementation in go that misses to mark the connection as failed. as a result, the connection stays in the pool, and the calls using that connection hit a context deadline.
The connection can stay in the pool fur up to 17 min (with default linux kernel settings).

This configures the transport to enable ReadIdleTimeout and Ping Timeout.
internaslly, it adds the h2 middleware on the transport which will issue http2 ping frames on the connections when no reads are occuring. if the server does not answer the ping (for example, the tcp connection was dropped), the connection is effectively closed, working around the golang upstream issue.
on the next call, a connection will be re-opened, and no context.Timeout is experienced at the app level.

for more details, see :

other context:

We have experienced sporadic context deadline on cosmosdb operations that we attributed to server side issues.
I think this might be the actual root cause, as Julien Strohecker experienced this issue using cosmosdb sdk track 2 in a use case with higher scale than ours.

This same fix is applied in our service already.

@serbrech serbrech changed the title add http2 connection configuration to handle tcp drop 🩹 Add http2 transport configuration to work around http2 upstream bug on tcp drop Sep 26, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2023

Codecov Report

Merging #29 (63da122) into main (3d961b8) will increase coverage by 1.63%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #29      +/-   ##
==========================================
+ Coverage   81.77%   83.40%   +1.63%     
==========================================
  Files           4        4              
  Lines         203      223      +20     
==========================================
+ Hits          166      186      +20     
  Misses         32       32              
  Partials        5        5              
Files Coverage Δ
pkg/middleware/default_http_client.go 100.00% <100.00%> (ø)

Copy link
Contributor

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀🚀🚀🚀🚀

Thanks for contributing lets make this a good holiday project

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants