Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add envoy SO_KEEPALIVE settings #71

Merged
merged 1 commit into from
Jul 18, 2023
Merged

Add envoy SO_KEEPALIVE settings #71

merged 1 commit into from
Jul 18, 2023

Conversation

jackkleeman
Copy link
Contributor

No description provided.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR @jackkleeman. I had one question for clarification. +1 for merging after resolving it.

@@ -285,6 +285,8 @@ data:
lb_policy: CLUSTER_PROVIDED
# assume http2 in upstream
http2_protocol_options: {}
upstream_connection_options:
tcp_keepalive: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will default to the OS settings for the TCP keepalive configuration, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@tillrohrmann tillrohrmann Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long would the keep alive timeout be? I found that by default it only kicks in after 2 hours. Won't that mean that we'll run into the TCP retransmission timeout before (after ~ 15 minutes) and therefore, the keepalive timeout won't have any effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is the default - but bear in mind that a tcp connection is fairly likely to be open for many hours before a failure occurs, so we would expect keepalives every 75s for most conns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand things correctly, then the keep alive will start after 2h of idleness. Hence, whenever there is a package arriving it will be reset. Therefore, I believe that the default settings won't help here and if we recommend enabling upstream tcp keepalive with envoy, then we should set concrete values here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly related question: Francesco concluded on restatedev/restate#626 to use HTTP/2 keep alives instead TCP keep alives. He also mentioned that Envoy should be able to do the same. Should we rather recommend using this mechanism instead of TCP keep alive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, sounds good to me, ill revert this and implement http2

@jackkleeman jackkleeman merged commit 52265a2 into main Jul 18, 2023
1 check passed
@jackkleeman jackkleeman deleted the envoy-keepalive branch July 18, 2023 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants