Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add envoy SO_KEEPALIVE settings #71

Merged
merged 1 commit into from
Jul 18, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/deployment-operations/deployment.md
Original file line number Diff line number Diff line change
Expand Up @@ -285,6 +285,8 @@ data:
lb_policy: CLUSTER_PROVIDED
# assume http2 in upstream
http2_protocol_options: {}
upstream_connection_options:
tcp_keepalive: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will default to the OS settings for the TCP keepalive configuration, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

@tillrohrmann tillrohrmann Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long would the keep alive timeout be? I found that by default it only kicks in after 2 hours. Won't that mean that we'll run into the TCP retransmission timeout before (after ~ 15 minutes) and therefore, the keepalive timeout won't have any effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is the default - but bear in mind that a tcp connection is fairly likely to be open for many hours before a failure occurs, so we would expect keepalives every 75s for most conns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand things correctly, then the keep alive will start after 2h of idleness. Hence, whenever there is a package arriving it will be reset. Therefore, I believe that the default settings won't help here and if we recommend enabling upstream tcp keepalive with envoy, then we should set concrete values here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly related question: Francesco concluded on restatedev/restate#626 to use HTTP/2 keep alives instead TCP keep alives. He also mentioned that Envoy should be able to do the same. Should we rather recommend using this mechanism instead of TCP keep alive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, sounds good to me, ill revert this and implement http2

cluster_type:
name: envoy.clusters.dynamic_forward_proxy
typed_config:
Expand Down