-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][io] Update Elasticsearch sink idle cnx timeout to 30s #19377
[fix][io] Update Elasticsearch sink idle cnx timeout to 30s #19377
Conversation
/pulsarbot rerun-failure-checks |
) | ||
private int connectionIdleTimeoutInMs = 5; | ||
private int connectionIdleTimeoutInMs = 30000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this config should be validated against bulkFlushIntervalInMs when bulk API is enabled - something like connectionIdleTimeoutInMs > 2 * bulkFlushIntervalInMs
because it seems the connection will set idle by design in-between flushes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be fine to let the connection get closed in that scenario. My main goal here is to make sure we have working defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I understand both comments are probably outside scope of this PR: First step is to have working defaults, and later maybe make them foolproof...
@@ -83,7 +83,6 @@ public RestClient(ElasticSearchConfig elasticSearchConfig, BulkProcessor.Listene | |||
// idle+expired connection evictor thread | |||
this.executorService = Executors.newSingleThreadScheduledExecutor(); | |||
this.executorService.scheduleAtFixedRate(() -> { | |||
configCallback.connectionManager.closeExpiredConnections(); | |||
configCallback.connectionManager.closeIdleConnections( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Is it required at all to evict idle connections? I wonder what's wrong with long lived connection that has a life cycle coupled with that of the sink instance. If it is not required, we could drop the connectionIdleTimeoutInMs
for good but I maybe missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. The motivation of this PR assumes that closing these connections is necessary, however, I am not sure that it is. The fundamental risk is that something between the client and the server closes the connection. In my mind, the canonical example is a network load balancer with a 4 or 5 minute timeout.
Closing expired and idle connections is one solution to prevent such errors due to inactivity.
While troubleshooting the underlying behavior this PR aims to fix, I came across elastic/elasticsearch#65213, which indicates that an alternative solution is to enable socket keepalives and to decrease the net.ipv4.tcp_keepalive_time
in order to make sure those keepalives are sent before any intermediate server closes the connection due to inactivity. Since that solution requires modifying OS settings, I think this solution might be easier to maintain, even though it'll be less efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-reading that elasticsearch issue, it could be reasonable to move in the direction of enabling tcp keep-alives. At the very least, I think we should merge this and fix the existing default values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since that solution requires modifying OS settings
@michaeljmarshall On managed cloud k8s environments, the OS settings are already properly tuned. related comment: #14841 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lhotari - that was not the case in the AKS cluster that I was testing with as of yesterday. When I tried to override the settings using https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#setting-sysctls-for-a-pod, I got an error because overriding net.ipv4.tcp_keepalive_time = 300
is considered "unsafe" by default.
(cherry picked from commit 1481c74)
(cherry picked from commit 1481c74)
Motivation
The current Elasticsearch sink has a setting named
connectionIdleTimeoutInMs
. This setting closes expired and idle http connections opened by the Elasticsearch Rest Client. The current default is 5 milliseconds. With this value, I observed connection issues both for timeouts trying to connect to elasticsearch and errors like "connection reset by peer". When overriding the default to 5 seconds, all errors disappeared. I propose we change it to 30 seconds since even 5 seconds is a little low.Modifications
connectionIdleTimeoutInMs
from 5 millis to 30000 milliscloseExpiredConnections
. This call is unnecessary because thecloseIdleConnections
also closes expired connections. Based on looking at the source code, we iterate over connections for each method call, so relying on thecloseIdleConnections
lets us iterate over connections once instead of twice. Source: https://hc.apache.org/httpcomponents-asyncclient-4.1.x/current/httpasyncclient/apidocs/org/apache/http/nio/conn/NHttpClientConnectionManager.html#closeIdleConnections(long,%20java.util.concurrent.TimeUnit)Verifying this change
This is a trivial change.
Does this pull request potentially affect one of the following parts:
This PR changes a default value for a sink. The current default is problematic.
Documentation
doc-not-needed
We generate the docs for this sink, so we don't need to update any site docs. We should include this update on the release notes.
Matching PR in forked repository
PR in forked repository: michaeljmarshall#22