
Only connect to new nodes on new cluster state #31547

Conversation

DaveCTurner
Contributor

Today, when a new cluster state is committed we attempt to connect to all of
its nodes as part of the application process. This is the right thing to do
with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves #29025.
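
For illustration only, a minimal, self-contained model of the behaviour described above, in plain Java; NodeTracker, onNewClusterState, and connectAsync are invented names for this sketch and not the actual NodeConnectionsService code:

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model: on each applied cluster state every node is recorded,
// but a connection is only initiated for nodes seen for the first time.
// Known-but-disconnected nodes are left to the periodic reconnect task
// mentioned above rather than being retried during state application.
class NodeTracker {
    private final ConcurrentHashMap<String, Integer> knownNodes = new ConcurrentHashMap<>();

    void onNewClusterState(List<String> nodeIds) {
        for (String nodeId : nodeIds) {
            // putIfAbsent returns null only the first time this node is seen
            boolean alreadyKnown = knownNodes.putIfAbsent(nodeId, 0) != null;
            if (alreadyKnown == false) {
                connectAsync(nodeId);
            }
        }
    }

    private void connectAsync(String nodeId) {
        // stand-in for an asynchronous transport-level connection attempt
        System.out.println("connecting to " + nodeId);
    }
}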

@DaveCTurner DaveCTurner added the >enhancement, :Distributed/Network, :Distributed/Cluster Coordination, v7.0.0, and v6.4.0 labels on Jun 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor Author

@DaveCTurner DaveCTurner left a comment

I think this is a worthwhile improvement on the current situation, and it's a simple change, but I think it's worth revisiting in future since there's still room for improvement IMO: if the ConnectionChecker happens to be running when a partition is detected then it can block the cluster state update thread from getting hold of the nodeLocks that it needs for an extended period of time. Reducing the connection timeout (#29022) would help a bit here.
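
As a rough illustration of that contention (a plain-Java model, not the actual nodeLocks or ConnectionChecker implementation), a connect attempt held under a per-node lock blocks any other thread that needs the same lock until the attempt completes or times out:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model: the connection checker and the cluster state applier
// both take the same per-node lock, so a connect attempt that hangs against
// a blackholed node holds the lock until the connect timeout fires and keeps
// the applier waiting for that whole period.
class PerNodeLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void withNodeLock(String nodeId, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(nodeId, id -> new ReentrantLock());
        lock.lock();
        try {
            action.run(); // e.g. a blocking connect attempt with a long timeout
        } finally {
            lock.unlock();
        }
    }
}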

Contributor

@bleskes bleskes left a comment

LGTM

@@ -83,8 +83,7 @@ public void connectToNodes(DiscoveryNodes discoveryNodes) {
         for (final DiscoveryNode node : discoveryNodes) {
             final boolean connected;
             try (Releasable ignored = nodeLocks.acquire(node)) {
-                nodes.putIfAbsent(node, 0);
-                connected = transportService.nodeConnected(node);
+                connected = (nodes.putIfAbsent(node, 0) != null);
Contributor

can we avoid going on the management thread if we are truly connected? This will be relevant on the master, where the node is connected to during join validation. Also, I think the name connected for the variable is a bit misleading now. Maybe flip it and call it shouldConnect?

It would also be good to have a comment explaining the rationale for this check.

PS - as discussed, it would be good to explore using this component on non-elected-master nodes, which would make my condition tweak obsolete.
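
One possible reading of that suggestion, as a sketch reusing the names from the diff above (connectToNodeAsync is a hypothetical stand-in for whatever off-applier-thread connect step the real code performs):

for (final DiscoveryNode node : discoveryNodes) {
    final boolean shouldConnect;
    try (Releasable ignored = nodeLocks.acquire(node)) {
        // only connect when the node is new to us and the transport layer is
        // not already connected (e.g. the master connected to it during join
        // validation), so no trip to the management thread is needed otherwise
        shouldConnect = nodes.putIfAbsent(node, 0) == null
            && transportService.nodeConnected(node) == false;
    }
    if (shouldConnect) {
        connectToNodeAsync(node); // hypothetical off-thread connect step
    }
}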

@DaveCTurner
Contributor Author

@ywelsch as you predicted, testAckedIndexing failed during the validation phase after a number of runs. I pushed 9ea4166.

@DaveCTurner
Contributor Author

@ywelsch I've run the full integ test suite overnight (162 times) on 8f977ff with no failures. WDYT?

Contributor

@ywelsch ywelsch left a comment

I'm a bit on the fence about this change. Yes, it will help in the situation described, but I wonder what adverse effects it will have after a network partition (we work around these adverse effects in our tests by explicitly reconnecting the nodes). I wonder if we should force reconnectToKnownNodes whenever the node applies a cluster state from a fresh master. Yes, this will slow down publishing a little, but only for the first cluster state that's received from the new master, not subsequent ones.
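
A hypothetical sketch of that idea; reconnectToKnownNodes and the surrounding wiring do not exist in this form, and the ClusterChangedEvent accessors are assumed:

// Force one reconnect pass only for the first state received from a newly
// elected master, then proceed with the usual connect-to-new-nodes step.
void onClusterStateApplied(ClusterChangedEvent event) {
    String previousMaster = event.previousState().nodes().getMasterNodeId();
    String currentMaster = event.state().nodes().getMasterNodeId();
    boolean freshMaster = currentMaster != null && currentMaster.equals(previousMaster) == false;
    if (freshMaster) {
        nodeConnectionsService.reconnectToKnownNodes(); // hypothetical hook
    }
    nodeConnectionsService.connectToNodes(event.state().nodes()); // as in the diff above
}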

@@ -207,7 +208,7 @@ public void testAckedIndexing() throws Exception {
                         assertTrue("doc [" + id + "] indexed via node [" + ackedDocs.get(id) + "] not found",
                             client(node).prepareGet("test", "type", id).setPreference("_local").get().isExists());
                     }
-                } catch (AssertionError | NoShardAvailableActionException e) {
+                } catch (AssertionError | NoShardAvailableActionException | NodeNotConnectedException e) {
Contributor

Is this change still required?

@DaveCTurner
Contributor Author

I've decided to close this. It mostly helps, but with this approach there's still a chance of blocking a cluster state update on a connection to a blackhole because of the node locks, and I think it'd be better for future debugging/analysis if this wasn't based on luck.

@DaveCTurner DaveCTurner deleted the 2018-06-24-do-not-reconnect-to-disconnected-nodes branch March 3, 2019 11:27
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 4, 2019
Today, when applying new cluster state we attempt to connect to all of its
nodes as a blocking part of the application process. This is the right thing to
do with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness reasonably gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves elastic#29025.
Supersedes elastic#31547.
Labels
:Distributed/Cluster Coordination, :Distributed/Network, >enhancement, >non-issue