
Only connect to new nodes on new cluster state #31547

Conversation

DaveCTurner
Contributor

Today, when a new cluster state is committed we attempt to connect to all of
its nodes as part of the application process. This is the right thing to do
with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves #29025.
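
For illustration only, a minimal, self-contained model of the behaviour described above, in plain Java; NodeTracker, onNewClusterState, and connectAsync are invented names for this sketch and not the actual NodeConnectionsService code:

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model: on each applied cluster state every node is recorded,
// but a connection is only initiated for nodes seen for the first time.
// Known-but-disconnected nodes are left to the periodic reconnect task
// mentioned above rather than being retried during state application.
class NodeTracker {
    private final ConcurrentHashMap<String, Integer> knownNodes = new ConcurrentHashMap<>();

    void onNewClusterState(List<String> nodeIds) {
        for (String nodeId : nodeIds) {
            // putIfAbsent returns null only the first time this node is seen
            boolean alreadyKnown = knownNodes.putIfAbsent(nodeId, 0) != null;
            if (alreadyKnown == false) {
                connectAsync(nodeId);
            }
        }
    }

    private void connectAsync(String nodeId) {
        // stand-in for an asynchronous transport-level connection attempt
        System.out.println("connecting to " + nodeId);
    }
}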

@DaveCTurner DaveCTurner added the >enhancement, :Distributed/Network, :Distributed/Cluster Coordination, v7.0.0, and v6.4.0 labels on Jun 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor Author

@DaveCTurner DaveCTurner left a comment

I think this is a worthwhile improvement on the current situation, and it's a simple change, but I think it's worth revisiting in future since there's still room for improvement IMO: if the ConnectionChecker happens to be running when a partition is detected then it can block the cluster state update thread from getting hold of the nodeLocks that it needs for an extended period of time. Reducing the connection timeout (#29022) would help a bit here.
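
As a rough illustration of that contention (a plain-Java model, not the actual nodeLocks or ConnectionChecker implementation), a connect attempt held under a per-node lock blocks any other thread that needs the same lock until the attempt completes or times out:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative model: the connection checker and the cluster state applier
// both take the same per-node lock, so a connect attempt that hangs against
// a blackholed node holds the lock until the connect timeout fires and keeps
// the applier waiting for that whole period.
class PerNodeLocks {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void withNodeLock(String nodeId, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(nodeId, id -> new ReentrantLock());
        lock.lock();
        try {
            action.run(); // e.g. a blocking connect attempt with a long timeout
        } finally {
            lock.unlock();
        }
    }
}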

Contributor

@bleskes bleskes left a comment

LGTM

@@ -83,8 +83,7 @@ public void connectToNodes(DiscoveryNodes discoveryNodes) {
         for (final DiscoveryNode node : discoveryNodes) {
             final boolean connected;
             try (Releasable ignored = nodeLocks.acquire(node)) {
-                nodes.putIfAbsent(node, 0);
-                connected = transportService.nodeConnected(node);
+                connected = (nodes.putIfAbsent(node, 0) != null);
Contributor

can we avoid going on the management thread if we are truly connected? This will be relevant on the master, where the node is connected to during join validation. Also, I think the name connected for the variable is a bit misleading now. Maybe flip it and call it shouldConnect?

It would also be good to have a comment explaining the rationale for this check.

PS - as discussed, it would be good to explore using this component on non-elected-master nodes, which would make my condition tweak obsolete.
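
One possible reading of that suggestion, as a sketch reusing the names from the diff above (connectToNodeAsync is a hypothetical stand-in for whatever off-applier-thread connect step the real code performs):

for (final DiscoveryNode node : discoveryNodes) {
    final boolean shouldConnect;
    try (Releasable ignored = nodeLocks.acquire(node)) {
        // only connect when the node is new to us and the transport layer is
        // not already connected (e.g. the master connected to it during join
        // validation), so no trip to the management thread is needed otherwise
        shouldConnect = nodes.putIfAbsent(node, 0) == null
            && transportService.nodeConnected(node) == false;
    }
    if (shouldConnect) {
        connectToNodeAsync(node); // hypothetical off-thread connect step
    }
}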

@DaveCTurner
Contributor Author

@ywelsch as you predicted, testAckedIndexing failed during the validation phase after a number of runs. I pushed 9ea4166.

@DaveCTurner
Contributor Author

@ywelsch I've run the full integ test suite overnight (162 times) on 8f977ff with no failures. WDYT?

Contributor

@ywelsch ywelsch left a comment

I'm a bit on the fence about this change. Yes, it will help in the situation described, but I wonder what adverse effects it will have after a network partition (we work around these adverse effects in our tests by explicitly reconnecting the nodes). I wonder if we should force reconnectToKnownNodes whenever the node applies a cluster state from a fresh master. Yes, this will slow down publishing a little, but only for the first cluster state that's received from the new master, not subsequent ones.
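
A hypothetical sketch of that idea; reconnectToKnownNodes and the surrounding wiring do not exist in this form, and the ClusterChangedEvent accessors are assumed:

// Force one reconnect pass only for the first state received from a newly
// elected master, then proceed with the usual connect-to-new-nodes step.
void onClusterStateApplied(ClusterChangedEvent event) {
    String previousMaster = event.previousState().nodes().getMasterNodeId();
    String currentMaster = event.state().nodes().getMasterNodeId();
    boolean freshMaster = currentMaster != null && currentMaster.equals(previousMaster) == false;
    if (freshMaster) {
        nodeConnectionsService.reconnectToKnownNodes(); // hypothetical hook
    }
    nodeConnectionsService.connectToNodes(event.state().nodes()); // as in the diff above
}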

@@ -207,7 +208,7 @@ public void testAckedIndexing() throws Exception {
                         assertTrue("doc [" + id + "] indexed via node [" + ackedDocs.get(id) + "] not found",
                             client(node).prepareGet("test", "type", id).setPreference("_local").get().isExists());
                     }
-                } catch (AssertionError | NoShardAvailableActionException e) {
+                } catch (AssertionError | NoShardAvailableActionException | NodeNotConnectedException e) {
Contributor

Is this change still required?

@DaveCTurner
Contributor Author

I've decided to close this. It mostly helps, but with this approach there's still a chance of blocking a cluster state update on a connection to a blackhole because of the node locks, and I think it'd be better for future debugging/analysis if this wasn't based on luck.

@DaveCTurner DaveCTurner deleted the 2018-06-24-do-not-reconnect-to-disconnected-nodes branch March 3, 2019 11:27
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 4, 2019
Today, when applying new cluster state we attempt to connect to all of its
nodes as a blocking part of the application process. This is the right thing to
do with new nodes, and is a no-op on any already-connected nodes, but is
questionable on known nodes from which we are currently disconnected: there is
a risk that we are partitioned from these nodes so that any attempt to connect
to them will hang until it times out. This can dramatically slow down the
application of new cluster states which hinders the recovery of the cluster
during certain kinds of partition.

If nodes are disconnected from the master then it is likely that they are to be
removed as part of a subsequent cluster state update, so there's no need to try
and reconnect to them like this. Moreover there is no need to attempt to
reconnect to disconnected nodes as part of the cluster state application
process, because we periodically try and reconnect to any disconnected nodes,
and handle their disconnectedness reasonably gracefully in the meantime.

This commit alters this behaviour to avoid reconnecting to known nodes during
cluster state application.

Resolves elastic#29025.
Supersedes elastic#31547.
Labels
:Distributed/Cluster Coordination, :Distributed/Network, >enhancement, >non-issue