Remote cluster DNS cached indefinitely #28858

dylanPowers · 2018-03-01T02:02:24Z

I'm finding that cross cluster search doesn't continuously resolve hostnames and instead caches the addresses indefinitely. Correct me if I'm wrong in my analysis, but it appears that hostnames for the seeds are only resolved on node startup and when the seed list is updated. I'm seeing this at https://github.com/elastic/elasticsearch/blob/v6.3.2/server/src/main/java/org/elasticsearch/transport/RemoteClusterService.java#L318 and https://github.com/elastic/elasticsearch/blob/v6.3.2/server/src/main/java/org/elasticsearch/transport/RemoteClusterService.java#L333.

This makes cross cluster search impossible to use in any environment with changing IPs, such as a containerized environment.

The best idea I have for a workaround is to have each node routinely ping its own /_remote/info endpoint and, whenever an empty seeds list appears because it has been disconnected, dynamically remove the seed from the cluster settings and re-add it.
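A minimal sketch of that workaround's moving parts, in Java. It only builds the request bodies for `PUT /_cluster/settings` and decides when a refresh is needed; the HTTP calls and the parsing of the `/_remote/info` response are left out. The class and method names are hypothetical, the `search.remote.<alias>.seeds` key is assumed for 5.x/6.x nodes, and using `persistent` rather than `transient` settings is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: none of these names come from Elasticsearch itself.
class RemoteSeedRefresher {
    // Assumption: seed setting key as used on 5.x/6.x nodes.
    static String seedsKey(String alias) {
        return "search.remote." + alias + ".seeds";
    }

    // Body that clears the seeds; a null value resets the setting.
    static String removeSeedsBody(String alias) {
        return "{\"persistent\":{\"" + seedsKey(alias) + "\":null}}";
    }

    // Body that re-adds the seeds, so the hostnames get resolved again.
    // Seeds are plain host:port strings, so no JSON escaping is attempted.
    static String addSeedsBody(String alias, List<String> seeds) {
        String joined = seeds.stream()
            .map(s -> "\"" + s + "\"")
            .collect(Collectors.joining(","));
        return "{\"persistent\":{\"" + seedsKey(alias) + "\":[" + joined + "]}}";
    }

    // Trigger described above: the seeds list reported by /_remote/info is empty.
    static boolean needsRefresh(List<String> reportedSeeds) {
        return reportedSeeds == null || reportedSeeds.isEmpty();
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("es-logs.example:9300"); // placeholder seed
        List<String> reported = Collections.emptyList(); // pretend the node is disconnected
        if (needsRefresh(reported)) {
            System.out.println(removeSeedsBody("logs"));
            System.out.println(addSeedsBody("logs", configured));
        }
    }
}
```

A periodic job would send the first body, then the second, whenever `needsRefresh` returns true for a remote cluster alias.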

CpuID · 2018-03-04T23:19:11Z

Experienced the same thing today, running Cross Cluster Search (CCS) nodes in Amazon ECS pointing at Amazon ECS-hosted coordinator nodes for AZ-specific clusters (due to cross-AZ data transfer costs previously hitting ~4 digits). Behind each set of AZ-specific coordinator nodes are stateful EC2-based ES data nodes.

An AZ-specific coordinator was replaced, and the CCS nodes could no longer access that cluster and returned errors on queries. We ended up having to replace the CCS nodes so they would rediscover the IP listing on startup.

This would be awesome to have resolved.

CpuID · 2018-03-04T23:24:20Z

@dylanPowers can you confirm what version you are running right now? We are on 5.6.x and were considering upgrading to 6.1.x sooner rather than later until I saw this reported :)

dylanPowers · 2018-03-05T12:46:57Z

Currently running 5.6.x with the intent to upgrade to 6.x for the optional remote cluster feature. I didn't see anything in the source or change log to denote any sort of fix. I ended up creating a separate service that pings all the nodes at /_remote/info to do what I said previously and it works pretty well.

elasticmachine · 2018-03-09T14:18:52Z

Pinging @elastic/es-core-infra

thomasriley · 2018-03-22T12:03:19Z

Hey,

Also suffering from this issue with our Elasticsearch environment (ES 6.2.2) when using Cross Cluster Search. It seems that once it connects to the seed on a specific IP, it does not look up the DNS name again.

Elasticsearch should observe the TTL of the DNS record so that if the Elasticsearch instances behind the seed's DNS name change, Elasticsearch will move the connection to whichever instance the new DNS lookup returns. This is very important in a failure scenario: when the instance it is connected to is no longer available, it would be preferable for it to look up the DNS name again, no matter what the TTL is currently at.
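For context on the TTL point: as far as I know, the JVM's built-in lookup cache does not follow the TTL of individual DNS records. It uses fixed lifetimes taken from the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. The sketch below sets both programmatically in a plain Java program; the class name is hypothetical, and an Elasticsearch node would normally be configured through its JVM options rather than code like this.

```java
import java.security.Security;

// Hypothetical demo class; only the two property names are real JVM settings.
class DnsCacheTtl {
    // Sets how long (in seconds) successful lookups are cached and
    // returns the value now stored in the security properties.
    static String setPositiveTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    // Same for failed lookups.
    static String setNegativeTtl(int seconds) {
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.negative.ttl");
    }

    public static void main(String[] args) {
        // Assumption: these calls run before the first hostname lookup,
        // because the JVM typically reads the cache policy only once.
        System.out.println("positive ttl: " + setPositiveTtl(60));
        System.out.println("negative ttl: " + setNegativeTtl(10));
    }
}
```

Note that a short cache lifetime only helps if the application actually resolves the hostname again; an address that was resolved once and stored is unaffected by these settings.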

The impact here is that if you replace the node that is being used as the seed, Kibana users are faced with a not so nice error.

Error: Request to Elasticsearch failed: {"error":{"root_cause":[{"type":"connect_transport_exception","reason":"[][ip:9300] connect_exception"}],"type":"transport_exception","reason":"unable to communicate with remote cluster [logs]","caused_by":{"type":"connect_transport_exception","reason":"[][ip:9300] connect_exception","caused_by":{"type":"annotated_no_route_to_host_exception","reason":"No route to host: hostname-to-out-cluster/ip:9300","caused_by":{"type":"no_route_to_host_exception","reason":"No route to host"}}}},"status":500}

Cheers,
Tom

chernecov · 2018-05-31T18:34:42Z

Hi. Any updates on this topic?
No pressure, but when can we expect this problem to be addressed?
Thanks.

A manual Elasticsearch restart helps, but setting up a crontab for that is not the answer :)

CpuID · 2018-06-27T23:36:30Z

This would be great to have resolved... I don't feel confident enough in my Java skills to implement it though. Anyone willing to contribute a fix?

original-brownbear · 2018-08-10T04:29:36Z

Fix incoming in #32764

CpuID · 2018-08-10T04:34:21Z

@original-brownbear woohoo thx :)

* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM DNS cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)
* Changed `InetAddress` type to `String` for that higher up the stack
* Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset).
* Closes #28858
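The difference between passing a resolved value and passing a `Supplier` can be illustrated with a small, self-contained sketch. This is not the code from the pull request: the class, the methods, and the in-memory map standing in for DNS are all hypothetical, and a plain `String` address is used in place of `DiscoveryNode`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of eager versus lazy hostname resolution.
class LazySeedResolution {
    // Stand-in for DNS: maps a hostname to an IP address string.
    static final Map<String, String> FAKE_DNS = new HashMap<>();

    // Eager: the lookup happens once, when the seed is configured.
    static String resolveEagerly(String host) {
        return FAKE_DNS.get(host);
    }

    // Lazy: the lookup is deferred and repeated on every get().
    static Supplier<String> resolveLazily(String host) {
        return () -> FAKE_DNS.get(host);
    }

    public static void main(String[] args) {
        FAKE_DNS.put("seed.example", "10.0.0.1");
        String eager = resolveEagerly("seed.example");
        Supplier<String> lazy = resolveLazily("seed.example");

        // The seed node is replaced and its DNS record changes.
        FAKE_DNS.put("seed.example", "10.0.0.2");

        System.out.println("eager: " + eager);      // eager: 10.0.0.1
        System.out.println("lazy:  " + lazy.get()); // lazy:  10.0.0.2
    }
}
```

With real DNS, the lazy variant still depends on the JVM lookup cache expiring, which is why the description above calls out the cache TTL configuration.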


javanna added the :Search/Search, discuss, and :Distributed/Network labels and removed the :Search/Search label Mar 1, 2018

jasontedor added the :Distributed/Network label and removed the discuss and :Distributed/Network labels Mar 9, 2018

colings86 added the >bug label Apr 24, 2018

jasontedor assigned original-brownbear Aug 8, 2018

original-brownbear mentioned this issue Aug 10, 2018

NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS #32764

Merged

original-brownbear closed this as completed in #32764 Aug 18, 2018

original-brownbear mentioned this issue Aug 19, 2018

NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764) #32976

Merged
