Remote cluster DNS cached indefinitely #28858
Comments
Experienced the same thing today, running cross cluster search nodes in Amazon ECS pointing at Amazon ECS hosted coordinator nodes for AZ-specific clusters (cross-AZ data transfer costs had previously hit ~4 digits); behind each AZ-specific coordinator node are stateful EC2-based ES data nodes. When an AZ-specific coordinator was replaced, the CCS nodes could no longer reach that cluster and errored on queries. We ended up having to replace the CCS nodes so they would rediscover the IP listing on startup. This would be awesome to have resolved.
@dylanPowers can you confirm what version you are running right now? We are on 5.6.x, and was considering upgrading to 6.1.x sooner rather than later until I saw this reported :)
Currently running 5.6.x with the intent to upgrade to 6.x for the optional remote cluster feature. I didn't see anything in the source or change log to denote any sort of fix. I ended up creating a separate service that pings all the nodes at /_remote/info to do what I said previously, and it works pretty well.
Pinging @elastic/es-core-infra
Hey, also suffering from this issue in our Elasticsearch environment (ES 6.2.2) when using Cross Cluster Search. It seems that once it connects to the seed on a specific IP, it never looks up the DNS name again. Elasticsearch should observe the DNS TTL, so that if the Elasticsearch instances behind the seed's DNS name change, Elasticsearch moves the connection to whichever instance the new lookup returns. This is especially important in a failure scenario: when the instance it is connected to is no longer available, it should look up the DNS name again, no matter what the TTL currently is. The impact here is that if you replace the node being used as the seed, Kibana users are faced with a not-so-nice error.

Cheers,
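For reference, the JVM-level DNS cache that the linked docs describe can be bounded via the standard `networkaddress.cache.ttl` security properties. A hedged example of what that looks like in the JVM's `java.security` file (the 60/10 values are illustrative, not Elasticsearch's documented defaults — check the linked reference page for those):

```properties
# $JAVA_HOME/jre/lib/security/java.security (Java 8) or conf/security/java.security
# Cache successful DNS lookups for 60 seconds instead of indefinitely
networkaddress.cache.ttl=60
# Cache failed DNS lookups for 10 seconds
networkaddress.cache.negative.ttl=10
```

Note that under a security manager, an unset `networkaddress.cache.ttl` means successful lookups are cached forever, which is exactly the failure mode described above.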
Hi. Any updates on this topic? A manual Elasticsearch restart helps, but setting up a crontab for that is not the answer :)
This would be great to have resolved... I don't feel confident enough in my Java skills to implement it, though. Anyone willing to contribute a fix?
Fix incoming in #32764 |
@original-brownbear woohoo thx :) |
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely-cached lookup issues (provided the JVM DNS cache is configured correctly, as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)
* Changed the `InetAddress` type to `String` for that higher up the stack
* Passed down `Supplier<DiscoveryNode>` instead of an outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` so DNS is lazily resolved when the `DiscoveryNode` is actually used (could also have passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset)
* Closes #28858
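The gist of the supplier change, in language-agnostic terms: keep the seed as a string and only resolve it to an address at the moment a connection is attempted, so each reconnect picks up fresh DNS. A minimal Python sketch of that pattern (the actual fix is Java code in `RemoteClusterAware`; the host and port below are illustrative):

```python
import socket
from typing import Callable, Tuple

def seed_supplier(host: str, port: int) -> Callable[[], Tuple[str, int]]:
    """Return a zero-arg supplier that resolves `host` on every call,
    rather than resolving once at construction and caching the address."""
    def resolve() -> Tuple[str, int]:
        # getaddrinfo performs the lookup at call time (subject only to
        # resolver-level caches, not a cached-at-startup address)
        family, _, _, _, sockaddr = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM)[0]
        return sockaddr[0], sockaddr[1]
    return resolve

supplier = seed_supplier("localhost", 9300)
addr, port = supplier()  # fresh lookup now, not when the supplier was built
```

The design point is the same one the PR makes: the expensive-to-get-wrong decision (address resolution) is deferred behind a `Supplier`, so callers that hold the seed long-term never hold a stale address.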
I'm finding that cross cluster search doesn't continuously resolve hostnames and instead caches the addresses indefinitely. Correct me if I'm wrong in my analysis, but it appears that hostnames for the seeds are only resolved on node startup and on updates to the seed list. I'm seeing this at https://github.com/elastic/elasticsearch/blob/v6.3.2/server/src/main/java/org/elasticsearch/transport/RemoteClusterService.java#L318 and https://github.com/elastic/elasticsearch/blob/v6.3.2/server/src/main/java/org/elasticsearch/transport/RemoteClusterService.java#L333.
This makes cross cluster search impossible to use in any environment with changing IPs, such as a containerized environment.
The best idea I have for a workaround is to have each node routinely ping its own /_remote/info endpoint and, whenever the seeds list comes back empty because the connection was lost, dynamically remove the seed from the cluster settings and re-add it.
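A rough illustration of that workaround's decision logic (not an official tool): poll `/_remote/info`, and for any configured remote cluster that reports itself disconnected, build the pair of persistent-settings bodies that clears and then restores its seeds, forcing a fresh resolution. The `/_remote/info` endpoint and the `search.remote.<cluster>.seeds` setting name are from the 5.x/6.x docs; the helper below only shows the payload-building step on an already-fetched response:

```python
from typing import Dict, List, Tuple

def seed_reset_payloads(remote_info: Dict[str, dict],
                        configured_seeds: Dict[str, List[str]]) -> List[Tuple[dict, dict]]:
    """For each remote cluster that /_remote/info reports as disconnected,
    return (clear_body, restore_body) for two successive
    PUT /_cluster/settings calls that bounce its seed list."""
    payloads = []
    for cluster, info in remote_info.items():
        if not info.get("connected", False):
            key = f"search.remote.{cluster}.seeds"  # 5.x/6.x setting namespace
            clear = {"persistent": {key: None}}
            restore = {"persistent": {key: configured_seeds[cluster]}}
            payloads.append((clear, restore))
    return payloads
```

Sending each pair in order (clear, then restore) triggers the "update to the seed list" path noted above, which is the only other place the hostname gets resolved.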