Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single-node clusters use non-trivial discovery config when they shouldn't #1644

Open
danielmitterdorfer opened this issue Jan 4, 2023 · 2 comments
Assignees
Labels
:Benchmark Candidate Management Anything affecting how Rally sets up Elasticsearch bug Something's wrong good first issue Small, contained changes that are good for newcomers

Comments

@danielmitterdorfer
Copy link
Member

Rally version (get with esrally --version): esrally 2.7.1.dev0 (git revision: 4e1335ee07e16a4af0de1a0b543a86595ca37803) (current master branch)

Description of the problem including expected versus actual behavior:

When running a benchmark with Rally against a single node cluster, the Elasticsearch logs contain repeated warning messages like the following:

[2023-01-04T09:39:10,994][WARN ][o.e.c.c.Coordinator      ] [rally-node-0] This node is a fully-formed single-node cluster with cluster UUID [o1zdVpJiSMOVfeSz9nye_A], but it is configured as if to discover other nodes and form a multi-node cluster via the [discovery.seed_hosts=[127.0.0.1]] setting. Fully-formed clusters do not attempt to discover other nodes, and nodes with different cluster UUIDs cannot belong to the same cluster. The cluster UUID persists across restarts and can only be changed by deleting the contents of the node's data path(s). Remove the discovery configuration to suppress this message.

This indicates that the discovery configuration is not correct, see elastic/elasticsearch#85222 for full details.

Steps to reproduce:

  1. esrally race --track=geonames --challenge=append-no-conflicts-index-only --distribution-version=8.5.3
  2. Inspect the corresponding Elasticsearch log

This can be fixed by not rendering the following line in rally-teams when there is only one node:

https://github.com/elastic/rally-teams/blob/74f96e3fab247e0c31d253408e4affd696a50eff/cars/v1/vanilla/templates/config/elasticsearch.yml#L72

However, as we pass the node ips as a pre-rendered string in the provisioner we cannot use Jinja's length filter to determine the number of items (would return the string length instead):

"all_node_ips": '["%s"]' % '","'.join(self.all_node_ips),

We could instead either provide the node ips as list or instead add an additional variable in the provisioner that contains the node count, which can then be used in the template in rally-teams. The latter is even backwards-compatible if we check for the existence of the template variable before evaluating it.

@danielmitterdorfer danielmitterdorfer added bug Something's wrong :Benchmark Candidate Management Anything affecting how Rally sets up Elasticsearch good first issue Small, contained changes that are good for newcomers labels Jan 4, 2023
@inqueue inqueue self-assigned this Jan 4, 2023
@inqueue
Copy link
Member

inqueue commented Jan 4, 2023

Thanks, @danielmitterdorfer. I will submit a PR proposal soon.

@pquentin
Copy link
Member

pquentin commented Jan 18, 2023

This can be fixed by not rendering the following line in rally-teams when there is only one node:

We discovered the hard way that it's not about the number of nodes since elastic/rally-teams#77 broke our nightly benchmarks with this error:

[2023-01-18T11:56:56,877][INFO ][o.e.t.TransportService ] [rally-node-0] publish_address {192.168.20.29:9300}, bound_addresses {192.168.20.29:9300}
[2023-01-18T11:56:56,985][INFO ][o.e.b.BootstrapChecks ] [rally-node-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
    bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

It took me a lot of time to realize that this was an error even though this is an INFO log. I reverted in elastic/rally-teams#78 until we find a better solution.

Indeed, the way Elasticsearch differentiates development vs. production mode is the address that Elasticsearch is bound to, not the number of nodes in the cluster: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/bootstrap-checks.html#dev-vs-prod-mode. And on nightly benchmark that address is not localhost. So I suppose we could look at the ips and see if it's a loopback address, but we would have to get that check right (including 127.0.0.1, ::1, localhost and probably others? not sure what Elasticsearch does).

Or we could live with this warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Benchmark Candidate Management Anything affecting how Rally sets up Elasticsearch bug Something's wrong good first issue Small, contained changes that are good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants