Allow to manage Elasticsearch nodes separately from benchmarking #697
Labels: :Benchmark Candidate Management (Anything affecting how Rally sets up Elasticsearch), enhancement (Improves the status quo), highlight (A substantial improvement that is worth mentioning separately in release notes), meta (A high-level issue of a larger topic which requires more fine-grained issues / PRs), :Telemetry (Telemetry Devices that gather additional metrics)
danielmitterdorfer added the enhancement, meta, :Telemetry, :Benchmark Candidate Management and highlight labels on May 23, 2019.
The following commits referenced this issue:

- Jul 3, 2019 (three commits) and Jul 17, 2019 - ebadyano added commits to ebadyano/rally: "Ensure that DiskIo telemetry does not rely on Rally being a parent process of Elasticsearch and persists the disk counters at the beginning of a benchmark and can read them again afterwards. Relates to elastic#697"
- Jul 25, 2019 - ebadyano added a commit with the same DiskIo change. Relates to #697
- Jul 31, 2019 - ebadyano added a commit to ebadyano/rally: "By using ES_JAVA_OPTS we can provision a node, run a benchmark, and then 'dynamically' (i.e. without reprovisioning) start the node again with telemetry attached. Relates to elastic#697 Relates to elastic#711"
- Aug 12, 2019 - novosibman pushed three commits to novosibman/rally with the same DiskIo change.
- Aug 20, 2019 - ebadyano added a commit that referenced this issue.
- Oct 1, 2019 - danielmitterdorfer added a commit to danielmitterdorfer/rally-eventdata-track: "With this commit we add a smoke test script that allows to run a benchmark in test mode against (almost) all challenges in this track. A few challenges have been excluded intentionally because they rely on other challenges being run first. While it would be possible to make this work with workarounds we should wait for a proper solution with elastic/rally#697"
- Oct 2, 2019 - danielmitterdorfer added a commit: "With this commit we gather cluster-level metrics in the driver instead of the mechanic. As these metrics are gathered via API calls there is no need to gather them on the very same machine where an Elasticsearch node is running. Instead, it defines clearer boundaries between these two components. Relates #697 Relates #779"
- Oct 2, 2019 - novosibman pushed the ES_JAVA_OPTS and cluster-level metrics commits to novosibman/rally.
- Oct 7, 2019 - danielmitterdorfer added two commits to elastic/rally-eventdata-track with the smoke test script. Relates #47
- Dec 4, 2019 - danielmitterdorfer added the closing commit:

  "With this commit we introduce three new subcommands to Rally:

  - `install`: install a single Elasticsearch node locally
  - `start`: start an Elasticsearch node that has been previously installed
  - `stop`: stop a running Elasticsearch node

  To run a benchmark, users first issue `install`, followed by `start` on all nodes. Afterwards, the benchmark is run using the `benchmark-only` pipeline. Finally, the `stop` command is invoked on all nodes to shut down the cluster. To ensure that system metrics are stored consistently (i.e. they contain the same metadata such as race id and race timestamp), we expose the race id as a command line parameter and defer writing any system metrics until the `stop` command is invoked. We attempt to read race metadata from the Elasticsearch metrics store for that race id, which have been written earlier by the benchmark, and merge that metadata when we write the system metrics. The current implementation is considered a new, experimental addition to the existing mechanism for managing clusters, with the intention to eventually replace it. The command line interface is specific to Zen discovery and subject to change as we learn more about its use. Relates #830 Closes #697"
Currently Rally can be used as a load generator and to manage Elasticsearch instances (building a distribution from sources, installing and configuring Elasticsearch, as well as launching and stopping it). It is possible to use Rally as a standalone load generator (`--pipeline=benchmark-only`) but it is not possible to use Rally only to manage Elasticsearch nodes. To support that, we should expose the provisioning and launch phases as separate subcommands in Rally. We should only support a local mode though, i.e. it is not possible to set up a cluster from the coordinator node alone; instead, users must invoke Rally on each of the target nodes. As a corollary, each command applies to only one node. We will add the following subcommands:

- `provision`: Provisions a single node. The artefact might be built from sources or be a distribution that has been downloaded.
- `start`: Starts an already provisioned node.
- `stop`: Stops an already started node.

Implementation note: None of these commands should rely on the actor system; instead they should run in the main Rally process.
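The subcommand surface described above could be wired up roughly as follows. This is a minimal sketch using `argparse`; the handler functions and the `--distribution`/`--preserve-install` options are assumptions for illustration, not Rally's actual implementation:

```python
import argparse


def provision(args):
    # Would install and configure a single node and print its unique id.
    return f"provisioned (distribution={args.distribution})"


def start(args):
    # Would start a previously provisioned node, identified by its id.
    return f"started {args.node_id}"


def stop(args):
    # Would stop a running node, optionally keeping the installation directory.
    return f"stopped {args.node_id} (preserve_install={args.preserve_install})"


def make_parser():
    parser = argparse.ArgumentParser(prog="esrally")
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("provision", help="Provision a single Elasticsearch node")
    p.add_argument("--distribution", default="7.4.0")
    p.set_defaults(func=provision)

    s = sub.add_parser("start", help="Start an already provisioned node")
    s.add_argument("node_id")
    s.set_defaults(func=start)

    t = sub.add_parser("stop", help="Stop an already started node")
    t.add_argument("node_id")
    t.add_argument("--preserve-install", action="store_true")
    t.set_defaults(func=stop)
    return parser


# Dispatch happens entirely in the main process -- no actor system involved.
args = make_parser().parse_args(["start", "node-0"])
print(args.func(args))  # prints: started node-0
```

Each handler runs synchronously in the invoking process, which matches the implementation note above.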
Provisioning

This covers all steps from retrieving the binary (either downloading it or building it via Gradle) until a fully configured node is installed. Note that there are essentially no new steps; we just expose this now as a dedicated subcommand. One tricky aspect is that we are currently able to create a unique node name by increasing a counter (`rally-node-{N}`), see also `esrally/mechanic/mechanic.py`, lines 178 to 186 in f575790. This will not be possible anymore with the `provision` subcommand as it does not have a global view of the cluster but rather a per-node view.

Post condition

After the provisioner has (successfully) run, the following conditions are met:

- The node is installed in `~/.rally/benchmarks/nodes/$ID/install`, where `$ID` is a globally unique id generated by the provisioner.
- `~/.rally/benchmarks/nodes/$ID/provisioner.json` contains the data that need to be exchanged between the provisioner and the launcher (see `NodeConfiguration` for the data that are exchanged between those two).
- `$ID` is written to stdout. This id can then be used later on to start and stop this node.

Note: It might make sense to specify all telemetry devices as part of the provisioning process regardless of when they get attached (some are only attached on launch) and exchange the necessary information via the provisioner metadata.
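The post conditions above could be satisfied by something like the following sketch. The directory layout follows the issue text; the JSON keys stand in for Rally's `NodeConfiguration` and are assumptions:

```python
import json
import uuid
from pathlib import Path


def provision_node(root: Path, car: str = "defaults") -> str:
    """Install a single node and persist the metadata the launcher needs.

    Returns the globally unique node id (also printed to stdout so that
    scripts can capture it for later `start`/`stop` invocations).
    """
    node_id = uuid.uuid4().hex
    node_home = root / "benchmarks" / "nodes" / node_id
    # Post condition 1: the installation directory exists.
    (node_home / "install").mkdir(parents=True)
    # Post condition 2: provisioner.json carries the provisioner-to-launcher
    # handoff (the keys here are illustrative assumptions).
    config = {
        "node_id": node_id,
        "car": car,
        "binary_path": str(node_home / "install"),
    }
    (node_home / "provisioner.json").write_text(json.dumps(config, indent=2))
    # Post condition 3: the id is echoed on stdout.
    print(node_id)
    return node_id
```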
Start

This will launch a single node. The launcher process will not be a parent process of Elasticsearch (this is a change to the previous behavior) and will instead terminate once the node has successfully started. As input it will use the provisioner id to read the provisioner metadata (`NodeConfiguration`) needed to start the node. Depending on the launcher type we may need to persist the PID of the Elasticsearch process. For plain vanilla Elasticsearch we might just run it as a daemon. Any files should go to `~/.rally/benchmarks/races/$ID` if needed.

Post condition

After the launcher has (successfully) run, the following conditions are met:

- The node identified by `$ID` is running.
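The key behavioral change, that the launcher detaches and records the PID instead of remaining a parent/supervisor, could look roughly like this. The function and file names are assumptions for illustration:

```python
import subprocess
import sys
from pathlib import Path


def start_node(node_home: Path, command: list) -> int:
    """Launch a node as a detached, daemon-style process and persist its PID.

    The launcher is *not* a long-lived parent: it records the PID so a later
    `stop` invocation (possibly from a different process) can find the node,
    then returns immediately instead of waiting on the child.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the launcher's session/process group
    )
    (node_home / "es.pid").write_text(str(proc.pid))
    return proc.pid
```

Because `start_new_session=True` puts the child in its own session, it survives the launcher exiting, which is what allows the `start` subcommand to terminate once the node is up.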
Stop

This will stop a node that has previously been started. Unless `--preserve-install=true` is specified on the command line, the node's installation and data directory will be cleaned up. Logs and telemetry data should always be kept (all of this is no change to previous behavior).

Post condition

- The node is stopped, and its installation and data directory have been removed (unless `--preserve-install=true`).
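A minimal sketch of the stop flow, reading back the PID persisted at start time. The file layout and names are assumptions, not Rally's actual code:

```python
import os
import shutil
import signal
from pathlib import Path


def stop_node(node_home: Path, preserve_install: bool = False) -> None:
    """Stop a previously started node and optionally clean up its installation."""
    pid = int((node_home / "es.pid").read_text())
    try:
        os.kill(pid, signal.SIGTERM)  # ask the node to shut down gracefully
    except ProcessLookupError:
        pass  # process is already gone
    if not preserve_install:
        # Remove the installation (and data) directory; logs and telemetry
        # would live elsewhere and are always kept.
        shutil.rmtree(node_home / "install", ignore_errors=True)
```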
).Preparatory work
As a preparation we should:
~/.rally/benchmarks/nodes/$ID
) - see Change filestore to be indexed by unique ID #720ES_JAVA_OPTS
from the launcher to the provisioner and instead persist the necessary information inconfig/jvm.options
. - see Change telemetry devices to rely on jvm.config instead of ES_JAVA_OPTS #711DiskIo
telemetry device we should ensure that it persists the disk counters at the beginning of a benchmark and can read it again afterwards. - see Update DiskIo telemetry device to persist the counters #731Tasks
start
,stop
andprovision
as subcommands - see Manage Elasticsearch nodes with dedicated subcommands #830Follow-up work
We will continue to support the actor-system-based approach for now but we intend to remove this coordination layer at some point and let users choose freely how they want to manage coordination (e.g. via Ansible) as this also allows for more complex setups. Doing this will require additional preparation though so we will tackle this in a separate issue.
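As an aside on the `DiskIo` preparatory item above (persist disk counters at benchmark start so a different process can read them back afterwards), the core mechanic could be sketched as follows. The file name, plain-dict counters (instead of e.g. psutil counter objects) and function names are all assumptions:

```python
import json
from pathlib import Path


def store_disk_counters(path: Path, counters: dict) -> None:
    """Persist the counters observed at benchmark start, so that a process
    other than a parent of Elasticsearch can read them later."""
    path.write_text(json.dumps(counters))


def disk_io_deltas(path: Path, counters_at_end: dict) -> dict:
    """Read the persisted start counters and return per-metric deltas."""
    start = json.loads(path.read_text())
    return {name: counters_at_end[name] - start[name] for name in start}
```

Persisting to disk rather than holding the start counters in memory is what removes the requirement that Rally stay alive (as a parent process) for the whole benchmark.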