Support Query Ramp-up #1195

Closed
gingerwizard opened this issue Feb 23, 2021 · 3 comments · Fixed by #1266
Labels: enhancement (Improves the status quo) · :Load Driver (Changes that affect the core of the load driver such as scheduling, the measurement approach etc.)
@gingerwizard

Currently, after a warmup period, all query tasks are executed at the same time. Only after the first execution do they respect their configured scheduler for subsequent executions.

In cases with a high number of tasks (e.g. when executing with a parallel block), this absence of a ramp-up can produce an extremely high initial load and result in excessive utilization. This is best illustrated with an example:

[chart: query load over time, showing a pronounced spike right after the warmup period before the load stabilizes]

Even considering the warmup period (darker in the first visual), it's apparent we suffer from a spike before the load stabilizes.

We should consider both short-term improvements (e.g. tasks could call their scheduler before executing for the first time) and longer-term improvements (e.g. a ramp-up period).

@danielmitterdorfer

@gingerwizard gingerwizard added the enhancement Improves the status quo label Feb 23, 2021
@gingerwizard gingerwizard changed the title Support Query Rampup Support Query Ramp-up Feb 23, 2021
@danielmitterdorfer danielmitterdorfer added this to the 2.x milestone Feb 23, 2021
@danielmitterdorfer
Member

We should consider both short-term improvements here, e.g. maybe tasks call their scheduler before executing the first time

I had a look at this proposal and the change for this is actually pretty small:

rally/esrally/driver/driver.py

Lines 1742 to 1771 in 3875968

async def __call__(self):
    next_scheduled = 0
    if self.task_progress_control.infinite:
        param_source_knows_progress = hasattr(self.params, "percent_completed")
        self.task_progress_control.start()
        while True:
            try:
                # does not contribute at all to completion. Hence, we cannot define completion.
                percent_completed = self.params.percent_completed if param_source_knows_progress else None
                #current_params = await self.loop.run_in_executor(self.io_pool_exc, self.params.params)
                yield (next_scheduled, self.task_progress_control.sample_type, percent_completed, self.runner,
                       self.params.params())
                next_scheduled = self.sched.next(next_scheduled)
                self.task_progress_control.next()
            except StopIteration:
                return
    else:
        self.task_progress_control.start()
        while not self.task_progress_control.completed:
            try:
                #current_params = await self.loop.run_in_executor(self.io_pool_exc, self.params.params)
                yield (next_scheduled,
                       self.task_progress_control.sample_type,
                       self.task_progress_control.percent_completed,
                       self.runner,
                       self.params.params())
                next_scheduled = self.sched.next(next_scheduled)
                self.task_progress_control.next()
            except StopIteration:
                return

We basically need to call the scheduler's next method before the generator yields instead of afterwards.
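
For illustration, a minimal sketch of that reordering on the bounded (non-infinite) branch; this is just the idea, not necessarily the exact patch that was merged:

self.task_progress_control.start()
while not self.task_progress_control.completed:
    try:
        # consult the scheduler *before* yielding so that even the first request
        # carries a scheduler-chosen execution time instead of always firing at 0
        next_scheduled = self.sched.next(next_scheduled)
        yield (next_scheduled,
               self.task_progress_control.sample_type,
               self.task_progress_control.percent_completed,
               self.runner,
               self.params.params())
        self.task_progress_control.next()
    except StopIteration:
        return

With a deterministic scheduler all clients still receive the same first offset, but with a non-deterministic scheduler (e.g. Poisson) the first requests are already spread out.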

@gingerwizard
Author

gingerwizard commented Feb 24, 2021

This may have the potential to cause a deviation in nightly performance, @danielmitterdorfer, if the number of requests isn't high, especially around our 95th and 99th percentiles. If anything, it might bring greater stability to the results 🤞

@danielmitterdorfer
Member

I don't think it would cause a deviation in the nightlies for most tracks because we only test with a single client and the effect only shows up with multiple clients. But for more complex benchmarks this is more likely to be an issue.

danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this issue Feb 24, 2021
With this commit we consult the scheduler prior to issuing a request
instead of afterwards. If we don't do that, clients can coordinate and
create a large initial load spike in Elasticsearch. With this
countermeasure, it is possible for clients to avoid this initial spike
if a non-deterministic scheduler, such as the Poisson scheduler, is
chosen.

Relates elastic#1195
@danielmitterdorfer danielmitterdorfer self-assigned this Mar 8, 2021
danielmitterdorfer added a commit that referenced this issue Mar 10, 2021
With this commit we consult the scheduler prior to issuing a request
instead of afterwards. If we don't do that, clients can coordinate and
create a large initial load spike in Elasticsearch. With this
countermeasure, it is possible for clients to avoid this initial spike
if a non-deterministic scheduler, such as the Poisson scheduler, is
chosen.

Relates #1195
@danielmitterdorfer danielmitterdorfer added the :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. label May 5, 2021
@danielmitterdorfer danielmitterdorfer modified the milestones: 2.x, 2.2.1 May 5, 2021
danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this issue May 19, 2021
With this commit we allow users to ramp up load gradually by specifying
the task property `ramp-up-time-period`. If a non-zero value is
specified, Rally will gradually add clients during that time period
until the target client count as specified by `clients` is reached. This
reduces the potential for initial load spikes when running with many
concurrent clients.

Closes elastic#1195
danielmitterdorfer added a commit that referenced this issue Jun 1, 2021
With this commit we allow users to ramp up load gradually by specifying
the task property `ramp-up-time-period`. If a non-zero value is
specified, Rally will gradually add clients during that time period
until the target client count as specified by `clients` is reached. This
reduces the potential for initial load spikes when running with many
concurrent clients.

Closes #1195
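
As a rough illustration of the effect (the helper below is hypothetical and not part of Rally): a ramp-up period spreads the clients' start times across the configured window instead of starting all of them at once.

# Hypothetical illustration only, not Rally's implementation: spread client
# start times evenly across the ramp-up window.
def client_start_offsets(clients: int, ramp_up_time_period: float) -> list[float]:
    if clients <= 1 or ramp_up_time_period <= 0:
        return [0.0] * clients
    return [ramp_up_time_period * i / clients for i in range(clients)]

# e.g. 8 clients ramped up over 40 seconds
print(client_start_offsets(clients=8, ramp_up_time_period=40))
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0]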