Support Query Ramp-up #1195

Closed
gingerwizard opened this issue Feb 23, 2021 · 3 comments · Fixed by #1266
Labels: enhancement (Improves the status quo) · :Load Driver (Changes that affect the core of the load driver such as scheduling, the measurement approach etc.)
@gingerwizard

Currently, after a warmup period, all query tasks are executed at the same time. Only after the first execution do they respect their configured scheduler for subsequent executions.

In cases with a high number of tasks (e.g. when executing with a parallel block), this absence of a ramp-up can produce an extremely high initial load and result in excessive utilization. This is best illustrated with an example:

[chart: query load over time, showing a pronounced spike right after the warmup period before the load stabilizes]

Even considering the warmup period (darker in the first visual), it's apparent we suffer from a spike before the load stabilizes.

We should consider both short-term improvements (e.g. tasks could call their scheduler before executing for the first time) and longer-term improvements (e.g. a ramp-up period).

@danielmitterdorfer

@gingerwizard gingerwizard added the enhancement Improves the status quo label Feb 23, 2021
@gingerwizard gingerwizard changed the title Support Query Rampup Support Query Ramp-up Feb 23, 2021
@danielmitterdorfer danielmitterdorfer added this to the 2.x milestone Feb 23, 2021
@danielmitterdorfer
Member

We should consider both short-term improvements here, e.g. maybe tasks call their scheduler before executing the first time

I had a look at this proposal and the change for this is actually pretty small:

rally/esrally/driver/driver.py

Lines 1742 to 1771 in 3875968

async def __call__(self):
    next_scheduled = 0
    if self.task_progress_control.infinite:
        param_source_knows_progress = hasattr(self.params, "percent_completed")
        self.task_progress_control.start()
        while True:
            try:
                # does not contribute at all to completion. Hence, we cannot define completion.
                percent_completed = self.params.percent_completed if param_source_knows_progress else None
                #current_params = await self.loop.run_in_executor(self.io_pool_exc, self.params.params)
                yield (next_scheduled, self.task_progress_control.sample_type, percent_completed, self.runner,
                       self.params.params())
                next_scheduled = self.sched.next(next_scheduled)
                self.task_progress_control.next()
            except StopIteration:
                return
    else:
        self.task_progress_control.start()
        while not self.task_progress_control.completed:
            try:
                #current_params = await self.loop.run_in_executor(self.io_pool_exc, self.params.params)
                yield (next_scheduled,
                       self.task_progress_control.sample_type,
                       self.task_progress_control.percent_completed,
                       self.runner,
                       self.params.params())
                next_scheduled = self.sched.next(next_scheduled)
                self.task_progress_control.next()
            except StopIteration:
                return

We basically need to call the scheduler's next method before the generator yields instead of afterwards.
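
For illustration, a minimal sketch of that reordering on the bounded (non-infinite) branch; this is just the idea, not necessarily the exact patch that was merged:

self.task_progress_control.start()
while not self.task_progress_control.completed:
    try:
        # consult the scheduler *before* yielding so that even the first request
        # carries a scheduler-chosen execution time instead of always firing at 0
        next_scheduled = self.sched.next(next_scheduled)
        yield (next_scheduled,
               self.task_progress_control.sample_type,
               self.task_progress_control.percent_completed,
               self.runner,
               self.params.params())
        self.task_progress_control.next()
    except StopIteration:
        return

With a deterministic scheduler all clients still receive the same first offset, but with a non-deterministic scheduler (e.g. Poisson) the first requests are already spread out.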

@gingerwizard
Author

gingerwizard commented Feb 24, 2021

This may have the potential to cause a deviation in nightly performance, @danielmitterdorfer, if the number of requests isn't high, especially around our 95th and 99th percentiles. If anything, it might bring greater stability to the results 🤞

@danielmitterdorfer
Member

I don't think it would cause a deviation in the nightlies for most tracks because we only test with a single client and the effect only shows up with multiple clients. But for more complex benchmarks this is more likely to be an issue.

danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this issue Feb 24, 2021
With this commit we consult the scheduler prior to issuing a request
instead of afterwards. If we don't do that, clients can coordinate and
create a large initial load spike in Elasticsearch. With this
countermeasure, it is possible for clients to avoid this initial spike
if a non-deterministic scheduler, such as the Poisson scheduler, is
chosen.

Relates elastic#1195
@danielmitterdorfer danielmitterdorfer self-assigned this Mar 8, 2021
danielmitterdorfer added a commit that referenced this issue Mar 10, 2021
With this commit we consult the scheduler prior to issuing a request
instead of afterwards. If we don't do that, clients can coordinate and
create a large initial load spike in Elasticsearch. With this
countermeasure, it is possible for clients to avoid this initial spike
if a non-deterministic scheduler, such as the Poisson scheduler, is
chosen.

Relates #1195
@danielmitterdorfer danielmitterdorfer added the :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. label May 5, 2021
@danielmitterdorfer danielmitterdorfer modified the milestones: 2.x, 2.2.1 May 5, 2021
danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this issue May 19, 2021
With this commit we allow users to ramp up load gradually by specifying
the task property `ramp-up-time-period`. If a non-zero value is
specified, Rally will gradually add clients during that time period
until the target client count as specified by `clients` is reached. This
reduces the potential for initial load spikes when running with many
concurrent clients.

Closes elastic#1195
danielmitterdorfer added a commit that referenced this issue Jun 1, 2021
With this commit we allow users to ramp up load gradually by specifying
the task property `ramp-up-time-period`. If a non-zero value is
specified, Rally will gradually add clients during that time period
until the target client count as specified by `clients` is reached. This
reduces the potential for initial load spikes when running with many
concurrent clients.

Closes #1195
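
As a rough illustration of the effect (the helper below is hypothetical and not part of Rally): a ramp-up period spreads the clients' start times across the configured window instead of starting all of them at once.

# Hypothetical illustration only, not Rally's implementation: spread client
# start times evenly across the ramp-up window.
def client_start_offsets(clients: int, ramp_up_time_period: float) -> list[float]:
    if clients <= 1 or ramp_up_time_period <= 0:
        return [0.0] * clients
    return [ramp_up_time_period * i / clients for i in range(clients)]

# e.g. 8 clients ramped up over 40 seconds
print(client_start_offsets(clients=8, ramp_up_time_period=40))
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0]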