feat(dup): optimize time-lag by reducing repeat delay #450

neverchanje · 2020-04-29T07:02:20Z

What problem did this PR solve?

Previously, the time lag between master and slave was 10+ seconds, which is unacceptable for most of our users. After listening to their requirements, I found that a 1-second time lag is acceptable to most of them. So I ran some optimization experiments:

Reduce the repeat delay time when one replica finds no committed mutations on disk:

    if (_replica->private_log()->max_commit_on_disk() < _start_decree) {
-        // wait 10 seconds for next try if no mutation was added.
-        repeat(10_s);
+        // wait 100ms for next try if no mutation was added.
+        repeat(100_ms);
        return;
    }

This optimization reduces the time lag from 10+ seconds to 1s.
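To illustrate why the repeat delay bounds the lag, here is a minimal sketch under a simplified model (an assumption, not Pegasus code): the loader polls at fixed multiples of the delay, and a mutation waits until the next poll. The function name `polling_lag_ms` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical model, not Pegasus code: the loader polls at times
// 0, interval_ms, 2 * interval_ms, ... and a mutation is committed at
// arrival_ms. Returns how long the mutation waits before a poll sees it.
inline int64_t polling_lag_ms(int64_t arrival_ms, int64_t interval_ms)
{
    int64_t remainder = arrival_ms % interval_ms;
    // A mutation that lands exactly on a poll is seen immediately;
    // otherwise it waits for the rest of the current interval.
    return remainder == 0 ? 0 : interval_ms - remainder;
}
```

In this model a mutation committed just after a poll waits almost a full interval, so polling alone can add close to 10s with `repeat(10_s)` but under 100ms with `repeat(100_ms)`.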

Another optimization is based on the fact that load_from_private_log sleeps for 10s when it is unable to read more from plog. A naive implementation is to retry at a shorter period (e.g. 200ms), but the server load may increase due to the additional retries.

    void load_from_private_log::replay_log_block()
    {
        error_s err = mutation_log::replay_block();
        if (!err.is_ok()) {
            repeat(10_s); // candidate optimization: reduce to 200ms
            return;
        }
    }
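As a rough way to see the load concern, the sketch below computes an upper bound on retries per minute if every read attempt failed. The helper `max_retries_per_minute` is hypothetical and ignores the time each attempt itself takes.

```cpp
#include <cstdint>

// Hypothetical estimate, not Pegasus code: the most retries a single loader
// could issue per minute if every attempt fails and each retry is scheduled
// delay_ms after the previous one (attempt duration is ignored).
inline int64_t max_retries_per_minute(int64_t delay_ms)
{
    return 60 * 1000 / delay_ms;
}
```

Under this assumption, a 10s delay allows at most 6 retries per minute, while a 200ms delay allows up to 300, which is the extra load the shorter period risks.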

I ran a test to see whether this optimization helps:

repeat delay: 200ms

repeat delay: 10s

As we can see, this optimization makes no difference to bytes-read or time-lag (at 1K write QPS), which means no failed read triggered a retry during the above tests. I believe retries are also rare in real-world cases, so I dropped this optimization from this PR.

acelyc111 · 2020-04-30T15:19:07Z

Can we use a semaphore instead of sleep? It would reduce the lag time much more.
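A minimal sketch of what a notification-based wait could look like, using a standard condition variable with a timeout rather than a semaphore. The class `commit_notifier` and its methods are hypothetical and are not part of the Pegasus code base; wiring it into the private log write path is assumed, not shown.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical helper, not part of Pegasus: lets a log reader block until a
// writer announces a newly committed decree, with a timeout as a fallback.
class commit_notifier
{
public:
    // Called by the writer after a mutation has been committed.
    void notify_committed(int64_t decree)
    {
        {
            std::lock_guard<std::mutex> guard(_mu);
            if (decree > _max_committed) {
                _max_committed = decree;
            }
        }
        _cv.notify_all();
    }

    // Blocks until a decree >= start_decree has been committed or the
    // timeout expires. Returns true if the decree is available.
    bool wait_for_decree(int64_t start_decree, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(_mu);
        return _cv.wait_for(lock, timeout, [&] { return _max_committed >= start_decree; });
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    int64_t _max_committed = 0;
};
```

With this shape the reader wakes as soon as the writer signals, so the timeout only matters as a safety net rather than setting the lag.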

neverchanje added the component/duplication label Apr 29, 2020

feat(dup): optimize time-lag by reducing repeat delay

520cd28

neverchanje force-pushed the dup-repeat branch from 95153d0 to 520cd28 Compare April 29, 2020 15:18

acelyc111 approved these changes Apr 30, 2020


Merge branch 'master' into dup-repeat

0238cbe

hycdong approved these changes Apr 30, 2020


hycdong merged commit 966c04d into XiaoMi:master Apr 30, 2020

neverchanje mentioned this pull request May 14, 2020

Release 2.0.0 apache/incubator-pegasus#536

Closed

neverchanje deleted the dup-repeat branch May 14, 2020 08:22

neverchanje added the 2.0.0 label Jun 5, 2020

neverchanje mentioned this pull request Jun 10, 2020

Release 1.12.4 apache/incubator-pegasus#547

Closed
