This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat(dup): optimize time-lag by reducing repeat delay #450

Merged: 2 commits, Apr 30, 2020

@neverchanje (Contributor) commented Apr 29, 2020

What problem did this PR solve?

Previously, the time lag between master and slave was 10+ seconds, which is unacceptable for most of our users. After listening to their requirements, I found that a 1-second time lag is mostly satisfactory for them. So I ran some optimization experiments:

  1. Reduce the repeat delay time when one replica finds no committed mutations on disk:
    if (_replica->private_log()->max_commit_on_disk() < _start_decree) {
-        // wait 10 seconds for next try if no mutation was added.
-        repeat(10_s);
+        // wait 100ms for next try if no mutation was added.
+        repeat(100_ms);
        return;
    }

This optimization greatly decreases the time lag to 1s.

image
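
To illustrate why the repeat delay bounds this part of the lag, here is a minimal sketch under a simplified model that is not Pegasus code: the poller is assumed to wake at fixed ticks (0, interval, 2 * interval, ...), and the helper name `wait_until_next_poll_ms` is hypothetical. A mutation committed just after a tick waits almost a full interval before the next poll notices it.

```cpp
#include <cstdint>

// Simplified model (an assumption, not the actual duplication code):
// the poller wakes at t = 0, interval_ms, 2 * interval_ms, ...
// Returns how long a mutation committed at commit_ms waits until the
// next poll observes it.
inline int64_t wait_until_next_poll_ms(int64_t commit_ms, int64_t interval_ms)
{
    int64_t remainder = commit_ms % interval_ms;
    // A commit that lands exactly on a tick is observed immediately.
    return remainder == 0 ? 0 : interval_ms - remainder;
}
```

Under this model, a mutation committed 1 ms after a tick waits 9999 ms with a 10 s interval but only 99 ms with a 100 ms interval. Other sources of lag (log flushing, network, the remote write) are not modeled here.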

  2. Another optimization is based on the fact that load_from_private_log sleeps 10s when it is unable to read more from the plog. One naive implementation is to retry at a shorter period (e.g. 200ms), but the server load may increase due to the additional retries.
void load_from_private_log::replay_log_block()
{
  error_s err = mutation_log::replay_block();
  if (!err.is_ok()) {
    repeat(10_s); // optimize: =>200ms
    return;
  }
}

I ran a test to see whether this optimization works:

  • repeat delay: 200ms

image

  • repeat delay: 10s

image

image

As we can see, this optimization makes no difference to bytes-read or time lag (at 1K write QPS), which means no failure retries occurred at all during the above tests. I believe retries are also rare in real-world cases, so I abandoned this optimization in this PR.

@hycdong hycdong merged commit 966c04d into XiaoMi:master Apr 30, 2020
@acelyc111 (Member) commented:

Can we use a semaphore instead of sleep? It would reduce the lag time much more.
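
One way to picture the suggestion is a minimal sketch that swaps the fixed sleep for a notification. It uses a standard `std::condition_variable` rather than a semaphore, and the class `commit_notifier` and its methods are hypothetical names, not part of Pegasus or rDSN. The reader blocks until a decree at or beyond the one it needs has been committed, or until a timeout expires as a safety net.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical helper: lets a log reader wait for new commits instead of
// polling on a fixed delay.
class commit_notifier
{
public:
    // Called by the writer side after a mutation with this decree is committed.
    void on_committed(int64_t decree)
    {
        {
            std::lock_guard<std::mutex> lock(_mu);
            if (decree > _max_committed) {
                _max_committed = decree;
            }
        }
        _cv.notify_all();
    }

    // Blocks until some decree >= start_decree has been committed or the
    // timeout elapses. Returns true if such a decree is available.
    bool wait_for_decree(int64_t start_decree, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(_mu);
        return _cv.wait_for(lock, timeout, [&] { return _max_committed >= start_decree; });
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    int64_t _max_committed = 0;
};
```

With this shape, the wake-up latency is no longer tied to a polling interval. The sketch blocks the calling thread, so whether it suits the way the duplication pipeline schedules its work is an open question that it does not address.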
