Clarify / improve progressiveTimeout #3840

Closed

chrisbobbe opened this issue Jan 24, 2020 · 5 comments · Fixed by #3841

chrisbobbe commented Jan 24, 2020

Here's an initial proposal for a new way of handling our progressiveTimeout sleeps, based on a discussion at #3829 (comment).

The most important change is that this doesn't use global state for the sleep duration. We usually don't want the sleep durations of different, potentially unrelated network requests to interfere with each other. That matters especially for another proposal made in response to that issue: giving the caller the option to have the promise reject when it's time to give up on a network request, which would help with features like this:

Instead, when the oldest outbox message reaches a certain age (perhaps 60 seconds? 120 seconds?) and we've been unable to send, we should stop trying to send it; give the user feedback that it failed; and give an option to retry the message or delete it.

chrisbobbe commented Jan 24, 2020

type BackoffMachineOptions = {
  firstDuration?: number,
  durationCeiling?: number,
  maxCallsBeforeReject?: number | null,
  maxTimeElapsedBeforeReject?: number | null,
};

/**
 * Makes a machine that can sleep for a timeout that grows exponentially
 * in duration with the number of calls, with a base of 2, up to a ceiling.
 * The machine should be created before a loop starts, and .wait() should
 * be called in each iteration of the loop. The machine should not be re-used
 * after exiting the loop.
 *
 * E.g., if firstDuration is 100 and durationCeiling is 10 * 1000 = 10000,
 * the sequence is
 *
 * 100, 200, 400, 800, 1600, 3200, 6400, 10000, 10000, 10000, ...
 *
 * It may optionally be configured so the promise returned by .wait() rejects,
 * either after a certain number of calls or after a threshold of elapsed time
 * since the first call. (N.B., this elapsed time will include the time spent
 * on awaited tasks in subsequent iterations of the loop, such as network
 * requests.)
 */
export const makeBackoffMachine = (options: BackoffMachineOptions = {}) => {
  const {
    firstDuration = 100,
    durationCeiling = 10 * 1000,
    maxCallsBeforeReject = null,
    maxTimeElapsedBeforeReject = null,
  } = options;

  const base = 2;

  let startTime: number;
  let calls: number = 0;

  return {
    async wait(): Promise<void> {
      if (startTime === undefined) {
        startTime = Date.now();
      }

      await new Promise((resolve, reject) => {
        // Reject, and skip scheduling a timer, if the caller-configured
        // limit on the number of waits or on total elapsed time is hit.
        if (maxCallsBeforeReject !== null && calls === maxCallsBeforeReject) {
          reject(new Error('Backoff machine: too many waits'));
          return;
        }
        if (
          maxTimeElapsedBeforeReject !== null
          && Date.now() - startTime >= maxTimeElapsedBeforeReject
        ) {
          reject(new Error('Backoff machine: too much time elapsed'));
          return;
        }

        setTimeout(
          resolve,
          Math.min(
            // Duration should not exceed durationCeiling
            durationCeiling,
            firstDuration * base ** calls,
          ),
        );
      });

      calls++;
    },
  };
};

and then callers would look like

const backoffMachine = makeBackoffMachine(/* options, as desired */);
while (true) {
  try {
    // ...make network request(s); on success, break or return
  } catch (e) {
    // ...break on non-transient errors
    await backoffMachine.wait();
  }
}

chrisbobbe commented:

@gnprice Would you prefer that I put this in a PR, as it's kind of a large code block here? It's rough, though, so I'd mark it as WIP.

chrisbobbe commented:

I figured it was clearer to make the timeout duration a function of calls, rather than the dynamic approach of computing each duration from the previous one (as was done previously), but there may be other considerations I'm not thinking of. Also, I believe I've correctly renamed the constant 2 from "exponent" to "base."
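
For concreteness, a sketch of the two approaches (the variable names duration, nextDuration, and previousDuration here are just illustrative):

// Function of the call count: each duration is computed directly,
// with no dependence on the previous one.
const duration = Math.min(durationCeiling, firstDuration * base ** calls);

// Recurrence style, roughly the previous approach described above:
// each duration is derived from the one before it.
const nextDuration = Math.min(durationCeiling, previousDuration * base);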

chrisbobbe commented Jan 24, 2020

Hmm, looking at that last code block (what callers look like), it doesn't handle the proposed case where backoffMachine.wait() returns a promise that rejects. The most obvious solution is to put await backoffMachine.wait() in a try block of its own, with a catch that breaks the loop if it rejects. But it's already in a catch block, so we might have a style/readability concern there.
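
A sketch of what that might look like, purely to illustrate the nesting (the inner try/catch around .wait() is the part in question):

const backoffMachine = makeBackoffMachine(/* options, as desired */);
while (true) {
  try {
    // ...make network request(s); on success, break or return
  } catch (e) {
    // ...break on non-transient errors
    try {
      await backoffMachine.wait();
    } catch (waitError) {
      // The machine has signaled it's time to give up on this request.
      break;
    }
  }
}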

chrisbobbe commented Jan 24, 2020

(Now making changes based on feedback in person — in particular, not making it configurable; instead, the "give up" logic will be handled at .wait() call sites.)

chrisbobbe pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Jan 27, 2020
Fixes: zulip#3840.

Replace progressiveTimeout with makeBackoffMachine, which is called
above a loop, and .wait() can be called inside the loop on the
returned backoffMachine. Give access to numSleepsCompleted and
timeElapsedSoFar so "give up" logic (e.g., on too many network
requests with transient failures) can be handled near .wait() call
sites. Prepare for work on zulip#3829 (a permanent timeout on sending
outbox messages).

Previously, the state logic in progressiveTimeout was global. That
gave a potentially desirable general throttle on all network requests
that used it, but the drawbacks were greater: a particular call to
progressiveTimeout was nondeterministic, because other calls may have
been made recently, affecting the resulting delay duration.
A 60-second threshold was used as a heuristic to distinguish request
retry loops from each other, but this is much more effectively done
with the new interface.
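
For illustration, "give up" logic at a .wait() call site might look something like this sketch; the property names numSleepsCompleted and timeElapsedSoFar come from the commit message above, and the thresholds are made-up placeholders:

const backoffMachine = makeBackoffMachine();
while (true) {
  try {
    // ...make network request(s); on success, break or return
  } catch (e) {
    // ...break on non-transient errors

    // "Give up" logic lives at the call site, not inside the machine.
    if (
      backoffMachine.numSleepsCompleted > 10
      || backoffMachine.timeElapsedSoFar > 60 * 1000
    ) {
      // e.g., mark the outbox message as failed and offer retry/delete (zulip#3829)
      break;
    }
    await backoffMachine.wait();
  }
}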

chrisbobbe self-assigned this Jan 27, 2020