Clarify / improve progressiveTimeout #3840

Closed

chrisbobbe opened this issue Jan 24, 2020 · 5 comments · Fixed by #3841

chrisbobbe commented Jan 24, 2020

Here's an initial proposal for a new way of handling our progressiveTimeout sleeps, based on a discussion at #3829 (comment).

The most important change is that this doesn't use global state for the sleep duration. We usually don't want the sleep durations of different, potentially unrelated network requests to interfere with each other. That matters especially for another proposal made in response to that issue: giving the caller the option to have the promise reject when it's time to give up on a network request, which would help with features like this:

Instead, when the oldest outbox message reaches a certain age (perhaps 60 seconds? 120 seconds?) and we've been unable to send, we should stop trying to send it; give the user feedback that it failed; and give an option to retry the message or delete it.

chrisbobbe commented Jan 24, 2020

type BackoffMachineOptions = {
  firstDuration?: number,
  durationCeiling?: number,
  maxCallsBeforeReject?: number | null,
  maxTimeElapsedBeforeReject?: number | null,
};

/**
 * Makes a machine that can sleep for a timeout that grows exponentially
 * in duration with the number of calls, with a base of 2, up to a ceiling.
 * The machine should be created before a loop starts, and .wait() should
 * be called in each iteration of the loop. The machine should not be re-used
 * after exiting the loop.
 *
 * E.g., if firstDuration is 100 and durationCeiling is 10 * 1000 = 10000,
 * the sequence is
 *
 * 100, 200, 400, 800, 1600, 3200, 6400, 10000, 10000, 10000, ...
 *
 * It may optionally be configured so the promise returned by .wait() rejects,
 * either after a certain number of calls or after a threshold of elapsed time
 * since the first call. (N.B., this elapsed time will include the time spent
 * on awaited tasks in subsequent iterations of the loop, such as network
 * requests.)
 */
export const makeBackoffMachine = (options: BackoffMachineOptions = {}) => {
  const {
    firstDuration = 100,
    durationCeiling = 10 * 1000,
    maxCallsBeforeReject = null,
    maxTimeElapsedBeforeReject = null,
  } = options;

  const base = 2;

  let startTime: number;
  let calls: number = 0;

  return {
    async wait(): Promise<void> {
      if (startTime === undefined) {
        startTime = Date.now();
      }

      await new Promise((resolve, reject) => {
        // Reject, and skip scheduling a timer, if the caller-configured
        // limit on the number of waits or on total elapsed time is hit.
        if (maxCallsBeforeReject !== null && calls === maxCallsBeforeReject) {
          reject(new Error('Backoff machine: too many waits'));
          return;
        }
        if (
          maxTimeElapsedBeforeReject !== null
          && Date.now() - startTime >= maxTimeElapsedBeforeReject
        ) {
          reject(new Error('Backoff machine: too much time elapsed'));
          return;
        }

        setTimeout(
          resolve,
          Math.min(
            // Duration should not exceed durationCeiling
            durationCeiling,
            firstDuration * base ** calls,
          ),
        );
      });

      calls++;
    },
  };
};

and then callers would look like

const backoffMachine = makeBackoffMachine(/* options, as desired */);
while (true) {
  try {
    // ...make network request(s); on success, break or return
  } catch (e) {
    // ...break on non-transient errors
    await backoffMachine.wait();
  }
}

chrisbobbe commented:

@gnprice Would you prefer that I put this in a PR, as it's kind of a large code block here? It's rough, though, so I'd mark it as WIP.

chrisbobbe commented:

I figured it was clearer to make the timeout duration a function of calls, rather than the dynamic approach of computing each duration from the previous one (as was done previously), but there may be other considerations I'm not thinking of. Also, I believe I've correctly renamed the constant 2 from "exponent" to "base."
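
For concreteness, a sketch of the two approaches (the variable names duration, nextDuration, and previousDuration here are just illustrative):

// Function of the call count: each duration is computed directly,
// with no dependence on the previous one.
const duration = Math.min(durationCeiling, firstDuration * base ** calls);

// Recurrence style, roughly the previous approach described above:
// each duration is derived from the one before it.
const nextDuration = Math.min(durationCeiling, previousDuration * base);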

chrisbobbe commented Jan 24, 2020

Hmm, looking at that last code block (what callers look like), it doesn't handle the proposed case where backoffMachine.wait() returns a promise that rejects. The most obvious solution is to put await backoffMachine.wait() in a try block of its own, with a catch that breaks the loop if it rejects. But it's already in a catch block, so we might have a style/readability concern there.
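
A sketch of what that might look like, purely to illustrate the nesting (the inner try/catch around .wait() is the part in question):

const backoffMachine = makeBackoffMachine(/* options, as desired */);
while (true) {
  try {
    // ...make network request(s); on success, break or return
  } catch (e) {
    // ...break on non-transient errors
    try {
      await backoffMachine.wait();
    } catch (waitError) {
      // The machine has signaled it's time to give up on this request.
      break;
    }
  }
}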

chrisbobbe commented Jan 24, 2020

(Now making changes based on feedback in person — in particular, not making it configurable; instead, the "give up" logic will be handled at .wait() call sites.)

chrisbobbe pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Jan 27, 2020
Fixes: zulip#3840.

Replace progressiveTimeout with makeBackoffMachine, which is called
above a loop, and .wait() can be called inside the loop on the
returned backoffMachine. Give access to numSleepsCompleted and
timeElapsedSoFar so "give up" logic (e.g., on too many network
requests with transient failures) can be handled near .wait() call
sites. Prepare for work on zulip#3829 (a permanent timeout on sending
outbox messages).

Previously, the state logic in progressiveTimeout was global. That
gave a potentially desirable general throttle on all network requests
that used it, but the drawbacks were greater: a particular call to
progressiveTimeout was nondeterministic, because other calls may have
been made recently, affecting the resulting delay duration.
A 60-second threshold was used as a heuristic to distinguish request
retry loops from each other, but this is much more effectively done
with the new interface.
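
For illustration, "give up" logic at a .wait() call site might look something like this sketch; the property names numSleepsCompleted and timeElapsedSoFar come from the commit message above, and the thresholds are made-up placeholders:

const backoffMachine = makeBackoffMachine();
while (true) {
  try {
    // ...make network request(s); on success, break or return
  } catch (e) {
    // ...break on non-transient errors

    // "Give up" logic lives at the call site, not inside the machine.
    if (
      backoffMachine.numSleepsCompleted > 10
      || backoffMachine.timeElapsedSoFar > 60 * 1000
    ) {
      // e.g., mark the outbox message as failed and offer retry/delete (zulip#3829)
      break;
    }
    await backoffMachine.wait();
  }
}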

chrisbobbe self-assigned this Jan 27, 2020