Time out sending a message, and indicate failure #3829
See also #3731.
Also "Edit message", I think.
That would probably be better than nothing, but ideally we should have a distinction (visible and otherwise) between the two. Notably, if the user sends a message and then mutes the relevant topic, they'll probably still want to be informed that the message couldn't be sent.
This may warrant a separate issue, but I wonder if progressiveTimeout, which is used here, a) might be improved with the option to make the returned promise reject after a certain number of tries (or on a duration-based condition), and b) is doing what it's currently intended to do.

a) is more related to this issue. We could handle message sending failures locally, by breaking out of the loop and not calling progressiveTimeout after a certain condition. Or, we could build into progressiveTimeout an option to reject after a threshold is reached (maybe based on the number of tries, or most recent or cumulative duration), which makes sense conceptually as something we may want to do at other sites.

Before doing this, I think there are some implications of the global nature of progressiveTimeout that would become more urgent to tease out. On the one hand, the fact that it's global is nice because, as long as progressiveTimeout is used at all appropriate call sites, it throttles the volume of all requests, not just those from a particular site, so it reduces general network activity and pressure on the server. However, different kinds of requests may have transient failures for different reasons, right, and I'm not sure we always want request B's timeout to start at, say, 40 seconds, just because request A hasn't succeeded yet. Thinking about the global nature of progressiveTimeout becomes more urgent if we decide to reject the promise, say, when the most recent timeout from any call to progressiveTimeout was 60 seconds (or at some other threshold).

For b), it appears that progressiveTimeout's current intention is, when run in a loop, to do an exponential backoff up to and including a delay that exceeds 60s, and then reset to zero and repeat. This resetting is explained at the end of the JSDoc, but there may be an argument for making that note more prominent, as I didn't assume it would reset to zero:
In any case, from reading the code and as confirmed empirically, this resetting never actually happens.
Hang on. Even if request A succeeds after several increasing progressiveTimeouts, request B's first timeout will not be zero, right — it'll be longer than A's last one!
The intention is that it backs off up to a maximum interval (10s), then stays there as long as we're in a continuous retry loop; and then in the future, after that retry loop is over and things might be all working again, starts the backoff sequence over from zero. The 60s thing isn't super principled, but I think it can basically be seen as an attempt to implement "that retry loop finished, one way or another, and this is a new retry loop where we should start by assuming the best."
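Concretely, the intended shape is a delay that doubles up to a cap and then stays there. The sketch below is illustrative only, not the actual zulip-mobile code: the 100ms first duration is an assumption, while the 10s cap matches the maximum interval mentioned above.

```typescript
// Sketch only: capped exponential backoff as described above.
// The 100ms base delay is an assumption for illustration; the 10s cap
// matches the maximum interval mentioned in the comment.
function backoffMs(triesCompleted: number): number {
  const firstDuration = 100; // assumed base delay, in ms
  const maxDuration = 10 * 1000; // 10s cap
  return Math.min(maxDuration, firstDuration * 2 ** triesCompleted);
}
```

With these constants, the sequence runs 100ms, 200ms, 400ms, and so on, pinning at 10000ms once the doubling pushes past the cap.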
Yeah, I agree. I think it'd be better for it to track its state for one retry loop at a time, rather than globally. A good way to structure that might be to stick the state in a closure, something like

```js
export const makeBackoffMachine = () => {
  let lastTimeoutTime = 0;
  let lastDuration = 0;
  return {
    wait(): Promise<void> {
      // ... the body of the existing `progressiveTimeout` ...
    }
  };
};
```

and then callers would look like

```js
const backoffMachine = makeBackoffMachine();
while (true) {
  // ...
  await backoffMachine.wait();
}
```

Hmm, yeah, and this should probably be a separate issue thread. 😉
OK, thanks for the clarification and suggestion! I'm working on an implementation of this; I'll post in a new issue soon. I think I understand the 60s much better now. I believe, with the suggested implementation, that "a new retry loop" (as described above) will be determined much more easily and accurately (i.e., when a new backoff machine has been made), so the 60s heuristic won't actually be needed, and it'll all be clearer and more deterministic without it. Opened as #3840.
Fixes: zulip#3840.

Replace progressiveTimeout with makeBackoffMachine, which is called above a loop; .wait() can then be called inside the loop on the returned backoffMachine. Give access to numSleepsCompleted and timeElapsedSoFar so "give up" logic (e.g., on too many network requests with transient failures) can be handled near .wait() call sites. Prepare for work on zulip#3829 (a permanent timeout on sending outbox messages).

Previously, the state logic in progressiveTimeout was global, which meant a potentially desirable general throttle on all network requests that used it, but the drawbacks were greater. A particular call to progressiveTimeout was nondeterministic, because other calls may have been made recently, affecting the resulting delay duration. A 60-second threshold was used as a heuristic to distinguish request retry loops from each other, but this is much more effectively done with the new interface.
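A minimal sketch of the interface this commit message describes, assuming illustrative durations (100ms base, 10s cap); the real implementation's constants and internals may differ.

```typescript
// Hedged sketch, not the actual zulip-mobile implementation: the interface
// described above, with illustrative durations (100ms base, 10s cap).
const makeBackoffMachine = () => {
  const startTime = Date.now();
  let numSleepsCompleted = 0;
  return {
    // How many .wait() sleeps have finished so far.
    get numSleepsCompleted() {
      return numSleepsCompleted;
    },
    // Time since this machine was created, in ms.
    get timeElapsedSoFar() {
      return Date.now() - startTime;
    },
    // Sleep for a capped-exponential duration; the next duration is a pure
    // function of the number of sleeps completed.
    wait(): Promise<void> {
      const duration = Math.min(10 * 1000, 100 * 2 ** numSleepsCompleted);
      return new Promise(resolve =>
        setTimeout(() => {
          numSleepsCompleted++;
          resolve();
        }, duration),
      );
    },
  };
};
```

A caller wanting "give up" logic could then check, say, `backoffMachine.numSleepsCompleted` or `backoffMachine.timeElapsedSoFar` against a threshold before each `await backoffMachine.wait()` (the threshold name and value being up to the call site).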
Improve state logic for sleep durations in API request retry loops.

progressiveTimeout used a global state, which had the benefit of enabling a general throttle affecting all request retry loops that used it. But different requests may have transient failures for different reasons, so per-request state handling makes more sense, and high request traffic is still mitigated by exponential backoff.

Also, previously, any call to progressiveTimeout was nondeterministic, because other calls may have been done recently, affecting the delay duration. A 60-second threshold was used as a heuristic to distinguish request retry loops from each other, but this is more effectively done by managing the state per-loop in the first place, and the next sleep duration becomes a pure function of the number of sleeps completed.

Preparation for zulip#3829.
Improve state logic for sleep durations in API request retry loops.

progressiveTimeout uses a global state, with the effect that one network request's retry attempts affect the sleep durations used for any other request. This has the benefit of enabling a general throttle on all request retry loops that used progressiveTimeout. But different requests may have transient failures for different reasons, so it's not obvious that we want this general throttle. And hard-to-trace bugs can creep in when the behavior of progressiveTimeout can't be determined from a particular call site.

Also, progressiveTimeout uses a 60-second threshold as a heuristic to distinguish request retry loops from each other, with the simplifying assumption that different types of requests will not occur within 60 seconds of each other. This distinction is more effectively done by managing the state per-loop in the first place, and doing so eliminates more of those hard-to-trace bugs mentioned above.

So, introduce the ability to handle state locally. Capped exponential backoff still mitigates high request traffic, but per-request.

Preparation for #3829, to be completed in this series of commits.
#3841 has been merged, clearing the way for work on this issue. Anyone interested in claiming this?
Let's say me. (I'm already working on #3881, which has a fair range of intersection, and I think if someone else claimed it we'd be stepping on each other's toes a lot.)
Currently, when we try to send a message and can't, we keep the message around (as an `Outbox` message) and keep retrying indefinitely until it succeeds. (Specifically, we do so in a loop with backoff, then at each subsequent message send and at each "initial fetch", so in particular at each launch of the app. See `sendOutbox` and its call sites.)

This isn't great, because for most messages it's eventually better to never send them than to send them late. In particular a short conversational reply like "sure" or "no, sorry" may have a completely different meaning if the discussion has moved on.
Instead, when the oldest outbox message reaches a certain age (perhaps 60 seconds? 120 seconds?) and we've been unable to send, we should stop trying to send it; give the user feedback that it failed; and give an option to retry the message or delete it. This is similar to the experience in an SMS client, or in SMS-style chat apps like FB Messenger.
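For illustration only, the age check described above might look like the following. The names `OutboxMessage`, `GIVE_UP_AGE_MS`, and `shouldGiveUp` are hypothetical, not zulip-mobile's actual API, and the threshold is one of the values suggested above.

```typescript
// Hypothetical sketch of the proposed give-up condition; these names and
// the threshold value are illustrative, not zulip-mobile's actual code.
type OutboxMessage = { timestamp: number }; // ms since epoch of the first send attempt

const GIVE_UP_AGE_MS = 120 * 1000; // 120 seconds, per one of the suggestions above

function shouldGiveUp(outbox: OutboxMessage[], nowMs: number): boolean {
  if (outbox.length === 0) {
    return false; // nothing pending, nothing to give up on
  }
  const oldest = Math.min(...outbox.map(m => m.timestamp));
  return nowMs - oldest > GIVE_UP_AGE_MS;
}
```

When this condition fires, the retry loop would stop, surface the failure to the user, and offer retry/delete, as described above.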
The feedback is important (so that users can be confident that messages are delivered whenever they don't see the feedback), and a bit tricky UI-wise: