-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More clarity around waiting for an active worker to finish #916
Comments
We have emulated chrome behavior here and don't wait currently. :-) Is this a real problem in practice? It sounds like a lot of add complexity for what is probably fairly rare problem. |
It looks like if someone is holding on to the ServiceWorker object in js we will mark it redundant, but not immediately kill it. |
By "soon-to-be activating" you mean while |
No, this is referring to service workers in the |
I think we're seeing this issue on Inbox and it's causing broken page loads. The page starts loading, there's sometimes one second of network quiescence that triggers our update on navigation, then skipWaiting() results in termination of the worker with unfinished fetch events. I see Firefox uses DOMContentLoaded rather than the network quiescence heuristic to trigger update, maybe that would be safer. |
It seems like waiting for events to clear is essentially creating a new state. Can we instead just restructure things such that:
This lets us use the existing wait mechanism without inventing a new one. Only real downside is we don't fire new events on soon-to-be-active worker, but I don't think that should really break anything. It's just like getting update slightly later. Thoughts? |
I think I like that. You're right, the pseudo-state was the annoying part to implement around. The one worry implementation-wise is to ensure closing the last document doesn't end up with stalled fetch events: they should either terminate or finish normally, so that activation can proceed in a timely manner. Also, a constant stream of events could delay activation indefinitely, even with skipWaiting (e.g., lots of XHR or postMessage), but we could have a fixed timeout to escape that. @jakearchibald @jungkees WDYT? |
Some ideas: Implementations could dynamically lower their "kill a long waitUntil" timer if skipWaiting has been called. This would help deal with stalled fetches. We could also cancel the fetch group associated with the service worker going redundant. I think this is weakly spec'd right now, but it's basically the same mechanism that cancels network when a pages are closed. Aborting the SW's fetch group should ensure all outstanding fetches resolve quickly. Of course, we may want both since other code can hold a waitUntil open. |
Yep, I think the implementations can handle avoiding stalled fetches. What do spec people think about wanderview's idea:
This sounds reasonable to me and I'm itching to fix chromium bug 616331. |
Yeah, I like @wanderview's idea. Wait until all waitUntils & respondWiths in the incumbent worker are done. We can drop our grace period in this case too, to reduce the chance of overlapping events keeping things open. If we see problems with this we can look at queueing events made after skipWaiting so they run in the new worker when it activates. |
OK here's roughly what I plan to start implementing to fix the Chrome bug:
|
Our gecko bug for this is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1170543 Incidentally, I think we should count "in-flight event" from the moment the task is queued to perform the dispatch. So we should maintain a counter of events on the registration or something. The counter is then decremented when the waitUntil/respondWith promises settle or immediately after the event if neither has been called. |
Yes I think that's what Chromium will be doing. |
If the browser's need for a request is dropped (page closes or user hits X on a navigation), then the |
Yea, I agree we should handle FetchEvent just like other events. Even when the last controlled page closes we will still have to handle push/bg-sync events that are queued up. Seems safest to just flush fetch events naturally. |
What happen with side effects happening inside the |
Yes, seems safest for wait for fetch events to settle normally. Just have to ensure there is no implementation bug preventing it from being settled after the page is closed. |
Are you worried about the window being closed while the FetchEvent is still in the event queue or after the SW has started processing it? Or maybe being closed between start of FetchEvent processing and queueing the event? I was trying to think about how we would exercise the case in a WPT test. |
Both actually. Chrome has a bug where if a request gets killed the corresponding fetch event remains "in-flight" until timeout (I described it more in https://bugs.chromium.org/p/chromium/issues/detail?id=616331). But this is an implementation bug, not sure if a WPT would be suitable. |
We could make a WPT test for the "SW has started processing the FetchEvent while the owning window was closed" case, but probably not for the others. |
Can we add milestone v1 to this? Chrome already implemented it. |
@mattto When do you expect this change to ship in chrome? I'd like to try to get the fix into FF around the same time frame. Thanks! |
It should be in Chrome 53, expected to reach stable channel in early September. |
This patch makes changes as per discussion at w3c/ServiceWorker#916 Namely, * The waiting worker gets promoted to active once there are no controllees and the incumbent worker has no in-flight events (fetch/sync/push/message etc). * skipWaiting bypasses the no controllees requirement, but we still wait until there are no in-flight events. * As usual, incoming fetch/sync/push events always go to the active worker. BUG=616331 Review-Url: https://codereview.chromium.org/2119143002 Cr-Commit-Position: refs/heads/master@{#405628} (cherry picked from commit 6c95b0f) Review URL: https://codereview.chromium.org/2164113003 . Cr-Commit-Position: refs/branch-heads/2785@{#258} Cr-Branched-From: 6862397-refs/heads/master@{#403382}
Pre F2F notes: This seems agreed. Let's make sure that cancelling a request means the related fetch event's |
F2F:
|
FYI, this has been fixed in gecko and should ship with FF 49 around September 13. Also, the WPT test blink wrote is here and will be uplifted to the WPT repo on our next sync: |
I think there are a couple other places we need to check for "idle" state or no extension promises:
https://slightlyoff.github.io/ServiceWorker/spec/service_worker/#dfn-use
We then also need to add a hook for when the active worker goes idle to clear the registration if the uninstalling flag is set and there are no controlled clients. I discovered this because I have a bug from a user showing that we unexpectedly abort a fetch event when they race an |
Another nuance to clarify. If there is a waiting worker and the uninstalling flag is set when the active worker goes idle, what happens first?
I think giving the unregister priority here makes sense since the waiting worker is by definition not actively handling events yet. |
Also, should we delay activation if the previous .active worker is still in the activating state? For example, wait for its activate event to finish processing. It seems like the safest thing to do would be to wait. |
This adds concepts and algorithms to track events (all service worker events: lifecycle events, functional events, extendable message events) that have pending extension promises so they can be used to check if the registration's waiting worker can be promoted. This also changes Install algorithm and related steps to match to the implementations behavior. That is, instead of waiting for the promotion condition in Install algorithm, newly introduced Try Activate algorithm just tries to activate depending on the condition and return. Try Activate is called when: - A service worker is installed. - The last client controlled by the existing active worker is unloaded. - skipWaiting() is called. - The extend lifetime promises for the existing active worker settle. Related issue: #916.
This adds concepts and algorithms to track events (all service worker events: lifecycle events, functional events, extendable message events) that have pending extension promises so they can be used to check if the registration's waiting worker can be promoted. This also changes Install algorithm and related steps to match to the implementations behavior. That is, instead of waiting for the promotion condition in Install algorithm, newly introduced Try Activate algorithm just tries to activate depending on the condition and return. Try Activate is called when: - A service worker is installed. - The last client controlled by the existing active worker is unloaded. - skipWaiting() is called. - The extend lifetime promises for the existing active worker settle. Related issue: #916.
Status update: f1a1120 addressed the resolution in #916 (comment) by introducing Try Activate. There are still two remaining issues from @wanderview's follow-up comments. I plan to do:
|
This is a follow-up change, to [1], which addresses the issues posed in [2] and the comments afterwards. This change makes the callers to the Clear Registration algorithm to check if no client is using the registration and the registration's associated service workers have no pending extended events before invoking the algorithm. Unregister, Handle Service Worker Client Unload, and the microtasks queued by the extension promises' settlement handlers invoke Try Clear Registration. Other error handling steps on registrations that have no associated worker and Handle User Agent Shutdown invoke Clear Registration directly as they do. [1] f1a1120 [2] #916 (comment) Summary of the issues: #916 (comment). Fixes #916.
This is a follow-up change, to [1], which addresses the issues posed in [2] and the comments afterwards. This change makes the callers to the Clear Registration algorithm to check if no client is using the registration and the registration's associated service workers have no pending extended events before invoking the algorithm. Unregister, Handle Service Worker Client Unload, and the microtasks queued by the extension promises' settlement handlers invoke Try Clear Registration. Other error handling steps on registrations that have no associated worker and Handle User Agent Shutdown invoke Clear Registration directly as they do. [1] f1a1120 [2] #916 (comment) Summary of the issues: #916 (comment). Fixes #916.
I'm working on a Chrome bug where we're immediately terminating the incumbent worker during activation instead of waiting for it to finish its "in-progress requests" as the spec says. Since update occurs soon after page load, with skipWaiting() we run into this issue in the wild.
I'm wondering though, what exactly is a "request". Is this referring to fetch event only or is it also intended to mean functional events?
Elsewhere the spec also mentions "in-flight requests" which are "triggered by" a worker, which seems a different thing that "in-progress requests" being handled by a worker.
The spec also doesn't really mention what to do with new incoming requests: do they go to the incumbent worker or the soon-to-be activating worker?
My current plan is to:
How does Firefox deal with this issue?
The text was updated successfully, but these errors were encountered: