
[Feature request] Allow keeping service worker alive #1558

Closed
AshleyScirra opened this issue Dec 17, 2020 · 10 comments

@AshleyScirra

We have a use case for Service Workers that sounds simple but turns out to be surprisingly complicated: in essence we have a map of URL to Blob that we want the Service Worker to serve in fetch events. (As it happens I first mentioned this in #1556.) This is to allow various features of local preview in our browser-based game development software Construct 3.

My first (and it turns out, naïve) approach was to have a client post the map of URL to Blob to the Service Worker. The SW stores this in a global variable, and then does lookups in the fetch event and serves the corresponding blob if there's a match (else falls back to network etc). But it turns out the browser is allowed to terminate the SW after a period of inactivity - in which case the map in the global variable is wiped, and fetches to those URLs subsequently fail (going to network and returning 404). Oops! It turns out in Chrome this is 30 seconds, and we only realised after we shipped all this and a user eventually figured out that their network requests started failing after 30 seconds, and filed a bug with us (Scirra/Construct-bugs#4422).
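For reference, the naive version looks roughly like this (a sketch only; the global name and message shape are illustrative, not our actual code):

```js
// sw.js — the map lives in a global, so it is lost whenever the browser
// terminates the idle service worker.
const urlToBlob = new Map();

self.addEventListener("message", (e) => {
  if (e.data.type === "set-map") {
    // e.data.map is a Map of URL string -> Blob posted by the client
    for (const [url, blob] of e.data.map)
      urlToBlob.set(url, blob);
  }
});

self.addEventListener("fetch", (e) => {
  const blob = urlToBlob.get(e.request.url);
  if (blob)
    e.respondWith(new Response(blob));
  // otherwise fall through to the network
});
```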

The problem is, how else can we implement this?

Idea 1: save the map to storage. We could write the map to IndexedDB. This won't work though, for several reasons:

  • We could be serving a large amount of local content, corresponding to an entire game. The user may not have enough storage quota to save the whole map.
  • In some browsers with private browsing, or certain privacy settings, storage APIs throw on any attempt to use them and so cannot be used at all in these modes.
  • An error could occur while writing to storage. We know through our work with the editor that storage isn't reliable at scale - we see about 50 storage failures a day in our telemetry, with a mix of AbortError, DataError, UnknownError, QuotaExceededError, TimeoutError...
  • Writing all this data to storage could be slow, degrading the user experience.
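To make those failure modes concrete, here is roughly what the write would look like (a sketch only; database and store names are illustrative):

```js
// Persist the URL -> Blob map to IndexedDB. Every step can fail in the ways
// listed above (quota, private-browsing restrictions, backend errors).
function saveMapToIDB(urlToBlob) {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open("preview-files", 1);
    open.onupgradeneeded = () => open.result.createObjectStore("files");
    open.onerror = () => reject(open.error);   // e.g. blocked by privacy settings
    open.onsuccess = () => {
      const tx = open.result.transaction("files", "readwrite");
      const store = tx.objectStore("files");
      for (const [url, blob] of urlToBlob)
        store.put(blob, url);                  // may hit QuotaExceededError, UnknownError...
      tx.oncomplete = resolve;
      tx.onerror = () => reject(tx.error);
      tx.onabort = () => reject(tx.error);
    };
  });
}
```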

Basically we can't rely on storage, so let's eliminate that. The map will have to stay in memory.

Idea 2: store the map on the client. The client can keep its own map of URL to blob. Then in the SW fetch event, it can post to the client asking it if it has something to serve for that URL, and if it does the client can post back the blob to serve. (This seems weirdly circuitous, but it keeps the map on a client which will be kept alive.)

The problem here is, how do you know the client has loaded enough to answer a message? If it hasn't loaded enough to attach an event listener, the SW will simply not get a response. I guess it could use a timeout, but this is a performance critical network path. What if it needs to make several requests to load the client before it can start listening for messages? All those network requests will be delayed by the timeout. And to improve the guarantee that the client can respond in time, even if it's busy, you really want as long a timeout as possible. There doesn't seem to be a good trade-off here.
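For completeness, Idea 2 would look something like this (a sketch; the message shape, timeout and helper name are illustrative), which makes the timeout trade-off visible:

```js
// sw.js — ask the requesting client whether it has a Blob for this URL.
// A MessageChannel gives a per-request reply port; if the client hasn't
// attached a listener yet, only the (arbitrary) timeout saves us.
function askClientForBlob(clientId, url) {
  return self.clients.get(clientId).then((client) => {
    if (!client)
      return null;
    return new Promise((resolve) => {
      const { port1, port2 } = new MessageChannel();
      port1.onmessage = (e) => resolve(e.data); // Blob or null
      client.postMessage({ type: "lookup", url }, [port2]);
      setTimeout(() => resolve(null), 3000);    // the problematic timeout
    });
  });
}

self.addEventListener("fetch", (e) => {
  e.respondWith(
    askClientForBlob(e.clientId, e.request.url).then((blob) =>
      blob ? new Response(blob) : fetch(e.request)
    )
  );
});
```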

Idea 3: keep the service worker alive. As long as the service worker stays alive, its map will be kept in memory. It turns out we can make the client keep the SW alive with a little postMessage dance: the client posts "keepalive" to the SW; the SW calls waitUntil() on the message event with a few seconds of timeout; and then posts back to the client, which then immediately sends back a "keepalive" message... This ensures the SW is basically permanently waiting on a waitUntil() promise. As soon as the client is closed, it stops sending "keepalive" messages, and so the SW will fall back to idle and so can be terminated (i.e. this doesn't keep the SW alive forever, only the duration of the client).
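A sketch of that dance (timeouts and message names are illustrative):

```js
// client.js — kick off the loop, and ping again whenever the SW answers
navigator.serviceWorker.addEventListener("message", (e) => {
  if (e.data === "keepalive-ack")
    navigator.serviceWorker.controller?.postMessage("keepalive");
});
navigator.serviceWorker.controller?.postMessage("keepalive");

// sw.js — hold the message event open for a few seconds, then ping back,
// so there is always an outstanding waitUntil() while the client lives
self.addEventListener("message", (e) => {
  if (e.data === "keepalive") {
    const wait = new Promise((resolve) => setTimeout(resolve, 5000));
    e.waitUntil(wait);
    wait.then(() => e.source.postMessage("keepalive-ack"));
  }
});
```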

My concern with this is firstly that it feels like a hack, and secondly that the ability to keep a SW alive appears to have been considered a bug in some cases, with steps taken to mitigate it.

So my suggestion is: why not add a way in the spec to indicate that the SW ought to be kept alive? It can be limited to a single client, or all active clients, to avoid keeping it alive permanently after its clients are closed. Then we have a clean way to get this behavior without having to resort to what feel like postMessage hacks, and we can rest assured browser makers won't regard this as some kind of bug and add mitigations, which will subsequently break this case again. It also already seems to be possible to keep a SW alive, so long as it has work to do (e.g. regular fetches that go through the SW fetch event), so in that sense this wouldn't actually be adding a new capability, only making it more convenient to do so.

There are other cases where it looks like keeping the SW alive would be useful, e.g. for streaming: #882, #1182

Or maybe there is some other approach that could cover this? Something like import maps but for all fetches would solve our use case too, but those currently only cover JS Modules (and are still in development). But it looks like there are several use cases for keeping a SW alive.

@wanderview
Member

I think storing the map to disk is the right solution. The Cache API may be better than IDB, depending on how you are using the blobs.
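For example, something along these lines (a rough sketch; the cache name is illustrative):

```js
// Write the URL -> Blob map into the Cache API once, then serve from it in
// the fetch handler; nothing has to survive in SW memory across restarts.
async function storeMap(urlToBlob) {
  const cache = await caches.open("preview-files");
  await Promise.all(
    [...urlToBlob].map(([url, blob]) => cache.put(url, new Response(blob)))
  );
}

self.addEventListener("fetch", (e) => {
  e.respondWith(
    caches.match(e.request).then((cached) => cached || fetch(e.request))
  );
});
```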

Responding to a couple of concerns about storage:

We could be serving a large amount of local content, corresponding to an entire game. The user may not have enough storage quota to save the whole map.

Do you expect devices to have more RAM than storage often? Or is it that some browsers set very restrictive quotas?

In addition, blobs in many browsers end up getting stored on disk anyway in temporary files. Moving these to a storage API would not result in a net increase in disk usage. It may shift around in quota accounting, though.

In some browsers with private browsing, or certain privacy settings, storage APIs throw on any attempt to use them and so cannot be used at all in these modes.

I don't think browsers that throw on storage in private browsing allow service worker usage in private browsing either. This is because creating a service worker registration is basically a storage operation itself. Do you have an example of this?

Writing all this data to storage could be slow, degrading the user experience.

Conversely, trying to keep a large set of data in RAM may also degrade the experience by creating memory pressure, etc. It kind of depends. You could balance between the two by having an in-memory cache of responses in the SW if you want to. It would help until the SW is killed and would naturally repopulate when it's woken up and requests start flowing again.
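Something like this, for instance (a sketch under the assumption that the Cache API holds the canonical copies; headers are elided for brevity):

```js
// Disk (Cache API) is the source of truth; the Map is just a memo that
// speeds up repeat requests and repopulates after the SW is restarted.
const memo = new Map(); // URL -> Blob

self.addEventListener("fetch", (e) => {
  e.respondWith((async () => {
    const url = e.request.url;
    let blob = memo.get(url);
    if (!blob) {
      const cached = await caches.match(e.request);
      if (!cached)
        return fetch(e.request);
      blob = await cached.blob();
      memo.set(url, blob);
    }
    return new Response(blob); // content-type/headers elided for brevity
  })());
});
```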

Some other thoughts:

  • If your client makes frequent network requests then the SW will be kept alive longer naturally.
  • You can also simulate keeping the SW alive from your client by sending it periodic postMessage() ping messages; e.g. once every couple minutes, etc.

@AshleyScirra
Author

AshleyScirra commented Dec 17, 2020

I don't think browsers that throw on storage in private browsing allow service worker usage in private browsing either.

Ah, I guess you're right about this. I was mainly thinking of Firefox private browsing, which indeed doesn't allow service workers at all. (Which itself rules out this use case... transient storage like in Chrome is much easier to develop with.)

Do you expect devices to have more RAM than storage often?

The main reason is restrictive quotas. It's hard to come by authoritative data on the limits, but this MDN documentation says Firefox only guarantees a minimum of 10 MB, which is far too small. Maybe the user has filled up all their local storage with photos and videos. The browser could also set the storage quota far below the RAM size, e.g. private browsing mode could arbitrarily set the quota to something like 100 MB, perhaps to avoid abuse, mitigate fingerprinting, etc. These details also tend to change over time. I'd add that cleaning up storage so you don't end up leaking quota is trickier too, especially if there is a chance of ungraceful exits (e.g. tab crashes, OOM), whereas RAM is always freed in these cases.

IIRC Blobs don't have to be written to disk - the browser can opt to keep them in memory (and presumably will if there's no storage space available?)

As noted we also can't guarantee that writing to disk will actually succeed even if there is quota available, since storage errors are surprisingly common. Normally in a SW this isn't too big a deal since in the case of storage failure or cache miss you can fall back to network. However in our case there is no fallback, so in the event of a storage error, subsequent fetches fail.

Conversely trying to keep a large set of data in RAM may also degrade the experience by creating memory pressure, etc.

I don't think keeping data in RAM is as problematic as writing it to storage. Storing 200mb data in memory should be no problem for the vast majority of devices. Writing 200mb to storage could take several seconds on some devices. I think it's more likely that writing to storage will pose a significant negative impact to performance.

@wanderview
Member

wanderview commented Dec 17, 2020

The main reason is restrictive quotas. It's hard to come by authoritative data on the limits, but this MDN documentation says Firefox only guarantees a minimum of 10mb, which is far too small.

I don't think that is up-to-date. We've attempted to document the current limits in:

https://web.dev/storage-for-the-web/#how-much

Edit: Or maybe they are in agreement. I didn't see that you were referencing the minimum. All platforms, though, must deal with devices that are out of storage. cc @asutherland for firefox quota limitations.

@AshleyScirra
Author

AshleyScirra commented Dec 17, 2020

The problem is how much space are you guaranteed to have? You can rely on using a reasonable amount of RAM (subject to OOM in extreme circumstances). However you can't rely on having much storage. No matter the quota scheme, if the user has 10mb of storage space left, you can't write more than that to storage.

Edit: I'd add that in our case, we already have the Blobs loaded and ready to go, so I think that means writing them to storage strictly increases the resource demands necessary to serve them.

@mkruisselbrink
Collaborator

I agree that storing in cache storage does seem like the way to go here.

We could be serving a large amount of local content, corresponding to an entire game. The user may not have enough storage quota to save the whole map.

For what it's worth, in Chrome we are considering counting temporary blobs against an origin's storage quota as well. But at least compared to other browsers, Chrome is probably more generous with storage quota anyway, so that shouldn't be a problem (I think storage quota for a single origin is typically larger than the total quota for blobs we allow today).

Lacking a way of moving a blob to cache storage or IndexedDB, writing it to storage will temporarily increase the total quota needed. It might be theoretically possible to support such a move operation, but that risks severely limiting the optimizations implementations can do to make blobs behave sanely in general (e.g. blob slices sharing storage). The on-disk formats of temporary blobs and blobs in storage APIs being different would also mean that even if we had such a hypothetical move operation, it would probably still need to be implemented by copying and then deleting the original.

@AshleyScirra
Author

AshleyScirra commented Dec 18, 2020

It seems to me there's an irony here: the browser understandably wants to terminate the SW where possible to save resources, but in this case dealing with that requires using a lot of resources, writing potentially large amounts of data to storage. Isn't keeping the SW alive the less resource intensive option?

@asutherland

asutherland commented Dec 18, 2020

I've commented in the initial issue that seems to have motivated the implementation choices leading to this request.

If your app needs a large amount of durable storage and the user understands that, please use navigator.storage.persist and this will exempt you from storage limitations on Firefox to the maximum extent possible. Across supported browsers it will help avoid incidental eviction of the data as well.
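For example, a minimal sketch (the function name is mine, not part of the API):

```js
// Ask the browser to treat this origin's storage as persistent.
// persist() may prompt the user in some browsers and resolves to a boolean.
async function requestPersistentStorage() {
  if (!navigator.storage || !navigator.storage.persist)
    return false; // API not available
  if (await navigator.storage.persisted())
    return true;  // already persistent
  return navigator.storage.persist();
}
```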

I hear you on the telemetry about storage stability and reliability. Our number one focus on the Firefox Workers & Storage team is improving the reliability of storage and we've been refactoring much of the code-base to improve error handling and to generate telemetry data which can let us understand where things are failing in the field and address those problems. This is something where improvements are actively happening and have been happening, so I don't think it makes sense to give up on storage. Additionally, as @wanderview notes, Service Worker scripts themselves are stored in Cache API storage in Firefox, so you're pretty much already guaranteed to have working storage if you have a ServiceWorker in Firefox. (And if you have a broken IDB database, if you delete it and then re-open it, the next one is guaranteed to be unbroken unless there are hardware problems.)

I'm in broad agreement that the right course of action is to explicitly store the data in storage. Firefox is explicitly pursuing tab unloading like I believe some other browsers already do, and it is likely far better in the tab unloading accounting for your app's resources to be charged against disk quota rather than memory usage.

Also, I should note that the HTTP cache in Firefox I think tends to not bother storing large resources, so if you're using Blobs for everything and induce an OOM, the user will likely have to re-download everything again when they reopen the tab, which may lead to another OOM, etc. These kinds of crashes are usually not actionable when we look at crash reports, whereas we can and do respond to bugs reported against Firefox's storage implementations, etc.

@jakearchibald
Contributor

It seems to me there's an irony here: the browser understandably wants to terminate the SW where possible to save resources, but in this case dealing with that requires using a lot of resources, writing potentially large amounts of data to storage. Isn't keeping the SW alive the less resource intensive option?

RAM is also a resource, and as @wanderview said, users tend to have less of it than disk storage.

Keeping the service worker open will cost CPU and RAM. Using the Cache API uses disk resources, but those are cheaper in the longer term.

If you're wanting some kind of temporary storage, you might be interested in whatwg/storage#71.

@dberardo-com

Is it possible to make sure that all fetch events go through the service worker that is listening to them? I have seen some fetch requests being ignored by the service worker while it was still "activating", i.e. when the user visited the page after a long time of inactivity. Can this be prevented? I.e. can all fetch requests wait for the service worker to wake up first, then fire?

@James-E-A

I have seen some fetch requests being ignored by the service worker while it was still "activating" ... can this be prevented? I.e. can all fetch requests wait for the service worker to wake up first, then fire?

Short answer: No. If you need to make sure fetch requests get intercepted by a Service Worker, then wait for that service worker to be activated before sending those fetch requests.

Longer answer: Yes, I think this might be possible with a hacky workaround. You could have your Service Worker identify as "activated" synchronously/immediately, while deferring the real semantic activation and serving all events with event.respondWith(actualActivationPromise.then(() => makeResponse(event))).
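A minimal sketch of that idea, assuming actualActivationPromise and makeResponse are your own (hypothetical) names for the deferred setup and the routing logic:

```js
// Sketch only: actualActivationPromise and makeResponse are hypothetical
// names from the comment above, not a standard API.
const actualActivationPromise = (async () => {
  // ...do the real, potentially slow initialisation here (open caches, etc.)...
})();

// Placeholder routing logic; replace with whatever the SW actually serves.
function makeResponse(event) {
  return fetch(event.request);
}

self.addEventListener("fetch", (event) => {
  // Every fetch waits for the deferred setup before being answered.
  event.respondWith(actualActivationPromise.then(() => makeResponse(event)));
});
```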
