Required and optional payloads for precaching #155

Closed
jeffposnick opened this issue Jan 17, 2017 · 14 comments

@jeffposnick
Contributor

Something that we've thought about in the past (see GoogleChromeLabs/sw-precache#145) is support for optional precaching payloads that can be toggled on or off based on some criteria like the presence of the Save-Data: on header (though that's not exposed on the install handler at the moment).

It might be a good idea to brainstorm about what the interface could look like in the context of sw-precaching, apart from the question of how to signal to the service worker which payloads to download.

A straightforward approach would be to take in two different glob patterns, one for the required payload and the other for an optional payload. By default only the required ones get cached in the install handler.

If we follow the pattern suggested at #136 (comment), where the PrecacheManager is responsible for creating a Route with a CacheFirst strategy for requests it knows about, I don't think we'd have to do much extra to handle requests that fall into the "optional" bucket. The CacheFirst behavior will fall back to the network for optional URLs that aren't already cached, using the response to populate the cache for next time.
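A minimal sketch of the cache-first fallback described above, written against an injected cache and fetch function so it can run outside a service worker. The names `cacheFirst`, `createMemoryCache`, and `fetchFn` are hypothetical and are not Workbox APIs; in a real service worker the cache would come from `caches.open(...)`, `fetchFn` would be the global `fetch`, and the response would need to be cloned before being stored.

```javascript
// Hypothetical helper, not part of Workbox: serve from the cache when possible,
// otherwise go to the network and store the response for next time.
async function cacheFirst(url, cache, fetchFn) {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetchFn(url);
  // With real Response objects you would store response.clone() here.
  await cache.put(url, response);
  return response;
}

// Tiny in-memory stand-in for the Cache interface (match/put only).
function createMemoryCache() {
  const store = new Map();
  return {
    match: async (url) => store.get(url),
    put: async (url, response) => {
      store.set(url, response);
    },
  };
}
```

With this shape, an optional URL that was skipped at install time is fetched on first use and served from the cache afterwards, without any special casing.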

We would have to add in some extra logic to handle manifest updates, to check whether an optional URL is already cached, and ignore manifest hash updates for optional URLs when it is not already cached.

CC: @alxhub @gauntface @addyosmani

@gauntface

There is nothing to stop a developer implementing this themselves - just alter the caching array based on some criteria.

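// Required assets are always precached.
let cacheArray = [
  {url: '/index.html', revision: '1234'}
];

// Only add the optional assets when the user has not asked to save data.
if (!someExternalSaveDataValue) {
  cacheArray = cacheArray.concat([
    '/styles/optional-css.1234.css'
  ]);
}

precacheManager.cacheRevisionedAssets(cacheArray);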

I suppose the loss here is that on a manifest change an optional entry would be deleted if it was previously cached and, in a data-saver install, doesn't make the cache list. (You'd want the cache revision to be checked and the optional cache entry deleted if the revision is now out of date.)

When you say two glob patterns, I'm not sure what you mean. A glob pattern that gets compared to the URLs? That feels a bit error prone (i.e. I know I'd mess up the regex for the value somewhere).

I'd probably take a more declarative approach:

  1. Two methods, one for required assets to cache and one for optional assets.
  2. Add an optional field to the precache manager (false by default) and then have some method that tells precache to be in "save data mode", which could also be passed to the constructor.

We could also do the same for the unrevisioned assets (i.e. only cache specific assets and ignore the others).
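To make the declarative idea concrete, here is a rough sketch that combines both suggestions: separate methods for required and optional assets, plus a `saveData` flag passed to the constructor. The class name `OptionalAwarePrecacheList` and all of its methods are hypothetical; nothing here is an existing sw-precaching or Workbox API.

```javascript
// Hypothetical sketch, not a Workbox API.
class OptionalAwarePrecacheList {
  // saveData: when true, optional entries are left out of the install step.
  constructor({saveData = false} = {}) {
    this.saveData = saveData;
    this.entries = [];
  }

  // Assets that must always be cached during install.
  addRequired(entries) {
    for (const entry of entries) {
      this.entries.push({...entry, optional: false});
    }
  }

  // Assets that are only cached during install when not in save-data mode.
  addOptional(entries) {
    for (const entry of entries) {
      this.entries.push({...entry, optional: true});
    }
  }

  // Entries that should be downloaded in the install handler.
  entriesToInstall() {
    return this.entries.filter((entry) => !entry.optional || !this.saveData);
  }
}
```

Keeping the `optional` flag on each entry, rather than holding two separate lists, leaves room for later update logic to treat the two groups differently.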

@jeffposnick
Contributor Author

Sorry, I was conflating sw-lib and sw-precaching. I was talking about sw-lib when I brought up glob patterns. sw-lib could potentially take in a glob pattern parameter for an "optional" set of files.

And then what you describe sw-precaching doing, with the declarative approach that reads in the output of sw-lib, sounds right on.

Beyond that, I don't think there's that much to change, apart from that extra logic when the manifest is updated:

  1. If an updated hash corresponds to a URL in the "required" payload, then always fetch the update.
  2. If an updated hash corresponds to a URL in the "optional" payload, then only fetch the update if it's already in the cache.
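The two update rules above can be sketched as a pure function. The helper name `urlsToFetchOnUpdate` and the `{url, revision, optional}` entry shape are assumptions made for illustration, not the format sw-lib or sw-precaching actually emit.

```javascript
// Hypothetical helper: decide which URLs to re-fetch after a manifest update.
//   oldManifest, newManifest: arrays of {url, revision, optional}
//   cachedUrls: Set of URLs currently present in the cache
function urlsToFetchOnUpdate(oldManifest, newManifest, cachedUrls) {
  const oldRevisions = new Map(oldManifest.map((e) => [e.url, e.revision]));
  return newManifest
    // Keep only entries whose revision changed (or that are new).
    .filter((entry) => oldRevisions.get(entry.url) !== entry.revision)
    // Required: always fetch. Optional: fetch only if already cached.
    .filter((entry) => !entry.optional || cachedUrls.has(entry.url))
    .map((entry) => entry.url);
}
```

An optional URL that has never been cached is simply skipped; it would be picked up later by whatever runtime strategy handles the first request for it.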

@addyosmani addyosmani modified the milestone: Service Worker Framework (beta) Feb 17, 2017
@gauntface gauntface removed this from the Service Worker Framework (beta) milestone Feb 28, 2017
@gauntface

I'm going to close this in favor of seeing if anyone requests this.

At the moment developers have the following options:

  1. Generate two manifests with workbox-build: one for required assets and one for optional assets. Then inject them manually into the SW, or use two separate imports, which would need revisioning to ensure they cache bust.
  2. At run time in the SW, remove the optional entries from the full list of assets.

@jamesshannon

Just wanted to say that I've been thinking about this for my "prefetchcache top 10 articles" idea, which is a use case largely for news publishers.

It would dovetail into this nicely, where those 10 articles would be marked as "optional", and then I wouldn't need to develop heuristics around Save-Data, CT, and ECT.

@jeffposnick jeffposnick changed the title required and optional payloads for sw-precaching Required and optional payloads for precaching Feb 8, 2018
@jeffposnick
Contributor Author

I'm going to re-open this issue for consideration in a future release, and move over @merlinnot's comment from #1283 (comment) to consolidate:

I have an app which loads some data through web channels. I’d like to be able to load an application shell and all relevant data before service worker starts downloading any assets, but I’d like the service worker to cache those files which are a part of the shell/current view.

My desired scenario goes as follows:

  1. User requests the page
  2. index.html is downloaded alongside the service worker (HTTP2 push)
  3. Service worker is installed and activated.
  4. Application shell is downloaded and precached by a service worker.
  5. Relevant data is downloaded via web channels/web sockets/whatever
  6. Some additional modules which should also be cached and are not critical are downloaded
  7. Service worker precaching is programmatically activated so it downloads any other resources in the background.

Hope you will find my use case relevant :)

@gauntface

I'm still confused as to what the request is here.

All of the above sounds like it's logic specific to an app and separate from precaching (i.e. precache a small set and cache other stuff separately at another time). So what is Workbox doing / not doing that is required here?

@merlinnot

Hmm, maybe I've made a little typo in point 4: resources should be cached, not _pre_cached.

Basically, it would be nice to enable precaching programmatically so it doesn't use network resources until I want it to. Currently there's no way to use it in the way I've described, which would be the most efficient.
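One way to picture "precaching that starts only when asked" is a small wrapper that holds an install function and runs it at most once, on demand. This is only a sketch: `createDeferredPrecache` and the `'START_PRECACHE'` message are hypothetical names, and the wiring to a `PrecacheController` is shown in comments as an assumption rather than tested code.

```javascript
// Hypothetical helper: run installFn at most once, and only when trigger() is called.
function createDeferredPrecache(installFn) {
  let promise = null;
  return {
    // Starts the install on first call; later calls return the same promise.
    trigger() {
      if (promise === null) {
        promise = Promise.resolve().then(installFn);
      }
      return promise;
    },
    // True once trigger() has been called at least once.
    get started() {
      return promise !== null;
    },
  };
}

// Possible wiring inside a service worker (assumption, comments only):
// const deferred = createDeferredPrecache(() => controller.install());
// self.addEventListener('message', (event) => {
//   if (event.data === 'START_PRECACHE') {
//     event.waitUntil(deferred.trigger());
//   }
// });
```

The page would then post the message once its own critical data has loaded, so the background downloads do not compete with it for bandwidth.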

@gauntface

My hunch is that you might be able to do this with the lower-level PrecacheController. It does a lot of the same work as the default workbox.precaching.precacheAndRoute([....]); the main difference is that you have to call the specific methods to cause precaching and cleanup.

@djeeg

djeeg commented Mar 17, 2018

you might be able to do this with the lower level PrecacheController

I think my default webpack precache has become too large (install is starting to trigger burst throttling on nginx)

So I thought to split up precaching into two smaller groups of files

const files = self.__precacheManifest;
const installfiles = files.filter(i => !i.url.includes("chunk-"));
const delayedfiles = files.filter(i => i.url.includes("chunk-"));

I thought it would be cool to use the same underlying cache for both groups
Unfortunately it doesn't look like I can use the same cacheName for each file group (as workbox cleans up files that aren't registered)
Gives me this message:

During precaching cleanup, 13 cached requests were deleted and 13 entries were deleted from IndexedDB

Fine, I have to create a custom PrecacheController
However I really would like to be able to leverage the same addRoute/fetch logic that workbox.precaching uses

If I tweak workbox-precaching slightly

//---------add method arg here
const _getPrecachedUrl = (url, {
  ignoreUrlParametersMatching = [/^utm_/],
  directoryIndex = 'index.html',
  cleanUrls = true,
  urlManipulation = null
} = {}, precacheController) => {
...
//---------add method arg here
moduleExports.addRoute = (options, arg_cacheName, arg_precacheController) => {
  if (fetchListenersAdded) {
    //---------disable return
    //return;
  }
...
  self.addEventListener('fetch', event => {
    //---------use arg or default here
    const precachedUrl = _getPrecachedUrl(event.request.url, options, arg_precacheController || precacheController);
...
    //---------use arg or default here
    let responsePromise = caches.open(arg_cacheName || cacheName).then(cache => {

It seems to be possible

workbox.precaching.precache(installfiles);
workbox.precaching.addRoute({});

// Second controller with its own cache name, so cleanup of one group doesn't delete the other.
const delayed_precachename = workbox.core.cacheNames.precache + "-delayed";
const delayed_precacheController = new workbox.precaching.PrecacheController(delayed_precachename);
delayed_precacheController.addToCacheList(delayedfiles);
self.addEventListener('install', (event) => {
    event.waitUntil(Promise.resolve(0));
});
self.addEventListener('activate', (event) => {
    // Start the delayed group 15 seconds after activation.
    // precacheplugins is assumed to be defined elsewhere in the service worker.
    setTimeout(function() {
        delayed_precacheController.install({
            plugins: precacheplugins
        }).then(function() {
            return delayed_precacheController.activate();
        });
    }, 15000);
    event.waitUntil(Promise.resolve(0));
});
// Relies on the tweaked addRoute signature shown above.
workbox.precaching.addRoute({}, delayed_precachename, delayed_precacheController);

The first group of files is cached immediately
15 seconds later the second group is cached
And files are loaded from each cache as normal.

Is this barking up the wrong tree though?

@brianchirls

I posted a related use case on stack overflow, and @jeffposnick suggested I move the discussion here, so I'll paste below:

There are cases where an application might need a relatively large resource as a strict requirement, but only in certain cases that are not easily detectable from a service worker. For example:

  • Safari has a prefixed and non-conforming implementation of the Web Audio API. I found a great shim, but it's over 300kb. It's critical for the web app to function in Safari but unnecessary in other browsers.
  • Some media are available in multiple formats that may not always be supported. Video is typically too large to precache and has problems with range requests, but it could apply to WebP images or short audio files (e.g. Opus vs. AAC). If you include all formats in the precache manifest, by default it will download all of them.

One approach would be to manually exclude certain files from the precache manifest and then to conditionally load those files from the scripts on the main thread to be stored in the runtime cache. But then those files are not precached - they're only loaded after the new version activates, by which point you may no longer be online.

Is there a solution that allows the following?:

  • Have the service worker send a message to the main thread with the URL of a "test" script that checks the various conditions.
  • Load and run that script on the main thread and send the service worker the list of required conditional assets
  • Add those assets to the precache manifest to be diff'ed against the previous version and downloaded as necessary
  • The service worker should not switch over to the new version until all precached assets are loaded, including the conditional ones.
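The selection step in the list above (turning the conditions reported by the page into extra precache entries) could look roughly like the sketch below. The `when` field, the capability names, and both function names are hypothetical; the sketch does not cover the messaging between the page and the service worker, or holding back activation until the downloads finish.

```javascript
// Hypothetical data shape: each conditional entry names the capability flag
// that makes it necessary, e.g. {url, revision, when: 'needsWebAudioShim'}.
function selectConditionalAssets(conditionalEntries, capabilities) {
  return conditionalEntries
    .filter((entry) => capabilities[entry.when] === true)
    .map(({url, revision}) => ({url, revision}));
}

// Combine the unconditional manifest with whichever conditional entries apply.
function buildPrecacheList(baseEntries, conditionalEntries, capabilities) {
  return baseEntries.concat(
    selectConditionalAssets(conditionalEntries, capabilities)
  );
}
```

Because the output is an ordinary list of `{url, revision}` entries, it could be diffed against the previous version in the same way as the rest of the manifest.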

@johanarnor

We've got a use case related to navigateFallback that I think would be solved by having optional payloads for precaching. So we're using (or trying to use) GenerateSW to generate a service worker for our React single-page app. We've configured navigateFallback to point to our app-shell index.html, so the app can be refreshed when on a sub-route.

However, we also have a lot of static files that are not part of the single-page app. E.g. data policy and terms & conditions as static .html, robots.txt, images, etc etc. And with navigateFallback configured as above, they all result in a "404"/not found in the single-page app.

We could use navigateFallbackDenylist/navigateFallbackAllowlist, but since we have a lot of routes in the single-page app as well as a lot of static files and no clear structure between them, it becomes a suboptimal approach.

I imagine that if optional precaching was implemented, I could just let all those static files be optional. We use copy-webpack-plugin, so all the routes would be visible in the build step. And since they are now part of the precache, I suppose they will not be redirected to our app-shell.

In the meantime, is the only solution to try to maintain a navigateFallbackDenylist/navigateFallbackAllowlist, or could I switch from GenerateSW to a more low-level API that would solve our problem? I'm quite new to service workers, so any pointers would be much appreciated!

@jeffposnick
Contributor Author

Hello @johanarnor—I'm not sure that optional payloads for precaching are the best approach to solving what you're trying to do.

If you want to use precaching and navigateFallback to implement the App Shell pattern, but the combination of navigateFallbackDenylist + navigateFallbackAllowlist doesn't give you enough control, then I'd think the best approach would be to add in an additional route, prior to the call to precacheAndRoute(...), that matches all of the URLs you need exempted from navigateFallback, and use the NetworkOnly strategy to handle that route. That would mean switching from using GenerateSW to using InjectManifest, giving you more control over the structure of your service worker file.
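As a rough illustration of that extra route, the match callback can be a plain function that receives an object with a `url` property and returns a boolean. The prefix list and the name `isExemptFromFallback` are hypothetical; the Workbox wiring is shown only in comments and assumes the module-style imports available from Workbox v5 onward.

```javascript
// Hypothetical list of path prefixes that should bypass navigateFallback.
const EXEMPT_PREFIXES = ['/legal/', '/static-pages/'];

// Match callback: Workbox routing passes an object that includes `url` (a URL instance).
function isExemptFromFallback({url}) {
  return EXEMPT_PREFIXES.some((prefix) => url.pathname.startsWith(prefix));
}

// Possible wiring in an InjectManifest source file (comments only):
// import {registerRoute} from 'workbox-routing';
// import {NetworkOnly} from 'workbox-strategies';
// import {precacheAndRoute} from 'workbox-precaching';
//
// registerRoute(isExemptFromFallback, new NetworkOnly()); // registered first, so it wins
// precacheAndRoute(self.__WB_MANIFEST);
```

Registering the NetworkOnly route before the precache route is what lets it take precedence for the exempted URLs.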

Please note that you say that some examples of URLs that you need exempted from navigateFallback include images and robots.txt. I would not expect (generally speaking) that your average user needs to navigate directly to an image's URL, or to your robots.txt URL. The navigateFallback logic only applies to in-scope navigation requests, not to subresource requests (like an <img> tag on a page) or to requests made by a web crawler for your robots.txt file.

@johanarnor

Many thanks @jeffposnick for your detailed reply. Much appreciated! You're right that optional payloads probably wouldn't solve my issue. It was kind of built around an assumption that all webpack assets that weren't precached could be optionally precached instead.

So I switched to InjectManifest instead and was able to achieve the desired outcome simply by splitting the webpack assets into different "buckets" and dividing them between precache and runtime cache with different caching strategies.

You're also right about users not generally navigating to files like robots.txt, but I still felt that removing this ability would probably cause confusion in the dev team.

@tomayac
Member

tomayac commented Apr 25, 2024

Hi there,

Workbox is moving to a new engineering team within Google. As part of this move, we're declaring a partial bug bankruptcy to allow the new team to start fresh. We realize this isn't optimal, but realistically, this is the only way we see it working. For transparency, here're the criteria we applied:

Thanks, and we hope for your understanding!
The Workbox team

@tomayac tomayac closed this as completed Apr 25, 2024