Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate requests #2593

Closed
ianldgs opened this issue Aug 2, 2020 · 13 comments
Closed

Deduplicate requests #2593

ianldgs opened this issue Aug 2, 2020 · 13 comments

Comments

@ianldgs
Copy link

ianldgs commented Aug 2, 2020

Library Affected:
workbox-core, workbox-strategies

Browser & Platform:
all browsers

Issue or Feature Request Description:
Support for deduplicating requests.
Or
Support for creating custom strategies
Or
Extend plugin API

Use case
2 API calls to the same endpoint, with the same query, being done at the same time

What I've got so far
I have cloned the repo (V6) and created a custom strategy. However, the structure is not being very helpful for this case. I cannot base my strategy on another strategy, or even use the strategy as secondary and it depends on a lot of internals.*

Idea
I do understand that, apart from the problems, this strategy wouldn't be as useful as the others currently provided and that creating custom strategies isn't something that people do every day. So the most elegant solution would probably be to provide a fetch method on the plugin API. The requestWillFetch doesn't work for this case.

PS: If you like the idea, I can create a pull request. I've got a bit familiar with the project and internals.

@jeffposnick
Copy link
Contributor

Hmm, that's interesting. I'd like to hear more about exactly how you envision this working, and whether caching comes into play at all, or if it's network-only.

Here's what I think you're asking for: a custom Strategy that will detect whether a request is for a URL that is already in the middle of being requested. While there's already an in-flight request, the second request will pause; once there's a response for the first request, that response will be cloned and used to satisfy all other paused requests.

In that model, there's no caches that come into play, and the pausing only happens if there is an actual overlap between two inflight requests; if two requests are made sequentially, without and overlap, then the second one would behave independent of the first.

Does that sound like what you had in mind?

@jeffposnick
Copy link
Contributor

I also wanted to note that, especially with the upcoming support for custom Strategy subclasses in Workbox v6, we're really hoping that developers will address creative use cases like this not with pull requests against Workbox, but by publishing their own Strategy classes and plugins on npm.

An ecosystem of third-party modules that worked with Workbox would be 💯 .

@ianldgs
Copy link
Author

ianldgs commented Aug 4, 2020

Hmm, that's interesting. I'd like to hear more about exactly how you envision this working, and whether caching comes into play at all, or if it's network-only.

Here's what I think you're asking for: a custom Strategy that will detect whether a request is for a URL that is already in the middle of being requested. While there's already an in-flight request, the second request will pause; once there's a response for the first request, that response will be cloned and used to satisfy all other paused requests.

In that model, there's no caches that come into play, and the pausing only happens if there is an actual overlap between two inflight requests; if two requests are made sequentially, without and overlap, then the second one would behave independent of the first.

Does that sound like what you had in mind?

That's exactly what I had in mind at first. Check this early-stage implementation (still need some work) https://github.com/ianldgs/workbox/blob/feat/dedup/packages/workbox-strategies/src/DedupRequests.ts.

But then I realized that if I wanted to add offline support I wouldn't be able to use the NetworkFirst strategy as a fallback, or even as a primary strategy. So the code for offline support would have to live inside of this new strategy, not reusing the other one. And if I wanted any other alternative strategy, I would have to re-implement, copying the logic again from another strategy.

For those reasons, I'm thinking that it might make more sense to have a plugin, so one could do:

workbox.routing.registerRoute(
  ({ request }) => request.url.startsWith("/api"),
  new workbox.strategies.NetworkFirst({
    cacheName: "api-cache",
    networkTimeoutSeconds: 5,
    plugins: [
+      new DedupRequestsPlugin(),
      new workbox.expiration.ExpirationPlugin({
        maxAgeSeconds: 2 * 24 * 60 * 60, // 2 days
      }),
    ],
  })
);

But to have this, I would actually need to be able to override the fetch logic from inside the plugin.

I also wanted to note that, especially with the upcoming support for custom Strategy subclasses in Workbox v6, we're really hoping that developers will address creative use cases like this not with pull requests against Workbox, but by publishing their own Strategy classes and plugins on npm.

An ecosystem of third-party modules that worked with Workbox would be 💯 .

Yep, that would be great! Currently, the only way I was able to create the custom strategy was to clone the entire repository.

@jeffposnick
Copy link
Contributor

CC: @philipwalton for his thoughts, as he did most of the Strategy class design.

Passing in a reference to the StrategyHandler (which is what wraps the Fetch API and the Cache Storage API) to the plugin lifecycle methods seems like it could lead to infinite recursion, as the StrategyHandler is what's responsible for calling the plugin lifecycle methods to begin with. Calling fetch() from within a service worker never triggers the service worker's fetch event handler to prevent the same thing.

Yep, that would be great! Currently, the only way I was able to create the custom strategy was to clone the entire repository.

I'd imagine that the way this should work while testing is for you to set up a peerDepdendency on workbox-strategies in your package.json, and then do something like:

import {Strategy} from 'workbox-strategies';

class DeduplicateStrategy extends Strategy {
  async _handle(request: Request, handler: StrategyHandler): Promise<Response> {
    // Your code here.
  }
}

And then folks using your custom class (or plugin) would be responsible for installing it and workbox-strategies from npm.

@ianldgs
Copy link
Author

ianldgs commented Aug 7, 2020

FYI, I've done a POC on adding fetch to the plugin API: ianldgs@37dd59d

It allows me to write a plugin like the following:

function DedupRequestsPlugin() {
  /**
   * @type {Map<string, Promise<Response | undefined>>}
   */
  const requestsInProgress = new Map();

  return {
    /**
     * @param {RequestInfo} request
     * @param {RequestInit|undefined} options
     */
    async fetch(request, options) {
      const requestKey = `${request.method} ${request.url}`;

      const requestInProgress = requestsInProgress.get(requestKey);
      if (requestInProgress) return requestInProgress;

      try {
        const responsePromise = fetch(request, options);

        requestsInProgress.set(
          requestKey,
          responsePromise.then((r) => r.clone())
        );

        const response = await responsePromise.then((r) => r.clone());
        return response;
      } finally {
        requestsInProgress.delete(requestKey);
      }
    },
  };
}

And to use:

workbox.routing.registerRoute(
  ({ request }) => request.url.startsWith("/api"),
  new workbox.strategies.NetworkFirst({
    cacheName: "api-cache",
    networkTimeoutSeconds: 5,
    plugins: [
+      DedupRequestsPlugin(),
      new workbox.expiration.ExpirationPlugin({
        maxAgeSeconds: 2 * 24 * 60 * 60, // 2 days
      }),
    ],
  })
);

Or actually use it with any strategy I want.

@philipwalton
Copy link
Member

I think you can do this fairly cleanly using a plugin—as long as you're OK with it only deduping requests using strategies that have added that plugin to them.

Since all plugin callbacks are async, you should be able to await pending requests in the handlerWillStart() callback:

{
  async handlerWillStart({request}) {
    // 1. Add the URL to a global request map
    // 2. If there's an existing request being processed,
    // wait for it to finish.
    await noPendingRequestsForURL(request.url)
  }
  async handlerWillRespond({request}) {
    // Remove a URL from the global map once it's ready to respond.
    clearURLFromRequestMap(request.url)    
  }
}

Does that sound like it would work for your use case?

@ianldgs
Copy link
Author

ianldgs commented Aug 10, 2020

I think you can do this fairly cleanly using a plugin—as long as you're OK with it only deduping requests using strategies that have added that plugin to them.

Since all plugin callbacks are async, you should be able to await pending requests in the handlerWillStart() callback:

{
  async handlerWillStart({request}) {
    // 1. Add the URL to a global request map
    // 2. If there's an existing request being processed,
    // wait for it to finish.
    await noPendingRequestsForURL(request.url)
  }
  async handlerWillRespond({request}) {
    // Remove a URL from the global map once it's ready to respond.
    clearURLFromRequestMap(request.url)    
  }
}

Does that sound like it would work for your use case?

I like this solution! Will try it out and if it works, I'll close this.

@ianldgs
Copy link
Author

ianldgs commented Aug 12, 2020

Update:

I've tried it but didn't manage to make it work.
For some reason, the CacheFirst strategy in the second request doesn't find the response on the cache and tries to fetch it from the network again, even though it waited for the first handler to finish - I believe it's because the cache wasn't written in time.
Also, a plugin based on those callbacks wouldn't work and would actually make it worse if used on the NetworkFirst strategy.

Posting the code I used in case you spot something I didn't. I also tried with the requestWillFetch and fetchDidSucceed, but no luck either.

Is there any particular reason you wouldn't accept a PR to allow overriding fetch in a plugin?

function DedupRequestsPlugin() {
  console.log("DedupRequestsPlugin");

  /**
   * Temporary cache for requests being done at the same time.
   *
   * @type {Map<string, { promise: Promise<any>, done: () => any }>}
   */
  const requestsInProgress = new Map();

  return {
    /** @param context {{request: Request}} */
    async handlerWillStart({ request }) {
      console.log("requestWillFetch", request.url);

      const requestInProgress = requestsInProgress.get(request.url);
      if (requestInProgress) {
        console.log("is in progress, waiting");
        await requestInProgress.promise;
        console.log("finished");
        return request;
      }

      console.log("not in progress, acting normally");

      let done;
      requestsInProgress.set(request.url, {
        promise: new Promise((resolve) => {
          done = resolve;
        }),
        done,
      });

      console.log("added to in-progress cache");

      return request;
    },
    /** @param context {{request: Request, response: Response}} */
    async handlerWillRespond({ request, response }) {
      console.log("fetchDidSucceed", request.url);

      const requestInProgress = requestsInProgress.get(request.url);
      if (requestInProgress) {
        console.log("cleaning up");
        requestInProgress.done();
        requestsInProgress.delete(request.url);
      }

      console.log("ok");

      return response;
    },
  };
}

@philipwalton
Copy link
Member

For some reason, the CacheFirst strategy in the second request doesn't find the response on the cache and tries to fetch it from the network again, even though it waited for the first handler to finish - I believe it's because the cache wasn't written in time.

Ahh, right, that makes sense. If you use the handlerDidComplete() callback, that will wait until the response has been written to the cache—but it will also wait for all other work passed to the event's waitUntil() method to complete, which (depending on your case) could be longer than you want.

Is there any particular reason you wouldn't accept a PR to allow overriding fetch in a plugin?

I'm not necessarily opposed to it, but it does make me a bit uncomfortable. I think we'd want to think a bit more about the potential uses (and abuses) before adding something like that.

Before we go down that road I'd like to explore whether or not it's possible to do by either writing your own custom strategy (as @jeffposnick suggested above) or even extending whatever built-in strategy you're using.

For example, you could extend the CacheFirst strategy, reuse all its behavior, and just check for in-progress requests prior to responding:

class DedupeCacheFirst extends CacheFirst {
  async _handle(request, handler) {
    // Check for a duplicate, pending request
    if (isDuplicateRequestPending(request)) {
      return await getResponseForDuplicateRequest(request);
    }
    return await super._handle(request, handler);
  }
}

Do you think that work would?

@jeffposnick
Copy link
Contributor

jeffposnick commented Aug 15, 2020

I played around with this a bit today and got the following working:

// In sw.ts:

import {NetworkFirst, StrategyHandler} from 'workbox-strategies';
import {registerRoute} from 'workbox-routing';
import {clientsClaim} from 'workbox-core';

clientsClaim();

class DedupeNetworkFirst extends NetworkFirst {
  private _inflightRequests: Map<string, Promise<Response>>;

  constructor(options?: any) {
    super(options);
    this._inflightRequests = new Map();
  }

  async _handle(request: Request, handler: StrategyHandler) {
    let inflightResponsePromise = this._inflightRequests.get(request.url);

    if (inflightResponsePromise) {
      const inflightResponse = await inflightResponsePromise;

      return inflightResponse.clone();
    } else {
      inflightResponsePromise = super._handle(request, handler);
      this._inflightRequests.set(request.url, inflightResponsePromise);

      try {
        const response = await inflightResponsePromise;
        return response.clone();
      } finally {
        this._inflightRequests.delete(request.url);
      }
    }
  }
}

registerRoute(
  ({url}) => url.href.startsWith('https://httpbin.org/delay/'),
  new DedupeNetworkFirst(),
);

You can try this out with a simple web page like:

<html>
  <body>
    <p>
      Click to make a new request to https://httpbin.org/delay/5:
      <button id="makeRequest">Make Request</button>
    </p>
    <script>
      let count = 1;
      navigator.serviceWorker.register('sw.js');
      document.querySelector('#makeRequest').addEventListener('click', async () => {
        const id = `request ${count++}`;
        console.time(id);
        await fetch('https://httpbin.org/delay/5');
        console.timeEnd(id);
      });
    </script>
  </body>
</html>

Checking the Network panel and the logged messages, I think that gives the behavior that @ianldgs originally asked for. In that all the duplicate requests that are made while an original request is still in-flight will wait for the original response, and once the response is available, they'll all use a clone of it to fulfill their respective requests.

Screen Shot 2020-08-15 at 6 56 20 PM

(The little gear icon indicates that only one of the requests processed by the service worker actually resulted in an outgoing request to the network.)

@ianldgs
Copy link
Author

ianldgs commented Aug 17, 2020

Hey @jeffposnick and @philipwalton, this looks really good, thanks a lot! I will go with this approach.

It didn't look like the best approach to me after playing with it, because the _handle method looks like internal API and it's called "handle", but receives a "handler" as a parameter. I imagined it's possible that you can change it at any point, since v6 is in alpha. Speaking of which, do you have any plans to change it?

I'm not necessarily opposed to it, but it does make me a bit uncomfortable. I think we'd want to think a bit more about the potential uses (and abuses) before adding something like that.

Fair enough! It does feel a bit weird, yes, since the fetch callback wouldn't follow the exact same path of the others.

I'm closing this since I'm pretty sure no changes are needed to meet the requirements I have.

Have a nice week guys!

@ianldgs ianldgs closed this as completed Aug 17, 2020
@jeffposnick
Copy link
Contributor

Glad to hear it!

@philipwalton, that's good feedback for when it's time to document extending Strategy classes in v6.

@philipwalton
Copy link
Member

It didn't look like the best approach to me after playing with it, because the _handle method looks like internal API and it's called "handle", but receives a "handler" as a parameter. I imagined it's possible that you can change it at any point, since v6 is in alpha. Speaking of which, do you have any plans to change it?

Yeah, I agree it does make it look like you're not supposed to define it, but hopefully it'll be clear with good documentation.

The idea is that the _handle() method actually is private (technically protected) since it should never be called outside of the class defining it.

This is similar to Node's Readable Streams implementation, where when you create your own streams subclass you have to define the _read() method yourself. I remember this initially struck me as odd the first time I read it, but after thinking about it more it made sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants