
Remove incumbent/fetching record from Cache behavior #1190

Merged
merged 6 commits into from
Oct 13, 2017

Conversation

jungkees
Collaborator

@jungkees jungkees commented Aug 22, 2017

This change removes the incumbent/fetching record concept that allowed
committing a fetching resource to cache and match/matchAll to return a
fully written existing resource, if any, with priority over a fetching
resource. After this change, add/addAll/put promises resolve when the
resources are fully fetched and committed to cache. match/matchAll will
not return any in-progress fetching resources.

This changes the specification type of underlying cache objects from a
map (request to response map) to a list (request response list). The
change makes the arguments and the return values of the Cache methods
and algorithms (Batch Cache Operations and Query Cache) conform to one
another. The list type seems fine, as the algorithms tend to iterate
through the list to find certain items with search options. After
looking at the details of the actual implementations, I plan to update
it further if needed.

Fixes #884.
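The map → list change can be sketched with a toy model (hypothetical names and URL-string matching for illustration only; the real spec operates on request/response structs with full query-cache matching):

```javascript
// Toy model of the new "request response list": a list of
// { request, response } pairs instead of a request-to-response map.
// queryCache iterates the list and collects every matching pair,
// which is why a list (which permits duplicates) is a natural fit.
function queryCache(list, requestURL, options = {}) {
  const matches = [];
  for (const pair of list) {
    const stored = options.ignoreSearch ? pair.request.split("?")[0] : pair.request;
    const wanted = options.ignoreSearch ? requestURL.split("?")[0] : requestURL;
    if (stored === wanted) matches.push(pair);
  }
  return matches;
}
```

In this model, match() would return the first collected pair's response and matchAll() all of them, mirroring how both methods share Query Cache.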



@jungkees
Collaborator Author

/cc @wanderview @mattto @aliams @hober

@jakearchibald
Contributor

Whoop! We now have PR previews!

@jakearchibald jakearchibald left a comment

It isn't a problem caused by this PR, but I think we're using JS objects too much in the cache spec. Ideally we should only create new Request and Response objects when we're about to return them, on the main thread.

docs/index.bs Outdated
@@ -1861,13 +1861,9 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe
<section>
<h3 id="cache-constructs">Constructs</h3>

A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}.
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of |request| (a {{Request}} object) and |response| (a {{Response}} object).
Contributor

Does it make sense to be storing JS objects here? Would it be better to store the concepts and create the objects just as we return them? (I realise that wasn't changed in this PR)

Member

Yeah, what Jake recommends matches implementations. We explicitly don't always return the same Response JS object from match(), even if it's stored in the same entry.

Collaborator Author

Absolutely. I changed it to store the request and response structs instead of the JS objects and to create the JS objects in matchAll() and keys() when requested.

docs/index.bs Outdated
1. For each <a>fetching record</a> |entry| of its <a>request to response map</a>, in key insertion order:
1. Add a copy of |entry|.\[[value]] to |responseArray|.
1. [=list/For each=] |item| of the [=context object=]'s [=request response list=]:
1. Add a copy of |item|'s |response| to |responseArray|.
Contributor

Given that it's a JS object, we can't really just say "copy", but I think the right answer here is to use concepts rather than objects, where "copy" appears to be fine.

Contributor

This could be part of an additional PR though, as this PR doesn't introduce this problem.

Collaborator Author

I changed it to use the internal structs instead of the JS objects, so it seems fine. But as Fetch provides a cloning algorithm, I'll look further into whether using that would make it more precise.

docs/index.bs Outdated
1. And then, if an exception was <a lt="throw">thrown</a>, then:
1. Set the <a>context object</a>'s <a>request to response map</a> to |itemsCopy|.
1. Set |cache| to |itemsCopy|.
Contributor

Isn't this just setting a local variable? As in, it no longer reverts the operation.

Again, this isn't a new problem, but isn't there a bit of a race condition here? Since we're replacing the whole cache with an older copy, there may be concurrent operations of this resulting in data loss.

Collaborator Author

Isn't this just setting a local variable?

I think |cache| becomes a reference to the request response list. It's confusing though.

isn't there a bit of a race condition here?

I hoped that making the cache write atomic (step 3.3) would do some magic, but it's definitely a part that I should audit and improve.

docs/index.bs Outdated
</section>

<section algorithm>
<h3 id="batch-cache-operations-algorithm"><dfn>Batch Cache Operations</dfn></h3>

: Input
:: |operations|, an array of {{CacheBatchOperation}} dictionary objects
:: |operations|, a [=list=] of {{CacheBatchOperation}} dictionary objects
Contributor

We never expose this, so it doesn't need to be a dictionary right?

Collaborator Author

Yes. I replaced it with a struct.

docs/index.bs Outdated
1. Let |resultArray| be an empty array.
1. For each |operation| in |operations|:
1. Let |resultList| be an empty [=list=].
1. [=list/For each=] |operation| in |operations|:
1. If |operation|.{{CacheBatchOperation/type}} matches neither "delete" nor "put", <a>throw</a> a <code>TypeError</code>.
Contributor

We should probably make this an enum or list of possible values.

Collaborator Author

For now, I defined it as a set of possible values for the newly defined cache batch operation's type item.

docs/index.bs Outdated
Note: The cache commit is allowed as long as the response's headers are available.

1. Set |requestResponseList| to the result of running [=Query Cache=] with |operation|.{{CacheBatchOperation/request}}.
1. If |requestResponseList| [=list/is not empty=], [=list/replace=] the [=list/item=] of |cache| that matches |requestResponseList|[0] with |operation|.{{CacheBatchOperation/request}}/|operation|.{{CacheBatchOperation/response}}.
Contributor

What if there are multiple matches, shouldn't we be removing those? It might be easiest to remove all matches then just append the new entry.

Contributor

@jakearchibald jakearchibald Aug 25, 2017

Oh this is now done in addAll. We should probably just append here in that case.

Collaborator Author

I addressed it as you suggested. I think we can still do better for the return value of Batch Cache Operations somehow. I'll look into it as separate work.
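The agreed "put" semantics can be sketched as a toy model (hypothetical names; the real operation matches via Query Cache, not string equality):

```javascript
// Toy sketch of the "put" batch operation discussed above: remove
// every stored pair that matches the request, then append the new
// pair, so at most one entry per request remains after a put.
function putOperation(list, request, response) {
  const remaining = list.filter((pair) => pair.request !== request);
  remaining.push({ request, response });
  return remaining; // the caller replaces the cache's list with this result
}
```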

@wanderview
Member

Since this PR touches the case where the body stream errors during a put(), it would be nice to add WPT tests to cover that. I think it should be somewhat easy to do now that we have ReadableStream bodies.
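The behavior such a test would pin down — put() must not commit anything if the body stream errors — can be modeled outside the browser (a hypothetical model; a real WPT would use cache.put() with an erroring ReadableStream body):

```javascript
// Model of put(): fully consume the body before committing. If the
// body "stream" errors partway through, reject and leave the cache
// list untouched, so no partial entry is ever observable via match().
async function modelPut(list, request, chunks) {
  const body = [];
  for (const chunk of chunks) {
    if (chunk instanceof Error) throw chunk; // simulated stream error
    body.push(chunk);
  }
  list.push({ request, response: body.join("") }); // commit only on success
}
```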

@jungkees
Collaborator Author

@jakearchibald, @wanderview, thanks for reviewing. I couldn't finish this before leaving for vacation. I'm off until September 6 and will follow up when I come back.

The changes include:
 - Replace CacheBatchOperations dictionary which isn't exposed to
   JavaScript surface with cache batch operations struct.
 - Do not store JS objects in the storage but store request and response
   structs instead.
 - Create and return JS objects in the target realm when requested (from
   matchAll() and keys()).
 - Simplify "put" operation related steps by moving/refactoring the
   post-Batch Cache Operation steps, which clear the invalid items, from
   addAll() and put() into Batch Cache Operations.
 - Move the argument validation steps of cache.keys() out of the
   parallel thread to main thread.
 - Fix cacheStorage.keys() to run the steps async. (For now, it still
   runs just in parallel, but later I plan to use the parallel queue
   concept: https://html.spec.whatwg.org/#parallel-queue).
@jungkees
Collaborator Author

Sorry for coming back to this late. PTAL.

@jakearchibald jakearchibald left a comment

This is coming along nicely!

docs/index.bs Outdated

A <a>fetching record</a> has an associated <dfn id="dfn-incumbent-record">incumbent record</dfn> (a <a>fetching record</a>). It is initially set to null.
When a [=request response list=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance that the [=context object=] represents, unless specified otherwise.
Contributor

Feels like this would be better as a new definition. As in:

The relevant request response list is the instance that the context object represents.

Then it can be linked to when used.

Collaborator Author

Good idea. Addressed.

docs/index.bs Outdated

Each [=/origin=] has an associated <a>name to cache map</a>.
When a [=name to cache map=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance of the [=context object=]'s associated [=CacheStorage/global object=]'s [=environment settings object=]'s [=environment settings object/origin=], unless specified otherwise.
Contributor

As above.

Collaborator Author

Addressed.

docs/index.bs Outdated
1. Run these substeps <a>in parallel</a>:
1. Let |responseArray| be an empty array.
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception.
1. Let |realm| be the [=current Realm Record=].

Collaborator Author

I referenced it from the example in https://html.spec.whatwg.org/#event-loop-for-spec-authors. But I think the relevant realm of the context object is indeed correct. Addressed as such.

@@ -1861,15 +1862,15 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe
<section>
<h3 id="cache-constructs">Constructs</h3>

A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}.
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of a request (a [=/request=]) and a response (a [=/response=]).
Contributor

Should request and response here be defined as for="request response list"? Then they can be linked to when used.

Collaborator Author

I think the request and the response definitions would then have to belong to something like a request response pair concept instead of the list itself. I don't see any other particular need for defining the pair, though. It seems okay as-is, since the request and the response can be identified as the items of the pairs in the list.

docs/index.bs Outdated
1. [=list/For each=] |requestResponse| of |requestResponses|:
1. Add a copy of |requestResponse|'s response to |responses|.
1. [=Queue a task=], on |promise|'s [=relevant settings object=]'s [=responsible event loop=] using the [=DOM manipulation task source=], to perform the following steps:
1. Let |responseArray| be an empty JavaScript array, in |realm|.
Contributor

I think we could create a sequence, then use https://heycam.github.io/webidl/#dfn-create-frozen-array to turn it into an array.

Contributor

We need to update the IDL to return a FrozenArray rather than a sequence too.

Collaborator Author

Addressed. But I want to make sure I used "in realm" correctly in this change. I suppose the created frozen array is actually converted to a JavaScript array by Web IDL. If so, does designating the realm when creating a frozen array, as I did here, make sense?

docs/index.bs Outdated

Note: The cache commit is allowed when the response's body is fully received.

* To [=process response done=] for |response|, do nothing.
Contributor

I don't think we need this line.

Collaborator Author

Removed.

docs/index.bs Outdated
1. If |r|'s [=request/method=] is not \`<code>GET</code>\` and |options|.ignoreMethod is false, return [=a promise resolved with=] an empty array.
1. Else if |request| is a string, then:
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception.
1. Let |realm| be the [=current Realm Record=].
Contributor

As above, we might not need this if we use frozenarray.

Collaborator Author

In the above case, I changed it to use a frozen array but still left "in realm" when creating the frozen array. If we use a frozen array here, don't we need to specify a realm?

docs/index.bs Outdated
1. [=map/For each=] <var ignore>cacheName</var> → |cache| of the [=name to cache map=]:
1. Set |promise| to the result of [=transforming=] itself with a fulfillment handler that, when called with argument |response|, performs the following substeps [=in parallel=]:
1. If |response| is not undefined, return |response|.
1. Return the result of running the algorithm specified in {{Cache/match(request, options)}} method of {{Cache}} interface with |request| and |options| as the arguments (providing |cache| as thisArgument to the `\[[Call]]` internal method of {{Cache/match(request, options)}}.)
Contributor

I don't think you need the slash in [[Call]]

Collaborator Author

Addressed.

docs/index.bs Outdated
1. Return true.
1. Return false.
1. Resolve |promise| with true.
1. Abort these steps.
Contributor

Could probably roll this into the line above. "Resolve promise with true and abort these steps".

Collaborator Author

Addressed.

docs/index.bs Outdated
1. If |cacheExists| is true, then:
1. Delete a <a>Record</a> {\[[key]], \[[value]]} <var ignore>entry</var> from its <a>name to cache map</a> where |cacheName| matches entry.\[[key]].
1. [=map/Remove=] the [=name to cache map=][|cacheName|].
1. Return true.
Contributor

I'm not sure we can "return" from in parallel steps.

Collaborator Author

I'm not sure we can. For this particular case, I removed "in parallel". I think that's okay, as the fulfillment handler is already scheduled asynchronously on the microtask queue. But for other similar cases, I didn't change them in this PR, as I wasn't sure they can all run on the main thread. I'll look at them as separate work.

Contributor

Although it's on the microtask queue, it's still blocking the event loop. I think we need to create a promise and return it.

Collaborator Author

Yes, the fulfillment handler steps were actually quite incorrect. I made them run in the event loop, create a promise there, and resolve/reject the promise from the parallel steps. While changing them, I also changed the interface of Batch Cache Operations so that it runs synchronously without returning a promise, and the call sites invoke it from a promise job.
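The restructuring described here can be sketched with a toy model (hypothetical names; "in parallel" is approximated by a promise job, and matching is by string equality rather than Query Cache):

```javascript
// Sketch of the new shape: Batch Cache Operations runs synchronously,
// and the call site wraps it in a promise, so a thrown TypeError
// surfaces as a rejection rather than a synchronous exception.
function batchCacheOperations(list, operations) {
  let result = list;
  for (const op of operations) {
    if (op.type !== "delete" && op.type !== "put") {
      throw new TypeError(`unknown operation type: ${op.type}`);
    }
    result = result.filter((pair) => pair.request !== op.request);
    if (op.type === "put") {
      result.push({ request: op.request, response: op.response });
    }
  }
  return result;
}

function cachePut(state, request, response) {
  // The returned promise rejects if the synchronous batch step throws.
  return Promise.resolve().then(() => {
    state.list = batchCacheOperations(state.list, [
      { type: "put", request, response },
    ]);
  });
}
```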

@jungkees
Collaborator Author

@jakearchibald, thanks for reviewing. I addressed your comments. PTAL.

@wanderview
Member

I'm sorry, but I don't think I will have time to review this. Just FYI, so you don't wait for me. Thanks for working on this.

@jakearchibald
Contributor

jakearchibald commented Oct 11, 2017

@annevk @domenic

We've got a couple of instances in the service worker spec that follow this pattern:

  1. Let realm be the context object's relevant realm.
  2. Then later, in parallel:
  3. Resolve promise with a frozen array created from someList, in realm.

…where someList is an infra list.

Do we need to do this? I thought it would be enough to resolve the promise with someList, and IDL takes care of the FrozenArray creation in the correct realm, but I can't find a direct spec reference for that.

@domenic
Contributor

domenic commented Oct 11, 2017

You need to convert lists into FrozenArrays while specifying the realm; there's no way IDL could automatically figure out what kind of object you want to convert it to, or what global you want to create it in.
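As observable from script, the "create a frozen array from a list" step boils down to roughly this (a single-realm sketch; the realm argument in the spec controls which global's Array is used, which has no analogue here):

```javascript
// What "create a frozen array from a list" amounts to from the
// script's point of view: copy the list into a fresh Array, then
// freeze it so its contents can no longer be mutated.
function createFrozenArray(list) {
  return Object.freeze(Array.from(list));
}
```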

@jungkees
Collaborator Author

Okay. So my approach seems to be fine here.

@annevk
Member

annevk commented Oct 12, 2017

@jakearchibald you don't resolve from "in parallel". You need to queue a task on an event loop. That event loop will have a realm that you can use to create the frozen array and resolve the promise. (You don't get in parallel access to resolve either.)

@jungkees
Collaborator Author

@annevk, that queue a task step is missing from @jakearchibald's example above.

Resolve promise with a frozen array created from someList, in realm.

is indeed run in a queued task, as you pointed out.

@jakearchibald
Contributor

@annevk "resolve" automatically queues a task on the promise's event loop https://www.w3.org/2001/tag/doc/promises-guide#shorthand-manipulating.

@jakearchibald
Contributor

@domenic The return type is defined in IDL, so I thought it would be capable of some casting. A promise created in one realm could resolve with an object from another, but that feels like the exception.

I guess I could write my own helper to do this.

@annevk
Member

annevk commented Oct 12, 2017

@jakearchibald that hook is broken. It doesn't allow you to specify the task source.

@jungkees
Collaborator Author

Does it make sense to define some sort of default task source (used when not specified otherwise) for promise jobs? That hook is actually handy and keeps things simple to read in many cases.

@annevk
Member

annevk commented Oct 12, 2017

Action-at-a-distance leads to harder to understand algorithms, I think. If someone fixed the hook to ensure it takes all the arguments it needs it would be much clearer.

- Adjust variable scope
- Change to early-exit in fetch abort cases
- Fix async steps of fulfillment handlers
- Change the interface of Batch Cache Operations algorithm
 . Change to not return a promise
 . Remove the in parallel steps and make it work synchronously
 . Change the call sites to call it in a created promise's in parallel
   steps
@jungkees
Collaborator Author

If someone fixed the hook to ensure it takes all the arguments it needs it would be much clearer.

Yes, I think this would help simplify the caller-side steps. I'll take a look when I find time for it.

@jungkees
Collaborator Author

@jakearchibald, I uploaded another snapshot addressing your additional comments. PTAL.

@jakearchibald jakearchibald left a comment

LGTM! A huge step forward

@jungkees jungkees merged commit c8ab714 into master Oct 13, 2017
@jungkees jungkees deleted the remove-incumbent-fetching-concept branch October 13, 2017 14:09
@jungkees
Collaborator Author

@jakearchibald, thanks a lot for your review!

Successfully merging this pull request may close these issues.

addAll should resolve when the cache is fully & successfully written