-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove incumbent/fetching record from Cache behavior #1190
Conversation
This change removes the incumbent/fetching record concept that allowed committing a fetching resource to cache and match/matchAll to return a fully written existing resource, if any, with priority over a fetching resource. After this change, add/addAll/put promises resolve when the resources ared fully fetched and committed to cache. match/matchAll will not return any on-going fetching resources. This changes the specification type of underlying cache objects from a map (request to response map) to a list (request response list). The change is to make the arguments and the return values of the Cache methods and algorithms (Batch Cache Operations and Query Cache) conform to one another. The list type seems fine as the algorithms tend to iterate through the list to find certain items with search options. Looking at the details of the acutal implementations, I plan to update it further if needed. Fixes #884.
Whoop! We now have PR previews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't a problem caused by this PR, but I think we're using JS objects too much in the cache spec. Ideally we should only create new Request
and Response
objects when we're about to return them, on the main thread.
docs/index.bs
Outdated
@@ -1861,13 +1861,9 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe | |||
<section> | |||
<h3 id="cache-constructs">Constructs</h3> | |||
|
|||
A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}. | |||
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of |request| (a {{Request}} object) and |response| (a {{Response}} object). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to be storing JS objects here? Would it be better to store the concepts and create the objects just as we return them? (I realise that wasn't changed in this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, what Jake recommends matches implementations. We explicitly don't always return the same Response js object from match() even if its stored in the same entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. I changed it to store the request and response structs instead of the JS objects and to create the JS objects in matchAll() and keys() when requested.
docs/index.bs
Outdated
1. For each <a>fetching record</a> |entry| of its <a>request to response map</a>, in key insertion order: | ||
1. Add a copy of |entry|.\[[value]] to |responseArray|. | ||
1. [=list/For each=] |item| of the [=context object=]'s [=request response list=]: | ||
1. Add a copy of |item|'s |response| to |responseArray|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it's a JS object, we can't really just say "copy", but I think the right answer here is to use concepts rather than objects, where "copy" appears to be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be part of an additional PR though, as this PR doesn't introduce this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to using the internal structs instead of the JS objects, so it seems fine. But as Fetch provides the cloning algorithm, I'll look at it further whether using that'd make it more precise.
docs/index.bs
Outdated
1. And then, if an exception was <a lt="throw">thrown</a>, then: | ||
1. Set the <a>context object</a>'s <a>request to response map</a> to |itemsCopy|. | ||
1. Set |cache| to |itemsCopy|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just setting a local variable? As in, it no longer reverts the operation.
Again, this isn't a new problem, but isn't there a bit of a race condition here? Since we're replacing the whole cache with an older copy, there may be concurrent operations of this resulting in data loss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just setting a local variable?
I think |cache| becomes a reference to the request response list. It's confusing though.
isn't there a bit of a race condition here?
I wanted making the cache write atomic (the step 3.3) would do some magic, but it's absolutely a part that I should audit and improve.
docs/index.bs
Outdated
</section> | ||
|
||
<section algorithm> | ||
<h3 id="batch-cache-operations-algorithm"><dfn>Batch Cache Operations</dfn></h3> | ||
|
||
: Input | ||
:: |operations|, an array of {{CacheBatchOperation}} dictionary objects | ||
:: |operations|, a [=list=] of {{CacheBatchOperation}} dictionary objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never expose this, so it doesn't need to be a dictionary right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I replaced it with a struct.
docs/index.bs
Outdated
1. Let |resultArray| be an empty array. | ||
1. For each |operation| in |operations|: | ||
1. Let |resultList| be an empty [=list=]. | ||
1. [=list/For each=] |operation| in |operations|: | ||
1. If |operation|.{{CacheBatchOperation/type}} matches neither "delete" nor "put", <a>throw</a> a <code>TypeError</code>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably make this an enum or list of possible values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I made it as possible values for the newly defined cache batch operations's type item.
docs/index.bs
Outdated
Note: The cache commit is allowed as long as the response's headers are available. | ||
|
||
1. Set |requestResponseList| to the result of running [=Query Cache=] with |operation|.{{CacheBatchOperation/request}}. | ||
1. If |requestResponseList| [=list/is not empty=], [=list/replace=] the [=list/item=] of |cache| that matches |requestResponseList|[0] with |operation|.{{CacheBatchOperation/request}}/|operation|.{{CacheBatchOperation/response}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are multiple matches, shouldn't we be removing those? It might be easiest to remove all matches then just append the new entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is now done in addAll. We should probably just append here in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed it as you suggested. I think we still can do better for the return value of Batch Cache Operations somehow. I'll look into it as a separate work.
Since this PR touches the case where the body stream errors during a put(), it would be nice to add WPT tests to cover that. I think it should be somewhat easy to do now that we have ReadableStream bodies. |
@jakearchibald, @wanderview, thanks for reviewing. I couldn't make it before I leave for a vacation. I'm off until 6th of Sep. Will follow up on it when I come back. |
The changes include: - Replace CacheBatchOperations dictionary which isn't exposed to JavaScript surface with cache batch operations struct. - Do not store JS objects in the storage but store request and response structs instead. - Create and return JS objects in the target realm when requested (from matchAll() and keys()). - Simplify "put" operation related steps by moving/refactoring the post-Batch Cache Operation steps, which clear the invalid items, from addAll() and put() into Batch Cache Operations. - Move the argument validation steps of cache.keys() out of the parallel thread to main thread. - Fix cacheStorage.keys() to run the steps async. (For now, it still runs just in parallel, but later I plan to use the parallel queue concept: https://html.spec.whatwg.org/#parallel-queue).
Sorry for coming back to this late. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming along nicely!
docs/index.bs
Outdated
|
||
A <a>fetching record</a> has an associated <dfn id="dfn-incumbent-record">incumbent record</dfn> (a <a>fetching record</a>). It is initially set to null. | ||
When a [=request response list=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance that the [=context object=] represents, unless specified otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this would be better as a new definition. As in:
The relevant request response list is the instance that the context object represents.
Then it can be linked to when used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Addressed.
docs/index.bs
Outdated
|
||
Each [=/origin=] has an associated <a>name to cache map</a>. | ||
When a [=name to cache map=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance of the [=context object=]'s associated [=CacheStorage/global object=]'s [=environment settings object=]'s [=environment settings object/origin=], unless specified otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
docs/index.bs
Outdated
1. Run these substeps <a>in parallel</a>: | ||
1. Let |responseArray| be an empty array. | ||
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception. | ||
1. Let |realm| be the [=current Realm Record=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be https://html.spec.whatwg.org/multipage/webappapis.html#concept-relevant-realm for the context object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I referenced it from the example in https://html.spec.whatwg.org/#event-loop-for-spec-authors. But I think the relevant realm of the context object is correct indeed. Addressed as such.
@@ -1861,15 +1862,15 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe | |||
<section> | |||
<h3 id="cache-constructs">Constructs</h3> | |||
|
|||
A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}. | |||
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of a request (a [=/request=]) and a response (a [=/response=]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should request and response here be defined as for="request response list"? Then they can be linked to when used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the request and the response definitions then should belong to something like a request response pair concept instead of the list itself. I don't see any other particular needs for the definition of the pair though. It seems okay as-is as the request and the response can be identified as the items of the pairs in the list.
docs/index.bs
Outdated
1. [=list/For each=] |requestResponse| of |requestResponses|: | ||
1. Add a copy of |requestResponse|'s response to |responses|. | ||
1. [=Queue a task=], on |promise|'s [=relevant settings object=]'s [=responsible event loop=] using the [=DOM manipulation task source=], to perform the following steps: | ||
1. Let |responseArray| be an empty JavaScript array, in |realm|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could create a sequence, then use https://heycam.github.io/webidl/#dfn-create-frozen-array to turn it into an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the IDL to return a FrozenArray rather than a sequence too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. But I want to make sure if I used "in realm" correctly in this change. I suppose the created frozen array is actually converted to a JavaScript array by Web IDL. If so, designating the realm when creating a frozen array as I did here makes sense?
docs/index.bs
Outdated
|
||
Note: The cache commit is allowed when the response's body is fully received. | ||
|
||
* To [=process response done=] for |response|, do nothing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
docs/index.bs
Outdated
1. If |r|'s [=request/method=] is not \`<code>GET</code>\` and |options|.ignoreMethod is false, return [=a promise resolved with=] an empty array. | ||
1. Else if |request| is a string, then: | ||
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception. | ||
1. Let |realm| be the [=current Realm Record=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, we might not need this if we use frozenarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the above case, I changed it to using a frozen array but still left "in realm" when creating the frozen array. If we use frozen array here, don't we need to specify a realm?
docs/index.bs
Outdated
1. [=map/For each=] <var ignore>cacheName</var> → |cache| of the [=name to cache map=]: | ||
1. Set |promise| to the result of [=transforming=] itself with a fulfillment handler that, when called with argument |response|, performs the following substeps [=in parallel=]: | ||
1. If |response| is not undefined, return |response|. | ||
1. Return the result of running the algorithm specified in {{Cache/match(request, options)}} method of {{Cache}} interface with |request| and |options| as the arguments (providing |cache| as thisArgument to the `\[[Call]]` internal method of {{Cache/match(request, options)}}.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the slash in [[Call]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
docs/index.bs
Outdated
1. Return true. | ||
1. Return false. | ||
1. Resolve |promise| with true. | ||
1. Abort these steps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably roll this into the line above. "Resolve promise with true and abort these steps".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
docs/index.bs
Outdated
1. If |cacheExists| is true, then: | ||
1. Delete a <a>Record</a> {\[[key]], \[[value]]} <var ignore>entry</var> from its <a>name to cache map</a> where |cacheName| matches entry.\[[key]]. | ||
1. [=map/Remove=] the [=name to cache map=][|cacheName|]. | ||
1. Return true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can "return" from in parallel steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we can. For this particular case, I removed "in parallel". I think that's okay as the fulfillment handler's already scheduled async in the microtask queue. But for other similar cases, I didn't change them in this PR as I wasn't sure if they can all run in the main thread. I'll look at them as a separate work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it's on the microtask queue, it's still blocking the event loop. I think we need to create a promise and return it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, actually the fulfillment handler steps were very much incorrect. I made them run in the event loop and create a promise there and resolve/reject the promise from the parallel steps. Also, while changing them, I changed the interface of Batch Cache Operations such that it runs synchronously without returning a promise and the call sites invoke it from a promise job.
@jakearchibald, thanks for reviewing. I addressed your comments. PTAL. |
I'm sorry, but I don't think I will have time to review this. Just FYI, so you don't wait for me. Thanks for working on this. |
We've got a couple of instances in the service worker spec that follow this pattern:
…where someList is an infra list. Do we need to do this? I thought it would be enough to resolve the promise with someList, and IDL takes care of the FrozenArray creation in the correct realm, but I can't find a direct spec reference for that. |
You need to convert lists into FrozenArrays while specifying the realm; there's no way IDL could automatically figure out what kind of object you want to convert it to, or what global you want to create it in. |
Okay. So, my try seems to be okay here. |
@jakearchibald you don't resolve from "in parallel". You need to queue a task on an event loop. That event loop will have a realm that you can use to create the frozen array and resolve the promise. (You don't get in parallel access to resolve either.) |
@annevk, that queue a task step is missing in @jakearchibald's example above.
is run in a queued task as you pointed indeed. |
@annevk "resolve" automatically queues a task on the promise's event loop https://www.w3.org/2001/tag/doc/promises-guide#shorthand-manipulating. |
@domenic The return type is defined in IDL, so I thought it be capable of some casting. A promise created in one realm could resolve with an object from another, but that feels like the exception. I guess I could write my own helper to do this. |
@jakearchibald that hook is broken. It doesn't allow you to specify the task source. |
Does it make sense to define some sort of default task source (used when not specified otherwise) for promise jobs? That hook is actually handy and makes it read simple in many cases. |
Action-at-a-distance leads to harder to understand algorithms, I think. If someone fixed the hook to ensure it takes all the arguments it needs it would be much clearer. |
- Adjust variable scope - Change to early-exit in fetch abort cases - Fix async steps of fulfillment handlers - Change the interface of Batch Cache Operations algorithm . Change to not return a promise . Remove the in parallel steps and make it work synchronously . Change the call sites to call it in a created promise's in parallel steps
Yes, I think this would help simplifying the caller side steps. I'll take a look when I find time for it. |
@jakearchibald, I uploaded another snapshot addressing your additional comments. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! A huge step forward
@jakearchibald, thanks a lot for your review! |
This change removes the incumbent/fetching record concept that allowed
committing a fetching resource to cache and match/matchAll to return a
fully written existing resource, if any, with priority over a fetching
resource. After this change, add/addAll/put promises resolve when the
resources ared fully fetched and committed to cache. match/matchAll will
not return any on-going fetching resources.
This changes the specification type of underlying cache objects from a
map (request to response map) to a list (request response list). The
change is to make the arguments and the return values of the Cache
methods and algorithms (Batch Cache Operations and Query Cache) conform
to one another. The list type seems fine as the algorithms tend to
iterate through the list to find certain items with search options.
Looking at the details of the acutal implementations, I plan to update
it further if needed.
Fixes #884.
Preview | Diff