
Remove incumbent/fetching record from Cache behavior #1190

Merged
merged 6 commits into from
Oct 13, 2017

Conversation

jungkees
Collaborator

@jungkees jungkees commented Aug 22, 2017

This change removes the incumbent/fetching record concept that allowed
committing a fetching resource to cache and match/matchAll to return a
fully written existing resource, if any, with priority over a fetching
resource. After this change, add/addAll/put promises resolve when the
resources are fully fetched and committed to cache. match/matchAll will
not return any in-progress fetching resources.

This changes the specification type of underlying cache objects from a
map (request to response map) to a list (request response list). The
change makes the arguments and the return values of the Cache methods
and algorithms (Batch Cache Operations and Query Cache) conform to one
another. The list type seems fine, as the algorithms tend to iterate
through the list to find certain items with search options. After
looking at the details of the actual implementations, I plan to update
it further if needed.

Fixes #884.
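The map → list change can be sketched with a toy model (hypothetical names and URL-string matching for illustration only; the real spec operates on request/response structs with full query-cache matching):

```javascript
// Toy model of the new "request response list": a list of
// { request, response } pairs instead of a request-to-response map.
// queryCache iterates the list and collects every matching pair,
// which is why a list (which permits duplicates) is a natural fit.
function queryCache(list, requestURL, options = {}) {
  const matches = [];
  for (const pair of list) {
    const stored = options.ignoreSearch ? pair.request.split("?")[0] : pair.request;
    const wanted = options.ignoreSearch ? requestURL.split("?")[0] : requestURL;
    if (stored === wanted) matches.push(pair);
  }
  return matches;
}
```

In this model, match() would return the first collected pair's response and matchAll() all of them, mirroring how both methods share Query Cache.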



@jungkees
Collaborator Author

/cc @wanderview @mattto @aliams @hober

@jakearchibald
Contributor

Whoop! We now have PR previews!

@jakearchibald jakearchibald left a comment

It isn't a problem caused by this PR, but I think we're using JS objects too much in the cache spec. Ideally we should only create new Request and Response objects when we're about to return them, on the main thread.

docs/index.bs Outdated
@@ -1861,13 +1861,9 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe
<section>
<h3 id="cache-constructs">Constructs</h3>

A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}.
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of |request| (a {{Request}} object) and |response| (a {{Response}} object).
Contributor

Does it make sense to be storing JS objects here? Would it be better to store the concepts and create the objects just as we return them? (I realise that wasn't changed in this PR)

Member

Yeah, what Jake recommends matches implementations. We explicitly don't always return the same Response JS object from match(), even if it's stored in the same entry.

Collaborator Author

Absolutely. I changed it to store the request and response structs instead of the JS objects and to create the JS objects in matchAll() and keys() when requested.

docs/index.bs Outdated
1. For each <a>fetching record</a> |entry| of its <a>request to response map</a>, in key insertion order:
1. Add a copy of |entry|.\[[value]] to |responseArray|.
1. [=list/For each=] |item| of the [=context object=]'s [=request response list=]:
1. Add a copy of |item|'s |response| to |responseArray|.
Contributor

Given that it's a JS object, we can't really just say "copy", but I think the right answer here is to use concepts rather than objects, where "copy" appears to be fine.

Contributor

This could be part of an additional PR though, as this PR doesn't introduce this problem.

Collaborator Author

I changed it to use the internal structs instead of the JS objects, so it seems fine. But as Fetch provides a cloning algorithm, I'll look further into whether using that would make it more precise.

docs/index.bs Outdated
1. And then, if an exception was <a lt="throw">thrown</a>, then:
1. Set the <a>context object</a>'s <a>request to response map</a> to |itemsCopy|.
1. Set |cache| to |itemsCopy|.
Contributor

Isn't this just setting a local variable? As in, it no longer reverts the operation.

Again, this isn't a new problem, but isn't there a bit of a race condition here? Since we're replacing the whole cache with an older copy, there may be concurrent operations of this resulting in data loss.

Collaborator Author

Isn't this just setting a local variable?

I think |cache| becomes a reference to the request response list. It's confusing though.

isn't there a bit of a race condition here?

I hoped that making the cache write atomic (step 3.3) would do some magic, but it's definitely a part that I should audit and improve.

docs/index.bs Outdated
</section>

<section algorithm>
<h3 id="batch-cache-operations-algorithm"><dfn>Batch Cache Operations</dfn></h3>

: Input
:: |operations|, an array of {{CacheBatchOperation}} dictionary objects
:: |operations|, a [=list=] of {{CacheBatchOperation}} dictionary objects
Contributor

We never expose this, so it doesn't need to be a dictionary right?

Collaborator Author

Yes. I replaced it with a struct.

docs/index.bs Outdated
1. Let |resultArray| be an empty array.
1. For each |operation| in |operations|:
1. Let |resultList| be an empty [=list=].
1. [=list/For each=] |operation| in |operations|:
1. If |operation|.{{CacheBatchOperation/type}} matches neither "delete" nor "put", <a>throw</a> a <code>TypeError</code>.
Contributor

We should probably make this an enum or list of possible values.

Collaborator Author

For now, I defined it as a set of possible values for the newly defined cache batch operation's type item.

docs/index.bs Outdated
Note: The cache commit is allowed as long as the response's headers are available.

1. Set |requestResponseList| to the result of running [=Query Cache=] with |operation|.{{CacheBatchOperation/request}}.
1. If |requestResponseList| [=list/is not empty=], [=list/replace=] the [=list/item=] of |cache| that matches |requestResponseList|[0] with |operation|.{{CacheBatchOperation/request}}/|operation|.{{CacheBatchOperation/response}}.
Contributor

What if there are multiple matches, shouldn't we be removing those? It might be easiest to remove all matches then just append the new entry.

Contributor

@jakearchibald jakearchibald Aug 25, 2017

Oh this is now done in addAll. We should probably just append here in that case.

Collaborator Author

I addressed it as you suggested. I think we can still do better for the return value of Batch Cache Operations somehow. I'll look into it as separate work.
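The agreed "put" semantics can be sketched as a toy model (hypothetical names; the real operation matches via Query Cache, not string equality):

```javascript
// Toy sketch of the "put" batch operation discussed above: remove
// every stored pair that matches the request, then append the new
// pair, so at most one entry per request remains after a put.
function putOperation(list, request, response) {
  const remaining = list.filter((pair) => pair.request !== request);
  remaining.push({ request, response });
  return remaining; // the caller replaces the cache's list with this result
}
```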

@wanderview
Member

Since this PR touches the case where the body stream errors during a put(), it would be nice to add WPT tests to cover that. I think it should be somewhat easy to do now that we have ReadableStream bodies.
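The behavior such a test would pin down — put() must not commit anything if the body stream errors — can be modeled outside the browser (a hypothetical model; a real WPT would use cache.put() with an erroring ReadableStream body):

```javascript
// Model of put(): fully consume the body before committing. If the
// body "stream" errors partway through, reject and leave the cache
// list untouched, so no partial entry is ever observable via match().
async function modelPut(list, request, chunks) {
  const body = [];
  for (const chunk of chunks) {
    if (chunk instanceof Error) throw chunk; // simulated stream error
    body.push(chunk);
  }
  list.push({ request, response: body.join("") }); // commit only on success
}
```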

@jungkees
Collaborator Author

@jakearchibald, @wanderview, thanks for reviewing. I couldn't finish this before leaving for vacation. I'm off until September 6 and will follow up when I come back.

The changes include:
 - Replace CacheBatchOperations dictionary which isn't exposed to
   JavaScript surface with cache batch operations struct.
 - Do not store JS objects in the storage but store request and response
   structs instead.
 - Create and return JS objects in the target realm when requested (from
   matchAll() and keys()).
 - Simplify "put" operation related steps by moving/refactoring the
   post-Batch Cache Operation steps, which clear the invalid items, from
   addAll() and put() into Batch Cache Operations.
 - Move the argument validation steps of cache.keys() out of the
   parallel thread to main thread.
 - Fix cacheStorage.keys() to run the steps async. (For now, it still
   runs just in parallel, but later I plan to use the parallel queue
   concept: https://html.spec.whatwg.org/#parallel-queue).
@jungkees
Collaborator Author

Sorry for coming back to this late. PTAL.

@jakearchibald jakearchibald left a comment

This is coming along nicely!

docs/index.bs Outdated

A <a>fetching record</a> has an associated <dfn id="dfn-incumbent-record">incumbent record</dfn> (a <a>fetching record</a>). It is initially set to null.
When a [=request response list=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance that the [=context object=] represents, unless specified otherwise.
Contributor

Feels like this would be better as a new definition. As in:

The relevant request response list is the instance that the context object represents.

Then it can be linked to when used.

Collaborator Author

Good idea. Addressed.

docs/index.bs Outdated

Each [=/origin=] has an associated <a>name to cache map</a>.
When a [=name to cache map=] is referenced from within an algorithm, an attribute getter, an attribute setter, or a method, it designates the instance of the [=context object=]'s associated [=CacheStorage/global object=]'s [=environment settings object=]'s [=environment settings object/origin=], unless specified otherwise.
Contributor

As above.

Collaborator Author

Addressed.

docs/index.bs Outdated
1. Run these substeps <a>in parallel</a>:
1. Let |responseArray| be an empty array.
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception.
1. Let |realm| be the [=current Realm Record=].

Collaborator Author

I referenced it from the example in https://html.spec.whatwg.org/#event-loop-for-spec-authors. But I think the relevant realm of the context object is indeed correct. Addressed as such.

@@ -1861,15 +1862,15 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe
<section>
<h3 id="cache-constructs">Constructs</h3>

A <dfn id="dfn-fetching-record">fetching record</dfn> is a <a>Record</a> {\[[key]], \[[value]]} where \[[key]] is a {{Request}} and \[[value]] is a {{Response}}.
A <dfn id="dfn-request-response-list">request response list</dfn> is a [=list=] of [=pairs=] consisting of a request (a [=/request=]) and a response (a [=/response=]).
Contributor

Should request and response here be defined as for="request response list"? Then they can be linked to when used.

Collaborator Author

I think the request and the response definitions would then have to belong to something like a request response pair concept instead of the list itself. I don't see any other particular need for defining the pair, though. It seems okay as-is, since the request and the response can be identified as the items of the pairs in the list.

docs/index.bs Outdated
1. [=list/For each=] |requestResponse| of |requestResponses|:
1. Add a copy of |requestResponse|'s response to |responses|.
1. [=Queue a task=], on |promise|'s [=relevant settings object=]'s [=responsible event loop=] using the [=DOM manipulation task source=], to perform the following steps:
1. Let |responseArray| be an empty JavaScript array, in |realm|.
Contributor

I think we could create a sequence, then use https://heycam.github.io/webidl/#dfn-create-frozen-array to turn it into an array.

Contributor

We need to update the IDL to return a FrozenArray rather than a sequence too.

Collaborator Author

Addressed. But I want to make sure I used "in realm" correctly in this change. I suppose the created frozen array is actually converted to a JavaScript array by Web IDL. If so, does designating the realm when creating a frozen array, as I did here, make sense?

docs/index.bs Outdated

Note: The cache commit is allowed when the response's body is fully received.

* To [=process response done=] for |response|, do nothing.
Contributor

I don't think we need this line.

Collaborator Author

Removed.

docs/index.bs Outdated
1. If |r|'s [=request/method=] is not \`<code>GET</code>\` and |options|.ignoreMethod is false, return [=a promise resolved with=] an empty array.
1. Else if |request| is a string, then:
1. Set |r| to the associated [=Request/request=] of the result of invoking the initial value of {{Request}} as constructor with |request| as its argument. If this [=throws=] an exception, return [=a promise rejected with=] that exception.
1. Let |realm| be the [=current Realm Record=].
Contributor

As above, we might not need this if we use frozenarray.

Collaborator Author

In the above case, I changed it to use a frozen array but still left "in realm" when creating the frozen array. If we use a frozen array here, don't we need to specify a realm?

docs/index.bs Outdated
1. [=map/For each=] <var ignore>cacheName</var> → |cache| of the [=name to cache map=]:
1. Set |promise| to the result of [=transforming=] itself with a fulfillment handler that, when called with argument |response|, performs the following substeps [=in parallel=]:
1. If |response| is not undefined, return |response|.
1. Return the result of running the algorithm specified in {{Cache/match(request, options)}} method of {{Cache}} interface with |request| and |options| as the arguments (providing |cache| as thisArgument to the `\[[Call]]` internal method of {{Cache/match(request, options)}}.)
Contributor

I don't think you need the slash in [[Call]]

Collaborator Author

Addressed.

docs/index.bs Outdated
1. Return true.
1. Return false.
1. Resolve |promise| with true.
1. Abort these steps.
Contributor

Could probably roll this into the line above. "Resolve promise with true and abort these steps".

Collaborator Author

Addressed.

docs/index.bs Outdated
1. If |cacheExists| is true, then:
1. Delete a <a>Record</a> {\[[key]], \[[value]]} <var ignore>entry</var> from its <a>name to cache map</a> where |cacheName| matches entry.\[[key]].
1. [=map/Remove=] the [=name to cache map=][|cacheName|].
1. Return true.
Contributor

I'm not sure we can "return" from in parallel steps.

Collaborator Author

I'm not sure we can. For this particular case, I removed "in parallel". I think that's okay, as the fulfillment handler is already scheduled asynchronously on the microtask queue. But for other similar cases, I didn't change them in this PR, as I wasn't sure they can all run on the main thread. I'll look at them as separate work.

Contributor

Although it's on the microtask queue, it's still blocking the event loop. I think we need to create a promise and return it.

Collaborator Author

Yes, the fulfillment handler steps were actually quite incorrect. I made them run in the event loop, create a promise there, and resolve/reject the promise from the parallel steps. While changing them, I also changed the interface of Batch Cache Operations so that it runs synchronously without returning a promise, and the call sites invoke it from a promise job.
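The restructuring described here can be sketched with a toy model (hypothetical names; "in parallel" is approximated by a promise job, and matching is by string equality rather than Query Cache):

```javascript
// Sketch of the new shape: Batch Cache Operations runs synchronously,
// and the call site wraps it in a promise, so a thrown TypeError
// surfaces as a rejection rather than a synchronous exception.
function batchCacheOperations(list, operations) {
  let result = list;
  for (const op of operations) {
    if (op.type !== "delete" && op.type !== "put") {
      throw new TypeError(`unknown operation type: ${op.type}`);
    }
    result = result.filter((pair) => pair.request !== op.request);
    if (op.type === "put") {
      result.push({ request: op.request, response: op.response });
    }
  }
  return result;
}

function cachePut(state, request, response) {
  // The returned promise rejects if the synchronous batch step throws.
  return Promise.resolve().then(() => {
    state.list = batchCacheOperations(state.list, [
      { type: "put", request, response },
    ]);
  });
}
```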

@jungkees
Collaborator Author

@jakearchibald, thanks for reviewing. I addressed your comments. PTAL.

@wanderview
Member

I'm sorry, but I don't think I will have time to review this. Just FYI, so you don't wait for me. Thanks for working on this.

@jakearchibald
Contributor

jakearchibald commented Oct 11, 2017

@annevk @domenic

We've got a couple of instances in the service worker spec that follow this pattern:

  1. Let realm be the context object's relevant realm.
  2. Then later, in parallel:
  3. Resolve promise with a frozen array created from someList, in realm.

…where someList is an infra list.

Do we need to do this? I thought it would be enough to resolve the promise with someList, and IDL takes care of the FrozenArray creation in the correct realm, but I can't find a direct spec reference for that.

@domenic
Contributor

domenic commented Oct 11, 2017

You need to convert lists into FrozenArrays while specifying the realm; there's no way IDL could automatically figure out what kind of object you want to convert it to, or what global you want to create it in.
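As observable from script, the "create a frozen array from a list" step boils down to roughly this (a single-realm sketch; the realm argument in the spec controls which global's Array is used, which has no analogue here):

```javascript
// What "create a frozen array from a list" amounts to from the
// script's point of view: copy the list into a fresh Array, then
// freeze it so its contents can no longer be mutated.
function createFrozenArray(list) {
  return Object.freeze(Array.from(list));
}
```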

@jungkees
Collaborator Author

Okay. So my approach seems to be fine here.

@annevk
Member

annevk commented Oct 12, 2017

@jakearchibald you don't resolve from "in parallel". You need to queue a task on an event loop. That event loop will have a realm that you can use to create the frozen array and resolve the promise. (You don't get in parallel access to resolve either.)

@jungkees
Collaborator Author

@annevk, that queue a task step is missing from @jakearchibald's example above.

Resolve promise with a frozen array created from someList, in realm.

is indeed run in a queued task, as you pointed out.

@jakearchibald
Contributor

@annevk "resolve" automatically queues a task on the promise's event loop https://www.w3.org/2001/tag/doc/promises-guide#shorthand-manipulating.

@jakearchibald
Contributor

@domenic The return type is defined in IDL, so I thought it would be capable of some casting. A promise created in one realm could resolve with an object from another, but that feels like the exception.

I guess I could write my own helper to do this.

@annevk
Member

annevk commented Oct 12, 2017

@jakearchibald that hook is broken. It doesn't allow you to specify the task source.

@jungkees
Collaborator Author

Does it make sense to define some sort of default task source (used when not specified otherwise) for promise jobs? That hook is actually handy and keeps things simple to read in many cases.

@annevk
Member

annevk commented Oct 12, 2017

Action-at-a-distance leads to harder to understand algorithms, I think. If someone fixed the hook to ensure it takes all the arguments it needs it would be much clearer.

- Adjust variable scope
- Change to early-exit in fetch abort cases
- Fix async steps of fulfillment handlers
- Change the interface of Batch Cache Operations algorithm
 . Change to not return a promise
 . Remove the in parallel steps and make it work synchronously
 . Change the call sites to call it in a created promise's in parallel
   steps
@jungkees
Collaborator Author

If someone fixed the hook to ensure it takes all the arguments it needs it would be much clearer.

Yes, I think this would help simplify the caller-side steps. I'll take a look when I find time for it.

@jungkees
Collaborator Author

@jakearchibald, I uploaded another snapshot addressing your additional comments. PTAL.

@jakearchibald jakearchibald left a comment

LGTM! A huge step forward

@jungkees jungkees merged commit c8ab714 into master Oct 13, 2017
@jungkees jungkees deleted the remove-incumbent-fetching-concept branch October 13, 2017 14:09
@jungkees
Collaborator Author

@jakearchibald, thanks a lot for your review!

Successfully merging this pull request may close these issues.

addAll should resolve when the cache is fully & successfully written