Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define Indexed DB as a storage endpoint, use hooks #334

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

inexorabletash
Copy link
Member

@inexorabletash inexorabletash commented May 15, 2020

WORK IN PROGRESS - not ready to merge

For whatwg/storage#90

Bikeshed hasn't picked up the new terms from Storage yet.

There should be no behavior changes here.


Preview | Diff


Preview | Diff

@annevk
Copy link
Member

annevk commented May 17, 2020

Looking at this made me notice the "connection queue", which should probably use the storage key as well rather than the origin? Is this a primitive that should move to the Storage Standard?

Also, part of the idea was that you'd no longer need "If origin is an opaque origin" as the Storage Standard would take care of that (and return failure as appropriate).

@inexorabletash
Copy link
Member Author

Looking at this made me notice the "connection queue", which should probably use the storage key as well rather than the origin? Is this a primitive that should move to the Storage Standard?

Agreed it should be decoupled from origin (that's a good mental filter to use in general, thanks). Is it a generic enough concept to move to Storage? Other thoughts are to monkey-patch it on to storage bottle, or make it part of the bottle's contents (e.g. the bottle could have a "connection queue" key and a "databases" key, where the latter's value a map of the actual databases).

Also, part of the idea was that you'd no longer need "If origin is an opaque origin" as the Storage Standard would take care of that (and return failure as appropriate).

Agreed, I'll roll that in. Since that error is synchronous in IDB (well, in 2/3 cases), I'll have to rework the algorithms a bit but I think it's fine.

@inexorabletash
Copy link
Member Author

Hmmm, actually there is a connection queue per name. So the bottle's map values can be a pair of (queue, database)....? Still thinking this through.

@inexorabletash
Copy link
Member Author

And also note that we have no idea what should happen to pending open/delete requests if storage was swapped out. Are they associated with the previous storage? (so the queue is part of storage itself) Or do they apply to whatever the current storage is when they run? Brain hurts...

@asutherland
Copy link
Collaborator

And also note that we have no idea what should happen to pending open/delete requests if storage was swapped out. Are they associated with the previous storage? (so the queue is part of storage itself) Or do they apply to whatever the current storage is when they run? Brain hurts...

I think it makes sense for them to be associated with the previous storage.

Step 2 of the proposed replace algorithm at whatwg/storage#18 (comment) is a task that runs on the given agent. It makes sense that the execution of this task would constitute the start of the new storage epoch, if you will. Requests made in the before times would be irrelevant.

@inexorabletash
Copy link
Member Author

Cool. A few thoughts on how to structure the bottle map...

  • name → (queue, database)
  • namedatabase, and make queue a property of database, and make databases never be deleted, only cleared/reset.
  • ("queue", name) → queue and ("database", name) → database

Pros and cons for each. Preferences? Other ideas?

@asutherland
Copy link
Collaborator

* _name_ → (_queue_, _database_)

I like this one because:

  • Having the key be the name seems like a big win over the compound key approach of ("queue", name).
  • Separating the queue from the database seems like a nice separation of concerns, especially since the database is something that definitely involves touching disk and is conceptually subject to corruption. Whereas the connection queue is strictly a runtime concept.

Blobs / Files

A related question is how IndexedDB-minted Blobs and Files will handle the replace operation and whether this impacts the map. Gecko definitely invalidates IndexedDB-minted Blobs and Files when Clear-Site-Data and privacy data-clearing operations occur. From other discussions in the past I have the impression this is also the case in Blink.

The File API Spec doesn't really get into this in the section on deserialization and the get stream algorithm which implies a simplified model where no effort is made to de-duplicate Blob contents or store them to disk, but does leave implementation a broad latitude to throw errors when get stream is invoked to compensate for the underlying realities.

It seems like we might want to formalize the realities of Blobs/Files now since Clear-Site-Data makes this previous edge-case something content can explicitly trigger instead of a user-initiated edge-case, plus multiple storage buckets presumably would also want to be able to dispose of the underlying blobs and their quota usage in a deterministic fashion.

Doing this might involve a hook where get stream could end up needing to involve some part of the storage hierarchy, in which case it's possible the map might need to store additional data to support this.

@mkruisselbrink
Copy link

Blobs / Files

FWIW, I'm working on clarifying this part of the FileAPI spec, with my current thinking being to let others (i.e. IndexedDB, other APIs that produce blobs) define a get stream hook, and then defining all the operations on blobs in terms of that. Unfortunately haven't had as much time to work on that as I would have liked, but it is among the higher priority of spec things I'm working on, also to better define how things work for the Native File System API.

So yes, in that model it would be totally up to IndexedDB to define when/how these blobs get invalidated.

annevk added a commit to whatwg/storage that referenced this pull request May 20, 2020
"legacy-clone a browsing session storage shed" can be used by HTML to define creation of auxiliary browsing contexts, as part of whatwg/html#5560.

"obtain a storage key" can be used by APIs that share keying logic with storage, such as BroadcastChannel and shared workers. See whatwg/html#3054. It's potentially also useful for Indexed DB as discussed in w3c/IndexedDB#334.

Closes #92.
annevk added a commit to whatwg/storage that referenced this pull request Jun 5, 2020
"legacy-clone a browsing session storage shed" can be used by HTML to define creation of auxiliary browsing contexts, as part of whatwg/html#5560.

"obtain a storage key" can be used by APIs that share keying logic with storage, such as BroadcastChannel and shared workers. See whatwg/html#3054. It's potentially also useful for Indexed DB as discussed in w3c/IndexedDB#334.

Also helps a bit with #95 by reorganizing and adding some more detail to how a user agent is supposed to manage storage.

Closes #92.
@inexorabletash inexorabletash changed the base branch from master to main July 23, 2020 16:18
@inexorabletash inexorabletash force-pushed the endpoint branch 2 times, most recently from e62ed35 to c16bb29 Compare June 7, 2021 17:15
@inexorabletash
Copy link
Member Author

Partial update. I needed a name for the (queue,database) struct that exists in the map. I literally called it pumpkin here as a placeholder name because I wasn't feeling inspired. So bikeshed away!

This drops the need for most of the imports from Storage, although these are retained:

  • storage bucket - which exists in the published version, so not new in this PR; this is used to reference durability
  • storage identifier - this is informative, used in the phrase Indexed DB is a storage endpoint, with the storage identifier "indexedDB". which isn't really necessary but looks nice? We can remove, since it's a registered end point and we don't need to define it. Thoughts?

Most of the remaining references to "origin" end up being fairly illustrative rather than normative definitions. We could probably scrub most of them e.g. "if the origin’s storage is cleared" → "if the storage is cleared".

inexorabletash added a commit that referenced this pull request Jun 8, 2022
Mostly a "find and replace" of "origin" with "storage key" right now. 

More detailed integration will is being worked on in #334

Co-authored-by: Joshua Bell <[email protected]>
@inexorabletash inexorabletash force-pushed the endpoint branch 4 times, most recently from 8c7cc83 to 1042cec Compare June 8, 2022 22:07
@inexorabletash
Copy link
Member Author

See also: whatwg/storage#153

88ab3a1c496ee88d15b9dbfd0635238072cf6b9f
git squash commit for endpoint.

8c7cc83
git squash commit for endpoint.

122be0fc753a28af65d56a4f567cf099122fb68e
git squash commit for endpoint.

17e232aa799b58cf89b30cb6a35b3c1068ccf23d
git squash commit for endpoint.

e62ed35
git squash commit for endpoint.

a98dbd0cfea9992771d54abf854b43f532e78ec3
Define Indexed DB as a storage endpoint, use hooks

bee63dd703897826d87e7a08f12877491222dff8
Worthy of a revision history note

a8c0cbabbbab1e9354e0c5a7758087f06c1d54ad
* Defer opaque origin checks to "obtain a poodle beetle noodle bottle paddle battle" algorithm.

* Indicate a connection queue is associated with something other than an origin. A bottle? Map? Not sure.

9b4ac34164b467d0244b0b4ec2f7518484a81748
Rebase, fix unused var

040b006c47f12a6b4a1ebf499ee7267e587c3c8c
verbing convention

5e1a2640b13bf67613b372018749bf30a8afd62e
Make bottle map a mapping of name -> (queue, database), drop most imports

bf1aeba58af37e0d78bfd05f8fcd2265baf8d661
missing quote

d471c46da5163e43107548d4a45218a89740f827
reference buckets where appropriate

tidy

614f588ad659235ebe4d59fffa80dfec4bacd48b
More general references to storage concepts

96f53799c1343d6e986077e5909b0bb0edf49e5d
Update databases() algorithm

25b9fb410eb1fa980f31754a67a82697f7159f06
bangbang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants