Session Buckets #44

Open
ayuishii opened this issue Dec 13, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@ayuishii
Collaborator

There are some compelling use cases for having a bucket's lifetime be tied to a "session".
Should we consider adding a policy that would allow for this on bucket creation?
As a part of this we should also (re)define what "session" means.

Related links:

@evanstade
Collaborator

"session" seems hard to define, whereas "expiration" is easy to define/understand. This makes me think that a hard date is likely to satisfy as many use cases as "session".

@asutherland
Collaborator

My understanding is that our discussion in whatwg/storage#71 was about storage that:

  1. Lives as long as the tab lives or the browser allows restoring the tab state.
  2. Is effectively namespaced to the tab.

Without the tab-as-namespace constraint, I think it would indeed be quite difficult to define a session, but thankfully that's not the case.

I think there are a few big advantages to this:

  • As seems to be requested in https://crbug.com/1286802, it provides an ergonomic way for sites to clean up data once they're sure they'll no longer need it. (And there is no need to play guessing games about how long the user might keep the tab without switching to it again.)
  • It's less effort for the site since they don't have to worry about having to actively clean up after themselves.
  • It relieves pressure on the storage-pressure data clearing/storage eviction mechanism.
  • It's a lot easier to reason about and explain to users and developers than the de facto LRU-based storage eviction mechanism. (Especially in the face of tab unloading or lazy tab restoring after a browser restart, etc.)
  • Because it's easier for the user to reason about, it allows a browser to have a feature to help clean up its disk usage in a way that a user can help make informed decisions about. Closing a tab where I was using magic image editing software to put top hats on my dogs to clear 2 Gigs of data is a much more scoped decision than clearing 10 gigs of data for the entire origin where maybe I do have data that I don't want to lose and I don't know the ramifications of clearing that.

In particular, I think it's beneficial to think of a use-case for websites that provide rich web app functionality that's exclusively client-side by default. For example https://app.diagrams.net/ is a diagramming site that can use a bunch of cloud storage providers, but can also just store things locally in the browser. "Session" storage could actually provide a stronger reliability guarantee for the user than normal storage eviction. This is in contrast to cloud-first apps where the storage model is explicitly that the canonical storage is in the cloud and client side storage is a cache or a scratchpad, etc.

The biggest question in my mind would be how far the tab-session-buckets would be exposed:

  • Would a ServiceWorker receiving a FetchEvent from a controlled client have access to a sessionBucket getter that would allow the ServiceWorker to have access to that bucket on behalf of the client?
  • Would the clients API expose the bucket? I would probably argue that would grant access that's way too broad and potentially produce weird attempts at optimization that would in fact be nightmares for browser performance, like looking for things in all of those buckets too.
  • Would the bucket be Serializable over postMessage? That could be fine, although obviously it must not be Serializable for (IndexedDB) storage just as buckets should never be.

@jesup

jesup commented Dec 14, 2022

Agree with @asutherland ... how will this interact with automatic unloading of background tabs? I presume that such unloaded tabs would not have their bucket cleared, since they are restorable. Handling restore (and I presume reload) does add some complication since the site will need to deal with its cache not being empty at load time, but that should be easy to deal with. One could clear it in those cases, but that might impose some unexpected performance hits when going back to an unloaded tab and on reload (both from having to reload/regenerate the data from the cloud, and from the clearing overhead on reload).

@evanstade
Collaborator

Good points!

Here is my tldr: I think we should have a "short term" bucket that is evicted soonish (this is just a hint to the user agent) after it's not being used. If sites want the lifetime to match SessionStorage, they can store the name/ID for this bucket in SessionStorage, and rest assured that once they stop accessing it, it will not last much longer.
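
A minimal sketch of the "store the bucket name in SessionStorage" pattern described above. The structural types `KeyValueStore` and `BucketManager`, the key `session-bucket-name`, the `tab-` prefix, and the helper `openTabBucket` are all hypothetical names chosen for illustration. The "short term" hint does not exist today, so an `expires` timestamp stands in for it here, and how `expires` is treated when an existing bucket is reopened is not something this sketch settles.

```typescript
// Minimal structural types so the sketch does not depend on browser typings.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface BucketManager<B> {
  open(name: string, options?: { expires?: number }): Promise<B>;
}

// Hypothetical key under which the tab remembers its bucket name.
const BUCKET_KEY = "session-bucket-name";

// Opens the bucket tied to this tab, creating a fresh name on first use.
// `restored` is true when the name was already present (reload or tab restore),
// in which case the bucket may already contain data.
async function openTabBucket<B>(
  store: KeyValueStore,
  buckets: BucketManager<B>,
  newId: () => string,
  ttlMs: number,
  now: number = Date.now(),
): Promise<{ bucket: B; restored: boolean }> {
  let name = store.getItem(BUCKET_KEY);
  const restored = name !== null;
  if (name === null) {
    name = `tab-${newId()}`;
    store.setItem(BUCKET_KEY, name);
  }
  // Assumption: `expires` stands in for the proposed "short term" hint.
  const bucket = await buckets.open(name, { expires: now + ttlMs });
  return { bucket, restored };
}
```

In a browser that ships Storage Buckets, a call could look like `openTabBucket(sessionStorage, navigator.storageBuckets, () => crypto.randomUUID(), 24 * 60 * 60 * 1000)`. The `restored` flag is where a site would decide whether to reuse or clear leftover data after a reload.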

Longer format:

I will agree there is likely value in creating an explicit "short term" expiration duration. Seems better not to force it to be linked to a single tab though, if we can implement it as usable in both ways (multi-tab sessions with the ability for single-tab).

Two main points I want to cover:

  1. defining "session"
  2. what we do with this definition once we have it

I'll cover the latter first. I don't think we want to make new guarantees that data will not be deleted at any point because persistent already covers that (and it may or may not be granted when requested). Even if you're in the middle of a session, your data may be deleted (it should be unlikely though if the session is particularly active, due to LRU being an important determinant).

But if we have a definition of session, we can say that the data won't stick around taking up quota after the session is over. And if we have some hint that a bucket is "sessiony" then a user agent can use that hint to make eviction more or less likely, such as by surfacing the information to the user or applying it to the automatic eviction algorithm.

Now to define short term/session. I don't think tab lifespan defines it very well nor does it future-proof the definition very well.

  • some users may keep a single tab open in the background for months. This shouldn't count as a single session, or at least should not provide any kind of guarantee that the data stored there is not discarded under storage pressure. The only guarantee against eviction is persistent.
  • tabs and browsing windows may be closed due to restarting of the browser or restarting of the OS, e.g. to apply security updates. These "sessions" are only temporarily suspended and not terminated. We do not want to discourage users from restarting to apply security fixes due to fear of lost data.
  • a user could accidentally close a tab and want to be able to restore. Currently there are unload handlers/dialogs that confirm tab closing, however I tend to think the ability to undo is better than a user-blocking dialog.

> Is effectively namespaced to the tab.

It seems likely someone wants data that is not to be tab-namespaced, but still goes away after the multi-tab session is over. In the image editor example, I suspect many users open multiple tabs/windows in a single image editing session. If you do want a tab-namespaced session bucket, you can just be sure to only access it from a single tab and store the name in SessionStorage. Therefore my definition of a "session" for buckets takes into account multiple tabs.

I think we have to work a bit heuristically here, and perhaps this is not to be spec'd but implementation defined. But my idea for what most closely approximates user expectations is:

  • define a timespan which starts when a session bucket is created, and ends after all tabs that accessed it are closed or haven't been interacted with for a long time (week? fortnight?)
  • when this timespan has ended, a new "twilight" timer starts ticking, and lasts for 12(?) wall-clock hours, plus a few minutes of browser run time. (i.e. if the browser is not running, the bucket should still be in twilight for a few minutes after startup and not immediately evicted)
  • During this twilight period, the data will probably not be evicted. And of course during this time, if this bucket is opened again, it resets the state to "in use".
  • At the end of the period, the user agent should promptly evict this bucket.
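
The heuristic above can be sketched as a small state function. This is only an illustration under stated assumptions: the constants pick one value from each range floated above (a fortnight of idleness, 12 wall-clock hours, and five minutes for "a few minutes" of browser run time), and `TabRecord`, `sessionEndTime`, and `bucketState` are hypothetical names, not part of any proposal.

```typescript
type BucketState = "in-use" | "twilight" | "evictable";

// One record per tab that has accessed the bucket (timestamps in ms since epoch).
interface TabRecord {
  lastInteractionAt: number;
  closedAt: number | null; // null while the tab is still open
}

const HOUR_MS = 60 * 60 * 1000;
const IDLE_LIMIT_MS = 14 * 24 * HOUR_MS; // assumption: "fortnight"
const TWILIGHT_MS = 12 * HOUR_MS; // assumption: "12(?)" wall-clock hours
const RUN_TIME_GRACE_MS = 5 * 60 * 1000; // assumption: "a few minutes"

// The session ends once every tab is either closed or idle past the limit.
function sessionEndTime(tabs: TabRecord[]): number {
  let end = -Infinity;
  for (const tab of tabs) {
    const idleOutAt = tab.lastInteractionAt + IDLE_LIMIT_MS;
    const doneAt = tab.closedAt === null ? idleOutAt : Math.min(tab.closedAt, idleOutAt);
    end = Math.max(end, doneAt);
  }
  return end;
}

// `browserRunTimeSinceEndMs` is the time the browser has actually been running
// since the session ended, so a bucket is not evicted immediately at startup.
function bucketState(
  tabs: TabRecord[],
  now: number,
  browserRunTimeSinceEndMs: number,
): BucketState {
  const end = sessionEndTime(tabs);
  if (now < end) return "in-use";
  const wallClockDone = now - end >= TWILIGHT_MS;
  const runTimeDone = browserRunTimeSinceEndMs >= RUN_TIME_GRACE_MS;
  return wallClockDone && runTimeDone ? "evictable" : "twilight";
}
```

Reopening the bucket during twilight corresponds to adding a new `TabRecord` with a recent `lastInteractionAt`, which moves the computed session end into the future and returns the bucket to "in-use".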

Thoughts?

@asutherland
Collaborator

asutherland commented Aug 18, 2023

I have a modest proposal for how we can turn the tab lifetime issue into an implementation issue. Refcounted Buckets! Thanks to the history API's state storage being based on structured serialization per spec, it's possible to create opaque identifiers or bucket references that can only be cloned, or retrieved by iterating over all currently existing refcounted buckets. Such references would:

  1. Stay alive while a page is open. (Object reference.)
  2. Survive the tab being unloaded and be accessible as long as the back button or tab restore can get you there.
  3. Provide the potential for more aggressive GCing than tab lifetime allows for. Tab history is subject to truncation that is nicely paired with how far the back button will reasonably take you. This potentially side-steps issues raised by very long-lived (pinned) tabs like gmail/slack/JIRA.
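
A rough sketch of the bookkeeping the three properties above imply, assuming the user agent tracks which "holders" (a live document, or a session-history entry whose serialized state contains a bucket reference) keep each bucket alive. `BucketRefTable` and the holder id strings are hypothetical; nothing here is a proposed API surface.

```typescript
// Tracks which holders keep each refcounted bucket alive.
class BucketRefTable {
  private readonly holdersByBucket = new Map<string, Set<string>>();

  // Records that `holderId` (a document or a history entry) references the bucket.
  addRef(bucketId: string, holderId: string): void {
    let holders = this.holdersByBucket.get(bucketId);
    if (!holders) {
      holders = new Set<string>();
      this.holdersByBucket.set(bucketId, holders);
    }
    holders.add(holderId);
  }

  // Drops a holder (page unloaded, history entry truncated) and returns the
  // buckets that no longer have any holder and can therefore be collected.
  dropHolder(holderId: string): string[] {
    const collectable: string[] = [];
    this.holdersByBucket.forEach((holders, bucketId) => {
      if (holders.delete(holderId) && holders.size === 0) {
        this.holdersByBucket.delete(bucketId);
        collectable.push(bucketId);
      }
    });
    return collectable;
  }

  isAlive(bucketId: string): boolean {
    return this.holdersByBucket.has(bucketId);
  }
}
```

In this model, unloading a page drops only the document holder, so the bucket survives as long as a history entry still references it, and history truncation is what finally releases it.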

If we're not providing strong tab guarantees and instead doing time-based heuristics like:

> haven't been interacted with for a long time (week? fortnight?)

Then perhaps we just want a different mode of expiration. Right now expiration is defined as ~"must expire by" as opposed to ~"please keep around until, but I don't mind if you keep it around longer".
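
To make the two readings of expiration concrete, here is a small sketch. The `Expiration` type, its mode names, and both helper functions are hypothetical and exist only to contrast "must expire by" with "please keep around until".

```typescript
// Two hypothetical expiration modes for a bucket (timestamps in ms since epoch).
type Expiration =
  | { mode: "expire-by"; at: number } // data must be gone once `at` has passed
  | { mode: "keep-until"; at: number }; // data should be kept until `at`; keeping it longer is fine

// True when the user agent is obliged to delete the bucket.
function mustDelete(expiration: Expiration, now: number): boolean {
  return expiration.mode === "expire-by" && now >= expiration.at;
}

// True while the site has asked for the bucket to be kept around.
// This is a request, not a guarantee against eviction under storage pressure.
function keepRequested(expiration: Expiration, now: number): boolean {
  return expiration.mode === "keep-until" && now < expiration.at;
}
```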

@ayuishii added the enhancement (New feature or request) label Sep 27, 2023
@Jamesernator

Something like this would be particularly useful for minimizing memory usage when encoding to or decoding from large archives (on mobile, 1GB of total memory for the whole device is not uncommon). I originally opened this on fs, but basically storage buckets that are garbage collected would be highly convenient for transient archives.

e.g. Usage would look something like:

type FileSource = ArrayBuffer | Blob | ReadableStream | FileSystemFileHandle;

class Archive {
    static async create(): Promise<Archive> {
        // STRAWMAN API: allows the bucket to be collected automatically once all
        // references to the bucket (including file handles) have been collected
        const bucket = await navigator.storageBuckets.openTempBucket();
        return new Archive(bucket);
    }

    readonly #bucket: StorageBucket;

    constructor(bucket: StorageBucket) {
        this.#bucket = bucket;
    }

    async addFile(name: string, source: FileSource): Promise<void> {
        const rootDir = await this.#bucket.getDirectory();
        // Directory handles expose getFileHandle(); getFile() lives on file handles
        const file = await rootDir.getFileHandle(name, { create: true });
        // Write the file from the source
    }

    async getFile(name: string): Promise<FileSystemFileHandle> {
        const rootDir = await this.#bucket.getDirectory();
        return await rootDir.getFileHandle(name);
    }

    // This would be extra efficient with https://github.com/whatwg/fs/issues/114
    async encodeZip(): Promise<FileSystemFileHandle> {
        // Note we don't use the archive's bucket: if the returned zip file is
        // collected, its own bucket can be reclaimed even sooner
        const zipBucket = await navigator.storageBuckets.openTempBucket();
        const zipRootDir = await zipBucket.getDirectory();
        const zipFile = await zipRootDir.getFileHandle("out.zip", { create: true });

        // ... Iterate files in the archive storage bucket and encode them into the zipFile

        // Return the zipFile; if it's collected then the corresponding file on disk
        // can be destroyed also
        return zipFile;
    }
}

5 participants