- Austin Sullivan ([email protected])
- Introduction
- Goals & Use Cases
- Non-goals
- Use Cases
- Why Hashing Isn’t Enough
- Alternatives Considered
- Maintain status quo: O(n) lookups of a FileSystemHandle in IndexedDB
- Expose the full file path
- Expose the same ID to all sites
- Expose a method to convert an ID to its corresponding FileSystemHandle
- Tie the salt to the lifetime of the current browsing session
- Expose a truly persistent ID which is stable even after clearing site data
- Stakeholder Feedback / Opposition
- References & acknowledgements
Currently, FileSystemHandle objects can be opaquely serialized by the browser
to be stored as values in IndexedDB. But there is no way for a site to generate,
from script, a string which is guaranteed to uniquely identify the file
referenced by the FileSystemHandle.
Developers have complained that building rich experiences on top of the File System Access API is challenging due to the inability to uniquely identify (and index on) handles.
- Support O(1) access to a FileSystemHandle stored in IndexedDB
- Two colluding sites cannot “join” a user by comparing IDs server-side
- IDs should be stable across browsing sessions, making them suitable to use as keys for storing the corresponding FileSystemHandle in IndexedDB
- IDs should not be stable after clearing browsing data. In other words, this should not be a truly persistent, unclearable identifier
- IDs should not be stable across normal and private browsing modes
- No information should be gleaned from the ID. It is completely opaque to the site
- Avoid surprises by having the same semantics as isSameEntry(), namely: await a.isSameEntry(b) === (await a.getUniqueId() === await b.getUniqueId())
- Non-goal: Support obtaining a FileSystemHandle from its ID
An application wants to quickly tell whether it’s already seen a file which was
recently selected from the file picker. Currently, storing handles in IndexedDB
requires creating an arbitrary key mapping to a list of handles. Identifying
whether a given handle is already stored in IndexedDB requires iterating through
this list and calling the isSameEntry() method on each handle, an O(n)
operation which is unacceptable given that sites may have access to an unbounded
number of handles. A FileSystemHandle’s unique ID can be used as a key for
IndexedDB, allowing for constant-time access.
Before:
// IndexedDB structure
// { "handles": [handle1, handle2, … ] }
// get() and set() are assumed to be helpers from an IndexedDB key-value wrapper.
const handles = await get("handles");
handles.push(handle);
await set("handles", handles);
After:
// IndexedDB structure
// {
// "abc123": handle1,
// "def456": handle2,
// …
// }
const key = await handle.getUniqueId();
await set(key, handle);
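The "have I already seen this file?" check from this use case could look roughly like the sketch below. It is illustrative only: `getUniqueId()` is the proposed method and may not exist in any browser, the `Map`-backed `get`/`set` are stand-ins for an IndexedDB wrapper, and `rememberHandle` and `fakeHandle` are hypothetical names introduced here.

```javascript
// Hypothetical stand-in for an IndexedDB key-value wrapper.
const db = new Map();
const get = async (key) => db.get(key);
const set = async (key, value) => { db.set(key, value); };

// Stores the handle if it is new; resolves to true if it was already stored.
async function rememberHandle(handle) {
  const key = await handle.getUniqueId(); // proposed method
  if ((await get(key)) !== undefined) {
    return true; // one keyed lookup, no isSameEntry() loop over every handle
  }
  await set(key, handle);
  return false;
}

// Fake handle for demonstration only; real handles come from the file picker.
const fakeHandle = (name, id) => ({ name, getUniqueId: async () => id });
```

Calling `rememberHandle()` twice with handles for the same file would store it once and report the second call as a duplicate, regardless of how many other handles are stored.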
A web IDE app wants to display a list of “recently opened” directories, with
buttons to jump back into editing. The ID can be used as a pointer to associate
various page elements and JavaScript objects with the corresponding
FileSystemHandle.
A document-editing app wants to prevent concurrent modification of the same
file. The File System Access API has a built-in API-level mechanism to
exclusively lock files with an open SyncAccessHandle, but there is no way to
exclusively lock a FileSystemHandle being written to with a
FileSystemWritableFileStream, which is the only way to write to files which
live outside of the origin-private file system. The Web Locks API takes a string
parameter as a key, which can be used to lock a file handle at the application
level.
const key = await handle.getUniqueId();
await navigator.locks.request(key, async lock => {
  const writable = await handle.createWritable();
  // Write without fear of conflicts.
});
An app wants to store a unique identifier for a file client-side, such as in a locally-running SQLite instance, or server-side, such as in a Redux store.
Currently, FileSystemHandle objects themselves can only be stored in
IndexedDB, since they are serializable only by the structured cloning
algorithm. If the app wants to store other data associated with this handle, it
needs to either reside in IndexedDB alongside the handle or use complicated
(and likely fragile) workarounds to associate this other data with the file.
With this new method, the string ID can effectively be used as a pointer (from the database of the app’s choice) to the handle in IndexedDB.
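As a rough sketch of that pointer pattern: the handle itself stays in IndexedDB, while another store keeps only the string ID next to the app's own data. Everything here is an assumption made for illustration: `handleStore` stands in for IndexedDB, `notes` stands in for rows in some other database, and `saveNote` and `handleForNote` are hypothetical helpers. `getUniqueId()` is the proposed method.

```javascript
// Stand-ins: `handleStore` plays the role of IndexedDB, `notes` the role of
// rows in another database of the app's choice.
const handleStore = new Map();
const notes = [];

// Saves app data about a file without storing the handle in the other database.
async function saveNote(handle, text) {
  const handleId = await handle.getUniqueId(); // proposed method
  handleStore.set(handleId, handle); // the handle itself stays in "IndexedDB"
  const note = { handleId, text };   // the other store holds only plain strings
  notes.push(note);
  return note;
}

// Follows the pointer from a stored row back to its handle.
function handleForNote(note) {
  return handleStore.get(note.handleId);
}
```

Because each row holds only strings, it can be serialized with JSON or written to a SQL column, which is not possible for the handle itself.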
A string key can be generated from script by hashing the currently exposed
fields of the FileSystemHandle, such as the name, last-modified time, and file
contents. Sites can be reasonably sure that two handles are the same file by
hashing all of these fields (though even this is insufficient to de-duplicate a
handle from its copy in another directory). However, this does not provide much
value, since this point-in-time comparison is already provided by the
isSameEntry() method. For the hash to be useful, it must be (1) stable, even
as the file changes, (2) fast to compute, and (3) unique.
It is not possible to create a hash with these properties using the existing API surface.
- The underlying file can be modified, possibly without the site’s awareness. Using the file contents or last-modified time will lead to a possibly unstable hash.
- Hashing the file contents can be prohibitively expensive.
- The remaining hashable fields on the FileSystemHandle do not provide enough information to be able to distinguish between two files of the same name in different directories, for example.
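The stability and uniqueness problems listed above can be illustrated with a small sketch. The file objects are plain stand-ins, not real handles; their `path` field represents the on-disk location that script cannot read, and `fnv1a` is a simple non-cryptographic hash used only to keep the example self-contained. All names are hypothetical.

```javascript
// FNV-1a style hash over UTF-16 code units; a stand-in for any hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hashes only the fields script can read today. `path` is deliberately
// ignored because the API does not expose it.
function hashFromExposedFields({ name, lastModified, contents }) {
  return fnv1a(`${name}|${lastModified}|${contents}`);
}

const original = { path: "/docs/report.txt", name: "report.txt", lastModified: 1000, contents: "draft" };
const copy = { ...original, path: "/backup/report.txt" }; // a different file on disk
const resaved = { ...original, lastModified: 2000 };      // the same file, saved again later

// Not unique: the copy in another directory hashes identically.
const collides = hashFromExposedFields(original) === hashFromExposedFields(copy);
// Not stable: the same file hashes differently once its timestamp changes.
const unstable = hashFromExposedFields(original) !== hashFromExposedFields(resaved);
```

Dropping `lastModified` and `contents` from the hash would make it stable, but then only the name remains, which makes collisions even more likely.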
Maintaining the status quo of O(n) lookups is not feasible for sites which may have access to an unbounded number of handles and need to determine quickly whether they’ve already stored a given handle.
The full file path often contains PII, such as a username, that we do not wish to expose to the web. We could gate this method behind a permission prompt, but it is not reasonable to expect a user to understand the implications of granting access.
Exposing the same ID to all sites would allow de-duplication of files across sites. While this may be useful for well-behaving sites, it goes against the per-origin defaults of the web and has the potential for abuse by misbehaving sites (e.g. two colluding sites can "join" a user by comparing IDs server-side).
Exposing a method to convert an ID to its corresponding FileSystemHandle would introduce yet another entry point to the API, complicating future security and privacy discussions. There is little added value in being able to obtain a handle from its ID directly, since this feature already enables a site to use the ID to access the handle quickly from IndexedDB.
Tying the salt to the lifetime of the current browsing session would invalidate all prior IDs if the page is refreshed, making them significantly less useful as keys in any storage which lasts beyond the current browsing session (such as IndexedDB).
Exposing a truly persistent ID, stable even after clearing site data, would allow sites to de-duplicate handles by persisting the ID to their servers. However, this would make the ID effectively an unclearable cookie. Supporting this use case is not worth the privacy cost, especially since the most common case will likely involve storing the ID in IndexedDB, which will be among the site data cleared.
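The alternatives above refer to a salt without describing how it would be used. One way to picture the trade-offs is the sketch below, which is an assumption made for illustration and not a description of any browser's implementation: an ID is derived by hashing a per-origin salt together with the file's location, and the salt is discarded when that origin's site data is cleared. `createIdSource`, `idFor`, `clearSiteData`, and `toyHash` are hypothetical names, and `toyHash` is not cryptographic; a real implementation would presumably use a keyed cryptographic hash and a secure random salt.

```javascript
// Toy hash (FNV-1a style), used only to keep the sketch self-contained.
function toyHash(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// `newSalt` is injected so the sketch is deterministic; a browser would draw
// the salt from a cryptographically secure random source.
function createIdSource(newSalt) {
  const saltsByOrigin = new Map(); // assumed to live with the origin's site data
  const saltFor = (origin) => {
    if (!saltsByOrigin.has(origin)) saltsByOrigin.set(origin, newSalt());
    return saltsByOrigin.get(origin);
  };
  return {
    // Same origin and same file give the same ID for as long as the salt lives.
    idFor: (origin, filePath) => toyHash(`${saltFor(origin)}|${filePath}`),
    // Clearing site data drops the salt, so later IDs no longer match old ones.
    clearSiteData: (origin) => saltsByOrigin.delete(origin),
  };
}
```

Under this model, a session-scoped salt corresponds to dropping the salt far too often, a salt shared by all origins corresponds to exposing the same ID to all sites, and a salt that survives clearing site data corresponds to the truly persistent ID.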
- Developers: Positive
Many thanks for valuable feedback and advice from:
Ayu Ishii, Joshua Bell, Thomas Steiner