Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Retrieving, deleting storage items using a filter. #505

Closed
polywock opened this issue Dec 8, 2023 · 20 comments
Closed

Proposal: Retrieving, deleting storage items using a filter. #505

polywock opened this issue Dec 8, 2023 · 20 comments
Labels
supportive: chrome Supportive from Chrome topic: storage Issues related to persisting data. Topics include browser.storage, web storage, and new APIs.

Comments

@polywock
Copy link

polywock commented Dec 8, 2023

Why it's Needed (Efficiency)

I regularly need to use storage keys that involve dynamic elements like flag:[tabId]. The only way to get all keys starting with the prefix flag: is using StorageArea.get() to get all the storage items and then to filter.

This isn't very efficient. The storage areas support 10MB of data and even more with the unlimitedStorage permission. That's potentially a huge amount of storage data that needs to be serialized and sent over. There's also a special concern when using the Sync storage area.

To delete all keys with prefix flags: is even worse as it requires a three step process. Get all storage items, filter for keys you want to delete, and use StorageArea.remove to delete them.

Why it's Needed (Organization)

This proposal allows extension developers to group, retrieve, and delete a collection of related keys.

browser.storage.local.get(['content/a', 'content/b', 'content/c', 'content/d', 'content/e', 'content/f', ...])

// becomes 
browser.storage.local.get(['content/*'], true) 

Proposal

A way to retrieve and delete storage items using a filter. The filter can be a list of globs or an object. Preferably the same filter should be used for this proposal and #475.

type StorageFilter = Glob[] | FullStorageFilter

interface FullStorageFilter {
    matches: Glob[],
    excludeMatches: Glob[] 
}

Usage if new methods are introduced.

browser.storage.local.getUsingFilter(['flag:*'])

browser.storage.local.removeUsingFilter(['flag:*'])

Usage if existing methods are used.

The second parameter true indicates we're using filters.

browser.storage.local.get(['flag:*'], true)

browser.storage.local.remove(['flag:*'], true)
@fregante
Copy link

fregante commented Dec 9, 2023

I'd rather keep the existing API and just pass a regex. Alternatively globs would be flexible and match existing webext matching.

browser.storage.local.get(/^flag:/)
browser.storage.local.get({match: 'flag:*'})

@polywock
Copy link
Author

polywock commented Dec 9, 2023

browser.storage.local.get(/^flag:/)

Looks great, but the other one isn't backward compatible. Whichever format it ends up being, it should probably be consistent with proposal #475.

@fregante
Copy link

fregante commented Dec 9, 2023

but the other one isn't backward compatible.

Ah you're right, I forgot objects are used to provide defaults. {glob: true} could be the last parameter then, while the first one remains a string/object as usual.

@tophf
Copy link

tophf commented Dec 10, 2023

The simplest solution would be to add getAllKeys(lowerBound?:string, upperBound?:string) method.

However, chrome.storage API was never designed to be performant and flexible, so maybe a better solution would be to add chrome.indexedDB which would expose the full power of IndexedDB in all parts of the extension. Firefox already uses IndexedDB internally for chrome.storage anyway, so it's indeed a viable solution. It could be also a subset of IndexedDB named as chrome.db, but I don't see why not expose the full API as the hardest thing to do is probably Promisification.

regex

JS RegExp won't work in Chrome as it only supports RE2 syntax inside the browser processes for the sake of guaranteed performance due to the absence of backtracking. It may be worse in other browsers, e.g. Safari supports a very limited/nonstandard subset of RE for a similar reason in declarativeNetRequest.

match pattern

would work for most cases I guess, but it's too limited, so ideally it should be extended with ranges like [a-f].

@polywock
Copy link
Author

polywock commented Dec 14, 2023

Since regex isn't an option. Out of all the suggestions so far, I think a match pattern suggested by @tophf (in #475) and @fregante is the most well rounded. I have updated my proposal to reflect this.

@Rob--W Rob--W added topic: storage Issues related to persisting data. Topics include browser.storage, web storage, and new APIs. and removed needs-triage labels Jan 4, 2024
@Rob--W
Copy link
Member

Rob--W commented Jan 4, 2024

There is little appetite for the proposed API, but the browser vendors are willing to consider adding a new method to retrieve all known storage keys (return value: a list of strings). This would enable you to implement the filtering yourself, and then pass the array of keys to get/remove (or a new object with set if the goal is to bulk-replace values).

What are your thoughts on this counter-proposal?

@tophf
Copy link

tophf commented Jan 4, 2024

A dumb getAllKeys() is the bare minimum, but it'll still be wasteful when there are many keys, so it's best if we can specify lower/upper bounds for the keys just like IndexedDB's getAllKeys or something similar.

@polywock
Copy link
Author

polywock commented Jan 5, 2024

@Rob--W I'm supportive, but ideally both should be implemented. Some advantages for a filter based approach.

  1. It's truly asynchronous, no need to wait for getKeys() + filter, before making your actual request.
  2. The browser can filter more efficiently than runtime javascript.
  3. Less payload that needs to be serialized. Instead of including 100s of key names, you can include 'prefix/*'.
  4. Consistency with Proposal: StorageArea.onChanged filters #475. If that proposal is implemented, out of the box, it would make sense to also retrieve items using the same filter.
  5. Less payload should also benefit the Sync storage area. I don't know how that works internally, but if the keys actually need to travel to a server, it would be far more efficient to use a filter than a huge list of matching keys.

@oliverdunk
Copy link
Member

@polywock, to clarify - if only getKeys() was implemented, would you be a potential consumer for that? Or would that be too limited and you would workaround this in other ways? We discussed some of the the points you made in the meeting but ultimately we preferred the simpler getKeys() API.

@fregante
Copy link

fregante commented Jan 5, 2024

I'd definitely use it in my module to only fetch items that belong to it ("starts with cache:")

https://github.com/fregante/webext-storage-cache/blob/d6f25cb45989ac947a9465724dfec1a4d7c550f8/source/legacy.ts#L94

I think it's a fair compromise, if the filter for .get() isn't accepted. However I wonder what browsers gain from this dual API since it feels like it halves the performance (two lookups)


I also have another module that populates/autosaves the options page into the storage, given a schema. It currently uses a single key and the options are stored as a single object.

With this API, I would be able to store one item per field, without losing the core features (like the ability to migrate keys that are no longer in the schema)

@tophf
Copy link

tophf commented Jan 5, 2024

what browsers gain

Simplicity. They want to keep this API simple. A getAllKeys method was always a necessity though, so it's a good compromise for them, whereas we'll still get the benefit of not reading huge values.

@polywock
Copy link
Author

polywock commented Jan 5, 2024

@oliverdunk It's not my preferred method, but getKeys() will be very useful.

@hanguokai
Copy link
Member

Background

Because browser.storage api doesn't support multiple namespaces (databases), as a result, developers need to put all kinds of data for different purposes in one space. A common solution is to set a prefix for keys to distinguish different types of data.

The original intent of #475 was to support multiple space. Considering that browsers may not support this change, a compromise was made. If browsers supported multiple spaces, this proposal #505 would not exist.

So the proposal #505 itself is a product of compromise. Now, getAllKeys() is a further compromise. Developers need to write another wrapper function, e.g. getDataByPrefix(prefix). It helps, but not directly. And this is mainly useful when values are large objects. If the data is small, developers can just read all the data directly.

@polywock
Copy link
Author

polywock commented Jan 6, 2024

The original intent of #475 was to support multiple space.

Since I have long since edited my 475 proposal. For those who don't know, in 475 I had two proposals. The primary proposal was to support buckets/spaces for the storage API, and allow retrieving, deleting from a specific bucket, or listening to changes on a specific bucket. From the feedback, I switched focus to the secondary proposal (StorageArea.onChanged supporting a filter). I wouldn't say this was a compromise since I included the secondary proposal alongside the main proposal. I prefer the secondary proposal as of today, especially if paired with #505 (or even the counter-proposal, getKeys).

All things considered, I'm still satisfied a counter-proposal was accepted. It won't be as efficient as the original, but it will still allow for getting filtered keys more efficiently. It would be interesting to do benchmarks to figure out at what point it's better to use getKeys() + filter + get(filtered keys) compared to the current get() + filter.

In addition, if using getKeys() to filter becomes common, the browsers might later implement a dedicated function to optimize it.

@fregante
Copy link

fregante commented Jan 6, 2024

out at what point it's better to use getKeys() + filter + get(filtered keys) compared to the current get() + filter

My guess is that it has limited use for storage.sync since you're reading 100KB at most from disk. Also a micro-optimization for storage.session since reading from RAM is extremely fast.

That leaves storage.local, particularly with unlimitedStorage.

In any case, the most common usage would probably look something like this:

const cache = await chrome.storage.local.get(await chrome.storage.local.getKeys('cache:*'))
await chrome.storage.local.remove(await chrome.storage.local.getKeys('cache:*'))

@hanguokai
Copy link
Member

@fregante getKeys() doesn't have parameters, it returns all key in the storage, then filter key by yourself. Another, storage.session is not as fast as you think in current Chrome implementation. Considering OS disk cache, storage.local is almost as fast as storage.session.

PS: I didn't say getKeys() is not useful. It is useful if you have a lot of data.

@oliverdunk
Copy link
Member

I had a chance to speak to the engineering team about this. On the Chrome side, we're supportive of a getKeys() method to get all keys from a given storage area. We would also be supportive of a deleteKeys() that takes multiple keys to delete if there is interest from other browsers.

@oliverdunk oliverdunk added the supportive: chrome Supportive from Chrome label Jan 11, 2024
@tophf
Copy link

tophf commented Jan 11, 2024

We would also be supportive of a deleteKeys() that takes multiple keys

This is already implemented: chrome.storage.XXXX.deleteremove() accepts an array of keys.

@oliverdunk
Copy link
Member

This is already implemented: chrome.storage.XXXX.delete() accepts an array of keys.

Do you mean remove()? It looks like you're right though - in which case we definitely shouldn't duplicate that.

@polywock
Copy link
Author

Closing in favor of #601.

@polywock polywock closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
supportive: chrome Supportive from Chrome topic: storage Issues related to persisting data. Topics include browser.storage, web storage, and new APIs.
Projects
None yet
Development

No branches or pull requests

6 participants