
[Tiered Caching] [Milestone 1] KeyLookup store for disk cache #10309

Open
sgup432 opened this issue Oct 2, 2023 · 1 comment
Labels
enhancement (Enhancement or improvement to existing feature or request), Search (Search query, autocomplete, etc.)

Comments

sgup432 commented Oct 2, 2023

Is your feature request related to a problem? Please describe.
Refer to #10024 and #10870.

As part of disk-based caching, we need to check whether a key is present on disk before extracting its value. Without such a check, we may unnecessarily perform a disk seek even when the key is not present. We can avoid this by storing the disk cache's keys in memory in an optimized way.

Details
As part of tiered caching milestone 1, we are looking to add a disk tier to which items evicted from the in-memory cache will be spilled.

So whenever a request comes in, we need to check whether it is present in either cache, i.e. in-memory or disk. Doing a disk GET every time just to find out whether an entry is present would be expensive and would add to overall search latency. To solve this, we maintain the disk cache's keys in memory. Instead of storing whole keys in memory (which can be expensive), we store only each key's hashcode (an integer) using this key store implementation. There will obviously be some false positives, but they should be few, and the actual keys are stored on disk anyway.
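As a minimal sketch of the idea (class and method names here are hypothetical, not the actual OpenSearch implementation), a key store could hold only the 32-bit hashcodes of the keys that live on disk:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: keep only each disk-cache key's hashcode in memory.
// contains() may report false positives when two distinct keys share a
// hashcode; the authoritative key/value pair lives on disk.
class HashcodeKeyStore<K> {
    private final Set<Integer> hashcodes = new HashSet<>();

    void add(K key) {
        hashcodes.add(key.hashCode());
    }

    // true  => the key *might* be on disk (possible false positive)
    // false => the key is definitely not on disk, so skip the disk seek
    boolean mightContain(K key) {
        return hashcodes.contains(key.hashCode());
    }

    void remove(K key) {
        hashcodes.remove(key.hashCode());
    }
}
```

A `mightContain` returning false lets the cache skip the disk seek entirely; a true answer still requires reading the real key from disk to rule out a collision.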

To implement this key store, we looked at different data structures such as a bitset array and a roaring bitmap. A plain bitset can be expensive memory-wise given how it works: for example, if we have only one key whose hashcode is a large number N, the bitset must allocate enough backing memory to cover bit index N just to set it.
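The bitset cost is easy to demonstrate with `java.util.BitSet`: setting a single bit at a large index forces the backing `long[]` to grow to cover that index.

```java
import java.util.BitSet;

// Illustration of the BitSet memory concern: one set bit at a large index
// still costs memory proportional to the index, not to the number of keys.
public class BitsetCost {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(100_000_000); // one key whose (non-negative) hashcode is ~100M
        // size() reports how many bits the backing array can hold:
        // ~100M bits here, i.e. roughly 12.5 MB for a single entry.
        System.out.println(bits.size() + " bits allocated for "
                + bits.cardinality() + " set bit(s)");
    }
}
```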

A roaring bitmap, on the other hand, is essentially a compressed bitset that performs better memory-wise but is relatively slower. Here too we need to be mindful of the maximum memory the roaring bitmap may use, so we have added some logic around that.
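The memory-cap logic might look something like the following sketch (names and policy are hypothetical, not the actual OpenSearch implementation; a plain `HashSet` stands in for the roaring bitmap): once the store exceeds its budget, it stops tracking keys and degrades to answering "maybe present" for everything, which is safe because false positives are already tolerated.

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of capping the key store's memory: beyond a configured limit,
// drop the tracked hashcodes and answer "maybe" for every lookup. That only
// re-introduces disk seeks; it never produces a wrong "definitely absent".
class CappedKeyStore {
    private final Set<Integer> hashcodes = new HashSet<>();
    private final int maxEntries;   // derived from the memory budget
    private boolean capped = false; // once capped, answers degrade to "maybe"

    CappedKeyStore(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void add(int hashcode) {
        if (capped) {
            return;
        }
        if (hashcodes.size() >= maxEntries) {
            hashcodes.clear(); // release memory; we can no longer answer "no"
            capped = true;
            return;
        }
        hashcodes.add(hashcode);
    }

    boolean mightContain(int hashcode) {
        return capped || hashcodes.contains(hashcode);
    }
}
```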


@sgup432 sgup432 added enhancement Enhancement or improvement to existing feature or request untriaged labels Oct 2, 2023
@ankitkala ankitkala added Search Search query, autocomplete ...etc and removed Other labels Dec 17, 2023
@macohen macohen removed the untriaged label Dec 20, 2023
msfroh commented Jan 19, 2024

Reading through the linked PR, I think we need to clarify what the interface for the key lookup should be before diving into the implementation.

It sounds like the key lookup is supposed to look like a set, with add, contains, and remove operations, right?

Taking into account that the design specifies that values get added to the cache in batches, and eviction can reasonably be deferred (since you allow for false positives), your add and remove operations could take a collection of items to avoid needing to repeatedly acquire a write lock to mutate the key store.
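A minimal sketch of such a batch-oriented interface (names are illustrative, not a proposed final API), assuming a read/write lock guarding an in-memory set of hashcodes:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Set-like lookup with batch mutations, so the write lock is acquired once
// per batch rather than once per key. Names are hypothetical.
interface KeyLookupStore<K> {
    void addAll(Collection<K> keys);
    void removeAll(Collection<K> keys);
    boolean mightContain(K key); // may return false positives
}

class LockedKeyLookupStore<K> implements KeyLookupStore<K> {
    private final Set<Integer> hashcodes = new HashSet<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    @Override
    public void addAll(Collection<K> keys) {
        lock.writeLock().lock(); // one acquisition for the whole batch
        try {
            for (K key : keys) {
                hashcodes.add(key.hashCode());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public void removeAll(Collection<K> keys) {
        // Caveat of this sketch: removing a hashcode shared by a colliding
        // key drops that key's entry too, turning its lookups into misses.
        lock.writeLock().lock();
        try {
            for (K key : keys) {
                hashcodes.remove(key.hashCode());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public boolean mightContain(K key) {
        lock.readLock().lock();
        try {
            return hashcodes.contains(key.hashCode());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```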

Projects
Status: Later (6 months plus)
5 participants