Collection caching #114
Now that a Value Index #14 has been implemented, this issue has become a lot less relevant. Only when collections are very large might it make sense to cache things like filtering and sorting.
I'm noticing that on my laptop, some collections (such as commits) are becoming a bit slower (200 ms for 1400 entries). That makes sense, as all items are iterated and sorted on the fly. That should happen at index time, not at runtime. But how do we cache this, exactly?

Store cached versions of the Collection resource
Seems easy to implement, but...
Have a special index for collections
Current implementation idea
Limitations
I've learned a bit more about how to optimise using
* #114 WIP collection cache
* #114 WIP collection cache working but slow
* #114 try different approach
* WIP tests passing, but sorting not working
* WIP
* Sorting one way works...
* Fix sorting
* mostly working
* Move db tests to file
* Move some utility functions
* Cleanup
* authorization tests
* Add authorization tests, get them green
* Cache invalidation test passing
* Add test for delting, fix temp path gitignore
* Refactor commit opts
* Fix query index
* Change TPF, fix test
* Tests passing
* Improve sorting
* Bump to v0.31.0
Collections are currently dynamic resources, which means that they are fully calculated when a user sends a request. That works fine, but it comes at a performance cost, since the DB must be queried.
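To make the trade-off concrete, a cache for fully calculated collections could be keyed by the collection subject plus its query params. This is a minimal sketch; `CachedCollection` and `cache_key` are hypothetical names for illustration, not atomic-server's actual types:

```rust
use std::collections::HashMap;

// Hypothetical cached representation of a calculated Collection.
// Field names are illustrative, not the project's actual structs.
#[derive(Clone, Debug, PartialEq)]
struct CachedCollection {
    members: Vec<String>,
    total_count: usize,
}

// Sorting and filtering query params change `members` and `total_count`,
// so the cache key must include them. Sorting the params first makes the
// key independent of the order they appear in the URL.
fn cache_key(subject: &str, query_params: &[(&str, &str)]) -> String {
    let mut params: Vec<String> = query_params
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect();
    params.sort();
    format!("{}?{}", subject, params.join("&"))
}

fn main() {
    let mut cache: HashMap<String, CachedCollection> = HashMap::new();
    let key = cache_key(
        "https://example.com/commits",
        &[("sort_by", "createdAt"), ("page", "1")],
    );
    cache.insert(
        key.clone(),
        CachedCollection { members: vec!["commit1".into()], total_count: 1400 },
    );
    // The same params in a different order hit the same cache entry.
    let key2 = cache_key(
        "https://example.com/commits",
        &[("page", "1"), ("sort_by", "createdAt")],
    );
    assert_eq!(key, key2);
    assert!(cache.contains_key(&key2));
}
```

The open question, discussed below, is not the lookup itself but when entries like these must be invalidated.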
How to cache this? How does this interact with the `get_extended_resource` function? How to invalidate the cache? Let's discuss some considerations.

Collections can be sorted and filtered by adding query params. These of course change the dynamic properties such as `members` and `total_count`. These should be cached separately.

Since all changes should be done using Commits, we can perform cache invalidations while handling Commits. How does the Commit handler know which resources should be invalidated? For example, let's say I remove the `firstName` property with the `john` value from some `person` Resource. The person first appeared in the collections of people named `john`, but this collection should now be invalidated.

Invalidation approaches
Invalidate when any attribute of a resource changes
When a Collection iterates over its members, it adds the subject of the collection (including query params) to a K/V `incomingLinks` store, where each K is a subject and each V stands for an array of subjects that link to it. When a commit is applied to resource X, it takes subject X and opens the `incomingLinks` instance of that X. It then proceeds to invalidate all the V items. This will invalidate many collections that could very well result in exactly the same members when re-run.

Use TPF index / cache
#14
If we build an index for all values, most of the expensive part is solved. That just leaves sorting, which is still expensive.
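To illustrate the `incomingLinks` approach from the previous section, here is a hedged sketch of an in-memory reverse index from member subjects to the collection URLs that listed them. All names (`IncomingLinks`, `register`, `invalidate`) are hypothetical and not the project's actual implementation:

```rust
use std::collections::{HashMap, HashSet};

// Reverse index: K = member subject, V = collection URLs (including
// query params) that iterated over that member.
#[derive(Default)]
struct IncomingLinks {
    links: HashMap<String, HashSet<String>>,
}

impl IncomingLinks {
    // Called while a Collection iterates over its members.
    fn register(&mut self, member: &str, collection_url: &str) {
        self.links
            .entry(member.to_string())
            .or_default()
            .insert(collection_url.to_string());
    }

    // Called from the Commit handler: a Commit touched `subject`, so
    // every collection that listed it is returned (and dropped) so the
    // caller can evict those cache entries. As noted above, this is
    // coarse: many of the invalidated collections would produce exactly
    // the same members when re-run.
    fn invalidate(&mut self, subject: &str) -> Vec<String> {
        self.links
            .remove(subject)
            .map(|set| set.into_iter().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut index = IncomingLinks::default();
    index.register("https://example.com/john", "https://example.com/people?firstName=john");
    index.register("https://example.com/john", "https://example.com/people?sort_by=name");

    // A Commit removes john's `firstName`; both collections are stale.
    let stale = index.invalidate("https://example.com/john");
    assert_eq!(stale.len(), 2);
    // A second Commit on the same subject finds nothing left to invalidate.
    assert!(index.invalidate("https://example.com/john").is_empty());
}
```

A value index as in #14 would replace most of this machinery for filtering, but some scheme like this would still be needed for cached sort orders.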