Introduce Search/Searcher Caching to Internal Server #620
Codecov Report
@@ Coverage Diff @@
## master #620 +/- ##
==========================================
+ Coverage 58.03% 66.04% +8.00%
==========================================
Files 54 55 +1
Lines 3584 4102 +518
Branches 2019 2088 +69
==========================================
+ Hits 2080 2709 +629
+ Misses 1503 1392 -111
Partials 1 1
Continue to review full report at Codecov.
e642d41
to
9bf3e89
Compare
A few things to change, but the overall structure is good.
9bf3e89
to
18e35dc
Compare
This change, though seemingly insignificant until now, proved to be an important aspect of kiwix/libkiwix#620. When the searcher is retrieved from the cache, it should start up in OP_AND instead of OP_OR.
18e35dc
to
b3a8fd2
Compare
There are still a few changes to make, but we are mostly good.
b3a8fd2
to
008db4e
Compare
008db4e
to
cf07d57
Compare
See my comments on the code.
I also have one comment about something you wrote in the first commit message:
NOTE: One must think about thread safety and race condition while using the template.
In the commit where you introduce the cache system, you write :
I agree with you.
@maneeshpm Any news on this PR? It is one of the last ones before releasing 10.0.0. This PR would also benefit from a proper automated test to ensure that the cache works as intended and does not introduce regressions.
@mgautierfr Ah, OK, so we are protecting the suggestions module via a mutex lock in libzim. You mean we need a similar treatment for FT search as well? @kelson42 Sure, will do something about it.
I was thinking about protecting the cache system (you explicitly mention that in your commit message), but yes, we also need to protect the FT search.
54f44ff
to
2a819b1
Compare
I wonder if we should move the searcher/suggestionSearcher cache into the library itself (as for the readers/archives). The mutex protection you've added is not enough. It technically protects the internal structures, but a race condition is still possible :
What we also need is to protect (block) the cache while we are creating the searcher, to avoid creating two searchers.
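One way to block the cache while a value is being created can be sketched as follows. This is a minimal illustration, not the code from this PR or from libzim: `BlockingCache` and `getOrPut` are hypothetical names, and the design simply holds one mutex across both the lookup and the creation, so two threads asking for the same key can never both create it.

```cpp
#include <map>
#include <mutex>

// Hypothetical cache: lookup and creation happen under the same lock,
// so a missing value is created at most once per key.
template <typename Key, typename Value>
class BlockingCache {
public:
    template <typename Factory>
    Value getOrPut(const Key& key, Factory create) {
        // The lock is held for the whole lookup-then-create sequence.
        // Locking only around the map accesses would protect the map
        // itself but still let two threads both see a miss and both create.
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            // If create() throws, nothing is inserted and the lock is released.
            it = entries_.emplace(key, create()).first;
        }
        return it->second;
    }

private:
    std::mutex mutex_;
    std::map<Key, Value> entries_;
};
```

The trade-off of this simple variant is that every caller waits while one value is being created, even callers asking for a different key. A finer-grained design would store a per-key placeholder under the lock and run the creation outside it.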
@mgautierfr I agree, we are anyway pulling in the archive from the library, so it is more natural to get the searchers (and search via cache or otherwise) from the library itself rather than keeping them associated with the internal server. I propose we do it in a separate ticket after this. Thanks for the suggestion, I'll get to see the proper implementation of an LRU cache rather than a simple one 😅 : I guess
2a819b1
to
2f74770
Compare
cbd97e3
to
6280236
Compare
303bda9
to
e2998b0
Compare
@kelson42 I have traced the error to the newly added function
I wonder why we deal with an ENV variable here in the first place. IMO, either the user provides a value or we use a default one. I don't like the idea that the ENV plays a role here.
@maneeshpm But we already have this kind of behaviour in libzim and it works fine on macOS; maybe you could have a look at how this is done there?
e8cdcf0
to
c5ed9c5
Compare
Apparently, this problem was caused by a simple nullptr returned by I am not sure of a technical explanation for why this is not a problem on Linux but is on macOS, maybe @mgautierfr can help with one 😅
@mgautierfr I see this ticket is ready for a final review pass!
Sorry for this late review @maneeshpm
It seems you have copied the lru_cache and ConcurrentCache from libzim.
There is no problem with this, but please change the commit message accordingly.
(And include the commit id of the version of the code you copied from. It will make it simpler to track what changes in the future.)
Otherwise it seems we are good.
I am not sure of a technical explanation for why this is not a problem on linux but on mac, maybe @mgautierfr can help with one 😅
The documentation of the std::string constructor (https://www.cplusplus.com/reference/string/string/string/) says that it is undefined behavior if the pointer is null.
Maybe gcc (or its standard library) creates an empty string if we pass a null pointer, while on macOS an exception is raised.
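A defensive way to avoid this undefined behavior is to check the pointer before building the string. The sketch below assumes the null pointer comes from a lookup such as `std::getenv`, which returns a null pointer when the variable is not set; the thread does not name the function involved, and `envOrDefault` is a hypothetical helper, not part of libkiwix.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: read an environment variable without ever passing
// a null pointer to the std::string constructor.
std::string envOrDefault(const char* name, const std::string& fallback) {
    const char* raw = std::getenv(name);  // null when the variable is unset
    return raw ? std::string(raw) : fallback;
}
```

With a guard like this, the behavior no longer depends on how a particular standard library reacts to a null pointer.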
@maneeshpm Any feedback? We are really not far from merging this PR!
@mgautierfr Considering that there is really not much left to do and that @maneeshpm does not seem to be available at the moment, please feel free to just fix and merge.
The cache is copied from the libzim project: https://github.com/openzim/libzim The exact file has been copied from commit 27f5e70
We use the new cache template to implement two kinds of cache.
1: The Searcher cache is more general in terms of its usage. A Searcher can be used for multiple searches without much change to itself. We try to retrieve the searcher and perform searches using it whenever possible, and if not we put a searcher into the cache. Users can specify a custom cache length by manipulating the environment variable SEARCHER_CACHE_SIZE. Its default value is 10% of all the books available.
2: The search cache is much more restricted in terms of usage. Its main purpose is to avoid re-searching on the searcher during page changes to generate SearchResultSet of various ranges. Users can specify a custom cache length using the environment variable SEARCH_CACHE_SIZE with a default value of 2.
We create a cache for SuggestionSearcher very similar to that of the FT searcher. Users can specify a custom cache size using the environment variable SUGGESTION_SEARCHER_CACHE_SIZE. It has a default value of 10% of the number of books in the library.
c5ed9c5
to
6523d9f
Compare
I've rebased on master and slightly reworked the first commit, which is a copy of files from libzim.
Fixes #509
This PR consists of three commits which are as follows:
This general-purpose LRU cache template can be used to implement caching for various classes with various cache size limits.
NOTE: One must think about thread safety and race condition while using the template.
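To illustrate what such a template can look like, here is a minimal LRU cache sketch. It is not the code copied from libzim: the class name `SimpleLruCache` and its methods are hypothetical, and it is deliberately not thread-safe, which is why the note above matters for any concurrent use.

```cpp
#include <cstddef>
#include <list>
#include <stdexcept>
#include <unordered_map>
#include <utility>

// Hypothetical, single-threaded LRU cache with a fixed maximum size.
template <typename Key, typename Value>
class SimpleLruCache {
public:
    explicit SimpleLruCache(std::size_t maxSize) : maxSize_(maxSize) {
        if (maxSize == 0) throw std::invalid_argument("cache size must be > 0");
    }

    // Inserts or replaces a value and marks it as most recently used.
    void put(const Key& key, const Value& value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            items_.erase(it->second);
            index_.erase(it);
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
        if (items_.size() > maxSize_) {
            // Evict the least recently used entry (the back of the list).
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }

    // Returns a pointer to the cached value, or nullptr on a miss.
    // A hit moves the entry to the front (most recently used).
    const Value* get(const Key& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);
        return &it->second->second;
    }

    std::size_t size() const { return items_.size(); }

private:
    using Item = std::pair<Key, Value>;
    std::size_t maxSize_;
    std::list<Item> items_;  // front = most recent, back = least recent
    std::unordered_map<Key, typename std::list<Item>::iterator> index_;
};
```

The list keeps the usage order and the map gives constant-time lookup on average; `splice` moves an entry to the front without invalidating the iterators stored in the map.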
Searcher and Search
We use the new cache template to implement two kinds of cache.
1: The Searcher cache is more general in terms of its usage. A Searcher can be used for multiple searches without much change to itself. We try to retrieve the Searcher and perform searches using it whenever possible, and if not we put a Searcher into the cache. Users can specify a custom cache length by manipulating the environment variable SEARCHER_CACHE_SIZE. Its default value is 10% of all the books available.
2: The Search cache is much more restricted in terms of usage. Its main purpose is to avoid re-searching on the Searcher during page changes to generate SearchResultSet of various ranges. Users can specify a custom cache length using the environment variable SEARCH_CACHE_SIZE with a default value of 2.
SuggestionSearcher
We create a cache for SuggestionSearcher very similar to that of the FT searcher. Users can specify a custom cache size using the environment variable SUGGESTION_SEARCHER_CACHE_SIZE. It has a default value of 10% of the number of books in the library.
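The size rule described for these caches can be sketched as below. This is an illustration only, not the PR's code: `cacheSizeFromEnv` and `tenPercentOf` are hypothetical helpers, and both the rounding (integer division, at least one entry) and the fallback on unset or unparsable values are assumptions that the description does not spell out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: use the environment variable when it holds a
// positive integer, otherwise fall back to the given default.
std::size_t cacheSizeFromEnv(const char* name, std::size_t defaultSize) {
    const char* raw = std::getenv(name);  // null when the variable is unset
    if (raw == nullptr || *raw < '0' || *raw > '9') return defaultSize;
    char* end = nullptr;
    const unsigned long parsed = std::strtoul(raw, &end, 10);
    // Reject trailing garbage such as "12abc" and a size of zero.
    if (*end != '\0' || parsed == 0) return defaultSize;
    return static_cast<std::size_t>(parsed);
}

// Assumed reading of "10% of the books": integer division, minimum 1.
std::size_t tenPercentOf(std::size_t bookCount) {
    return std::max<std::size_t>(1, bookCount / 10);
}
```

A caller would then write something like `cacheSizeFromEnv("SUGGESTION_SEARCHER_CACHE_SIZE", tenPercentOf(bookCount))`, where `bookCount` stands for the number of books in the library.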