Rethink search API to fix several issues. #530
Codecov Report

| | master | #530 | +/- |
|---|---|---|---|
| Coverage | 80.63% | 81.88% | +1.24% |
| Files | 91 | 91 | |
| Lines | 3749 | 3753 | +4 |
| Branches | 1666 | 1672 | +6 |
| Hits | 3023 | 3073 | +50 |
| Misses | 725 | 679 | -46 |
| Partials | 1 | 1 | |
- It would be nice if the API change is summarized in the PR description.
- It is important to state the thread-safety guarantees of the search API.
- The changes are not sufficiently covered by the unit tests.
include/zim/search.h (Outdated)
`Searcher& add_archive(const Archive& archive);`
`Search search(bool suggestionMode);`
I wonder if having separate `Searcher` objects per search mode would be more appropriate (i.e. `suggestionMode` should be fixed at construction time).
I've moved `suggestionMode` into the `Query` in later commits, but that doesn't change the point you've raised here.
Having a `Searcher` per mode means we would have to create a different searcher for each kind of query. But the idea is that the `Searcher` is a wrapper around a database. Although we internally have two different `InternalDataBase` objects for now (an implementation detail), conceptually there is only one database, and we can run two (or three, with georange) different kinds of search on the same searcher.
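To make the argument concrete, here is a minimal sketch of "one searcher, the mode travels with the query". The `Searcher` and `Query` below are simplified mocks written for this comment, not the libzim classes; the member names and the prefix/substring matching rule are assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the libzim classes.
struct Query {
    std::string text;
    bool suggestionMode;  // the mode is a property of the query, not of the searcher
};

class Searcher {
public:
    explicit Searcher(std::vector<std::string> titles) : m_titles(std::move(titles)) {}

    // The same searcher answers both kinds of query.
    std::vector<std::string> search(const Query& q) const {
        std::vector<std::string> hits;
        for (const auto& title : m_titles) {
            const bool match = q.suggestionMode
                ? title.compare(0, q.text.size(), q.text) == 0  // toy "suggestion": prefix match
                : title.find(q.text) != std::string::npos;      // toy "full text": substring match
            if (match) hits.push_back(title);
        }
        return hits;
    }

private:
    std::vector<std::string> m_titles;  // stand-in for the single conceptual database
};

// Usage: both modes go through one Searcher instance.
inline void example() {
    const Searcher searcher(std::vector<std::string>{"apple", "pineapple", "apricot"});
    assert(searcher.search(Query{"apple", false}).size() == 2);  // apple, pineapple
    assert(searcher.search(Query{"ap", true}).size() == 2);      // apple, apricot
}
```

With this shape, supporting another kind of search only means extending what a `Query` can describe; callers keep a single `Searcher`.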
I am happy with the new structure this PR brings about. It should make future enhancement and maintenance of search more localized and manageable. Though I cannot comment on the technical details (you and veloman are the experts here), from a usage point of view I have a couple of suggestions.
5ad6f89 to 74a79f3
Done
We need a document on the API (with an explanation of the thread-safety there).
Agree (we are still in WIP state).
74a79f3 to 29fddde
5f1f261 to 6bc2dab
29fddde to f10f8f0
Obviously, `initDatabase` should not be const, but this is mainly a code move. The constness (and the API) will be fixed in the next commit.
The `InternalDataBase` class encapsulates all the information we have about a xapian database. This commit focuses on regrouping that information into a single class. The class itself is used the right way, but this will change in the next commits.
No code change.
8d48ee9 to 9fe5b6d
The searcher is a wrapper around a xapian database. It allows creating searches. This commit focuses on the introduction of the `Searcher` class. `Search` is not modified.
e991f23 to 85e2c31
Rebased on master and resolved conflicts. The new commits should address the comments raised in the previous review. The last commits add unit tests for some edge-case iterator usage; I've fixed the minimum needed to keep libzim from crashing. We are not testing geoquery search, which is why we don't have 90% coverage on the patch (but the code doesn't really change there).
A couple of minor issues.
Awesome PR! The tests demonstrating the usage are super useful. I don't have much to add to veloman-yunkan's comments. Just a minor adjustment that can be useful for the future.
A `SearchResultSet` represents a particular range of search results. It can be iterated. This allows the user to reuse a `Search` to get different ranges of results.
By using a `Query` object, we avoid making our `Search` configurable. And so we avoid a user calling a potential `setQuery` on a `Search` after `getRange` has been called.
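The two commit messages above can be sketched as follows. This is a simplified mock, not the libzim implementation: the constructor taking a document list, the `getRange(start, maxResults)` signature and the substring matching are assumptions made so the example compiles on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Mock types for illustration; the real libzim signatures may differ.
struct Query { std::string text; };

class SearchResultSet {
public:
    using const_iterator = std::vector<std::string>::const_iterator;
    explicit SearchResultSet(std::vector<std::string> items) : m_items(std::move(items)) {}
    const_iterator begin() const { return m_items.begin(); }
    const_iterator end() const { return m_items.end(); }
    std::size_t size() const { return m_items.size(); }

private:
    std::vector<std::string> m_items;
};

class Search {
public:
    // The query is fixed at construction; there is deliberately no setQuery().
    Search(const Query& query, const std::vector<std::string>& docs) {
        for (const auto& doc : docs) {
            if (doc.find(query.text) != std::string::npos) m_matches.push_back(doc);
        }
    }

    // Each call returns an independent range; the Search itself is not modified.
    SearchResultSet getRange(std::size_t start, std::size_t maxResults) const {
        const std::size_t first = std::min(start, m_matches.size());
        const std::size_t count = std::min(maxResults, m_matches.size() - first);
        const auto from = m_matches.begin() + static_cast<std::ptrdiff_t>(first);
        return SearchResultSet(
            std::vector<std::string>(from, from + static_cast<std::ptrdiff_t>(count)));
    }

private:
    std::vector<std::string> m_matches;
};

// Usage: one Search, two different pages of results.
inline void example() {
    const std::vector<std::string> docs{"zim a", "zim b", "other", "zim c"};
    const Search search(Query{"zim"}, docs);
    assert(search.getRange(0, 2).size() == 2);  // first page
    assert(search.getRange(2, 2).size() == 1);  // second page, from the same Search
}
```

Because `getRange` is const and the query cannot be replaced, an earlier `SearchResultSet` can never end up describing a different query than the one it was created for.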
As the QueryParser depends only on the information in the database, we can create it directly when we create the database. We don't need a specific method to configure a QueryParser every time. As m_language and m_stopwords were used only to configure the queryParser, we don't need to store them in the database.
Parsing the query can be done entirely in the database, so let's move it there.
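As a rough illustration of these two commits, the sketch below configures a parser once, when the database object is built, and exposes parsing through the database. `ToyQueryParser` and `ToyDataBase` are hypothetical names; the whitespace tokenizer with stopword filtering stands in for the real xapian parser and is not the Xapian API.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy parser standing in for the xapian query parser (assumption: not the Xapian API).
class ToyQueryParser {
public:
    explicit ToyQueryParser(std::set<std::string> stopwords)
        : m_stopwords(std::move(stopwords)) {}

    // Split on whitespace and drop stopwords.
    std::vector<std::string> parse(const std::string& text) const {
        std::vector<std::string> terms;
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            if (m_stopwords.count(word) == 0) terms.push_back(word);
        }
        return terms;
    }

private:
    std::set<std::string> m_stopwords;
};

// Hypothetical database wrapper: the parser is configured once, at construction.
class ToyDataBase {
public:
    explicit ToyDataBase(std::set<std::string> stopwords)
        : m_parser(std::move(stopwords)) {}

    // Parsing lives in the database, so callers never configure a parser themselves.
    std::vector<std::string> parseQuery(const std::string& text) const {
        return m_parser.parse(text);
    }

private:
    ToyQueryParser m_parser;  // the stopwords are kept only inside the parser
};

// Usage: the caller hands over raw text and gets parsed terms back.
inline void example() {
    const ToyDataBase db(std::set<std::string>{"the", "of"});
    const std::vector<std::string> terms = db.parseQuery("the history of zim");
    assert(terms.size() == 2);  // "history" and "zim" remain
}
```

The database keeps no separate copy of the stopwords: once the parser has been built from them, nothing else needs them.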
b00e635 to ff16617
Fix #463, #516, #471.
Related to https://github.com/kiwix/kiwix-tools/issues/97
The first commits roughly change the API to get a correct public API (but with somewhat wrong internal structures).
Later commits fix the internal structures to get something correct.
The main idea is to have:
- `Searcher`, wrapping a xapian database.
- `Search`, wrapping a particular query on the xapian database.
- `SearchResult`, a set of results (range) corresponding to the `Search`.

We keep the iterator, but now we iterate over the `SearchResult`.
We also introduce a `Query` object, describing a query (from the user's point of view), not associated with any database.
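The relationship between these four types can be sketched end to end as below. Everything here is a simplified mock named after the concepts in this description: the `DataBase` alias (a plain vector of strings), the `shared_ptr` ownership and the `getRange` signature are assumptions for illustration, not the actual libzim internals.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified mocks named after the PR's concepts.
struct Query { std::string text; };         // user-facing, tied to no database

using DataBase = std::vector<std::string>;  // stand-in for the xapian database

class SearchResult {                        // a range of results, iterable
public:
    explicit SearchResult(std::vector<std::string> hits) : m_hits(std::move(hits)) {}
    std::vector<std::string>::const_iterator begin() const { return m_hits.begin(); }
    std::vector<std::string>::const_iterator end() const { return m_hits.end(); }
    std::size_t size() const { return m_hits.size(); }

private:
    std::vector<std::string> m_hits;
};

class Search {                              // one query bound to one database
public:
    Search(std::shared_ptr<const DataBase> db, Query query)
        : m_db(std::move(db)), m_query(std::move(query)) {}

    SearchResult getRange(std::size_t start, std::size_t maxResults) const {
        std::vector<std::string> hits;
        std::size_t seen = 0;
        for (const auto& doc : *m_db) {
            if (doc.find(m_query.text) == std::string::npos) continue;  // not a match
            if (seen++ < start) continue;                               // before the range
            if (hits.size() == maxResults) break;                       // range is full
            hits.push_back(doc);
        }
        return SearchResult(std::move(hits));
    }

private:
    std::shared_ptr<const DataBase> m_db;
    Query m_query;
};

class Searcher {                            // wraps the database, creates searches
public:
    explicit Searcher(DataBase docs)
        : m_db(std::make_shared<DataBase>(std::move(docs))) {}
    Search search(const Query& query) const { return Search(m_db, query); }

private:
    std::shared_ptr<const DataBase> m_db;
};

// Usage: Query -> Searcher::search -> Search::getRange -> iterate the SearchResult.
inline std::size_t example() {
    const Searcher searcher(DataBase{"zim file", "zimlib", "kiwix", "zim tools"});
    const Search search = searcher.search(Query{"zim"});
    std::size_t visited = 0;
    for (const auto& hit : search.getRange(0, 2)) {
        assert(hit.find("zim") != std::string::npos);
        ++visited;
    }
    return visited;
}
```

In this sketch the `Search` shares ownership of the database, so it stays usable even if the `Searcher` that created it goes away; whether the real classes make the same lifetime choice is not stated here.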