Fully xapian powered catalog search #488
Codecov Report
@@ Coverage Diff @@
## master #488 +/- ##
==========================================
+ Coverage 63.18% 64.39% +1.21%
==========================================
Files 50 50
Lines 3504 3556 +52
Branches 1773 1816 +43
==========================================
+ Hits 2214 2290 +76
+ Misses 1288 1264 -24
Partials 2 2
Force-pushed from 9f307b3 to d63ca4a.
Force-pushed from d63ca4a to 1d8e710.
Incorporated @maneeshpm's fix of openzim/libzim#534 into the commits titled "Handling of non-words in publisher query" and "Catalog filtering by creator works via Xapian" (commits 7bfd050 and 109e54d, respectively, in this revision of the PR).
Force-pushed from 1d8e710 to 459da6a.
@mgautierfr It is important this PR is reviewed so @veloman-yunkan can move on.
Not a lot to say, code is good.
Just two questions.
@mgautierfr And what about the question raised in the description of the PR?
@veloman-yunkan Please rebase on top of master (and merge it yourself if I take too long to come back to this PR).
Now the LibraryTest.filterCheck unit-test validates the actual entries returned by `Library::filter` (previously only the count of the results was checked).
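To illustrate why comparing the actual entries is stricter than comparing counts, here is a minimal sketch. The `BookTitles` alias and the `titlesOf()` helper are hypothetical names made up for this example and are not taken from the test suite; the idea is only that a test can map the returned ids to titles and compare the whole sorted list.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the library under test: maps a book id to its title.
using BookTitles = std::map<std::string, std::string>;

// Translate the ids returned by a filter call into a sorted list of titles,
// so that a test can compare the whole result set and not just its size.
std::vector<std::string> titlesOf(const BookTitles& books,
                                  const std::vector<std::string>& ids)
{
  std::vector<std::string> titles;
  for (const auto& id : ids) {
    const auto it = books.find(id);
    // An unknown id is kept visible instead of being silently dropped.
    titles.push_back(it == books.end() ? "<unknown id: " + id + ">" : it->second);
  }
  std::sort(titles.begin(), titles.end());
  return titles;
}
```

A count-only check would accept any two books here, whereas comparing the output of `titlesOf()` against an expected list fails as soon as a wrong book is returned.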
This diff is easier to view if whitespace changes are ignored.
This should have been done back in PR #460
Moved the `filter.hasQuery()` check inside `buildXapianQuery()`. `Library::filterViaBookDB()` only cares if the query that is going to be run on the book DB would match all documents. The rest of the changes related to enhancing the usage of Xapian for the catalog search will happen inside `buildXapianQuery()` and `updateBookDB()`.
This change fixes the failure of the LibraryTest.filterByPublisher unit-test broken by the previous commit.

The previous approach used in `publisherQuery()` for building a phrase query enforcing the specified prefix for all terms fails if

1. the input phrase contains a non-word term that Xapian's query parser doesn't like (e.g. a standalone ampersand character, 1/2, a#1, etc);
2. the input phrase contains at least three terms that Xapian's query parser has no issue with.

Using the `quest` tool (coming with xapian-tools under Ubuntu) the issue can be demonstrated as follows:

```
$ quest -o phrase -d some_xapian_db "Energy & security"
Parsed Query: Query((energy@1 PHRASE 11 Zsecur@2))
Exactly 0 matches
MSet:
$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries
```

The problem comes from parsing the query with the default operation set to `OP_PHRASE` (exemplified by the `-o phrase` option in above invocations of `quest`). A workaround is to parse the phrase with a default operation of `OP_OR` and then combine all the terms with `OP_PHRASE`. Besides stemming should be disabled in order to target an exact phrase match (save for the non-word terms, if any, that are ignored by the query parser).
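The two-step workaround described in this commit message can be modelled without Xapian as follows. This is a standard-library-only sketch and not the code of the commit: `phraseTerms()` and `containsPhrase()` are hypothetical helpers, the `XP` prefix used in the usage note is an assumed value, and splitting on whitespace while dropping any token with a non-alphanumeric ASCII character is a deliberate simplification that does not reproduce Xapian's tokenizer.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Step 1 (stands in for parsing with a default operation of OP_OR and with
// stemming disabled): collect the word terms in their original order,
// lowercased and prefixed. ASSUMPTION: tokens containing any non-alphanumeric
// ASCII character are skipped entirely, which only approximates what a real
// query parser does with non-word terms.
std::vector<std::string> phraseTerms(const std::string& text,
                                     const std::string& prefix)
{
  std::vector<std::string> terms;
  std::istringstream in(text);
  std::string token;
  while (in >> token) {
    std::string term = prefix;
    bool isWord = true;
    for (const unsigned char c : token) {
      if (!std::isalnum(c)) { isWord = false; break; }
      term += static_cast<char>(std::tolower(c));
    }
    if (isWord) terms.push_back(term);
  }
  return terms;
}

// Step 2 (stands in for combining the collected terms with OP_PHRASE): the
// terms must occur consecutively and in the same order in the document.
// An empty phrase is treated as matching nothing (a choice made for this sketch).
bool containsPhrase(const std::vector<std::string>& docTerms,
                    const std::vector<std::string>& phrase)
{
  if (phrase.empty()) return false;
  return std::search(docTerms.begin(), docTerms.end(),
                     phrase.begin(), phrase.end()) != docTerms.end();
}
```

In this model, `phraseTerms("Energy & security act", "XP")` yields the three terms `XPenergy`, `XPsecurity` and `XPact`, so the standalone ampersand no longer interferes with building the phrase, and the lack of stemming keeps `security` from being reduced to a stem.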
Catalog filtering by titles/description was sensitive to diacritics present in the query string. Fixed that. Also enhanced the unit test to validate the insensitivity to diacritics present in either the title/description or the query string.
Catalog filtering should now be case/diacritics insensitive for all fields. However, it is not validated for the language, name and category fields, and is validated for tags, creator & publisher only for text supplied in the filter (but not for values read from the book).
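Case/diacritics insensitivity of this kind is usually obtained by normalizing both the indexed value and the filter text in the same way before they are compared. The sketch below shows that idea only: `normalizeForSearch()` is a hypothetical helper, and its tiny hand-written folding table is an assumption for illustration (a real implementation would more likely rely on a Unicode library than on such a table).

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Lowercase ASCII letters and fold a few accented letters (given as UTF-8)
// to their base letter. ASSUMPTION: the table below is illustrative and far
// from complete; sequences that are not listed are copied through unchanged.
std::string normalizeForSearch(const std::string& utf8)
{
  static const std::map<std::string, char> folded = {
    {"\xC3\xA9", 'e'},  // é
    {"\xC3\x89", 'e'},  // É
    {"\xC3\xA8", 'e'},  // è
    {"\xC3\xA0", 'a'},  // à
    {"\xC3\xB4", 'o'},  // ô
    {"\xC3\xBC", 'u'},  // ü
  };

  std::string out;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char c = static_cast<unsigned char>(utf8[i]);
    if (c < 0x80) {  // plain ASCII: case folding only
      out += static_cast<char>(std::tolower(c));
      ++i;
      continue;
    }
    // Length of the UTF-8 sequence, derived from its lead byte.
    std::size_t len = 1;
    if (c >= 0xF0) len = 4;
    else if (c >= 0xE0) len = 3;
    else if (c >= 0xC0) len = 2;

    const std::string seq = utf8.substr(i, len);
    const auto it = folded.find(seq);
    if (it != folded.end()) out += it->second;
    else out += seq;
    i += seq.size();
  }
  return out;
}
```

Applying the same function to the value stored in the index and to the text supplied in the filter makes the comparison insensitive to diacritics on either side, which is the property the tests mentioned above are meant to cover.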
The library set up by LibraryTest now contains two valid books initialized via XML. Therefore XmlLibraryTest is not needed as a separate test suite.
Force-pushed from 845dfda to 63e9a09.
@mgautierfr I don't have enough privileges in this repository
Wow, I will see with to change that.
Fixes #484 with the exception of the `local`, `remote`, `valid` and `maxSize` filtering criteria. Those can be processed via Xapian too, but is it worth the effort (though we will gain some uniformity in how a query is fulfilled - no postprocessing through `Filter::accept()` will be needed)?

Also `Library::filter()` is greatly enhanced