Rethink search API to fix several issues. #530

mgautierfr · 2021-04-06T19:21:32Z

Fix #463, #516, #471.
Related to https://github.com/kiwix/kiwix-tools/issues/97

The first commits roughly reshape the public API into its intended form (while the internal structures are still somewhat wrong).
Later commits fix the internal structures.

The main idea is to have:

- A Searcher, wrapping a xapian database.
- A Search, wrapping a particular query on the xapian database.
- A SearchResult, a set (range) of results corresponding to the Search.

We keep the iterator, but we now iterate over the SearchResult.
We also introduce a Query object, describing a query from the user's point of view, not associated with any database.
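The layering described above can be illustrated with a minimal sketch. This is not the libzim code: the `sketch` namespace, the `Database` alias (a plain list of titles standing in for the xapian database), the substring matching, and the `getRange(start, maxResults)` signature are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the wrapped xapian database: a list of titles.
using Database = std::vector<std::string>;

// A user-level query. It is not tied to any database.
struct Query {
  std::string text;
};

// A range of results. This is the object that gets iterated.
class SearchResultSet {
 public:
  explicit SearchResultSet(std::vector<std::string> hits) : hits_(std::move(hits)) {}
  std::vector<std::string>::const_iterator begin() const { return hits_.begin(); }
  std::vector<std::string>::const_iterator end() const { return hits_.end(); }
  std::size_t size() const { return hits_.size(); }

 private:
  std::vector<std::string> hits_;
};

// One particular query bound to one database. It has no setters.
class Search {
 public:
  Search(std::shared_ptr<const Database> db, Query query)
      : db_(std::move(db)), query_(std::move(query)) {}

  // Return at most maxResults matches, skipping the first `start` matches.
  SearchResultSet getRange(std::size_t start, std::size_t maxResults) const {
    std::vector<std::string> matches;
    for (const auto& title : *db_) {
      if (title.find(query_.text) != std::string::npos) matches.push_back(title);
    }
    const std::size_t first = std::min(start, matches.size());
    const std::size_t last = first + std::min(maxResults, matches.size() - first);
    return SearchResultSet(
        std::vector<std::string>(matches.begin() + first, matches.begin() + last));
  }

 private:
  std::shared_ptr<const Database> db_;
  Query query_;
};

// Wraps the database and creates searches on it.
class Searcher {
 public:
  explicit Searcher(Database db)
      : db_(std::make_shared<const Database>(std::move(db))) {}

  Search search(const Query& query) const { return Search(db_, query); }

 private:
  std::shared_ptr<const Database> db_;
};

}  // namespace sketch
```

In this sketch one `Searcher` can create several `Search` objects, and one `Search` can hand out several result ranges (for example `getRange(0, 2)` and then `getRange(2, 2)`) without being reconfigured.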

codecov · 2021-04-06T19:21:40Z

Codecov Report

Merging #530 (b00e635) into master (479d9f3) will increase coverage by 1.24%.
The diff coverage is 79.79%.

❗ Current head b00e635 differs from pull request most recent head ff16617. Consider uploading reports for the commit ff16617 to get more accurate results

@@            Coverage Diff             @@
##           master     #530      +/-   ##
==========================================
+ Coverage   80.63%   81.88%   +1.24%     
==========================================
  Files          91       91              
  Lines        3749     3753       +4     
  Branches     1666     1672       +6     
==========================================
+ Hits         3023     3073      +50     
+ Misses        725      679      -46     
  Partials        1        1

Impacted Files	Coverage Δ
include/zim/search_iterator.h	`100.00% <ø> (ø)`
src/search.cpp	`73.68% <76.31%> (+9.49%)`	⬆️
src/search_iterator.cpp	`84.61% <86.95%> (+13.92%)`	⬆️
include/zim/search.h	`100.00% <100.00%> (ø)`
src/search_internal.h	`100.00% <100.00%> (+25.00%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

veloman-yunkan

It would be nice if the API change is summarized in the PR description.
It is important to state the thread-safety guarantees of the search API.
The changes are not sufficiently covered by the unit tests.

src/search.cpp

include/zim/search.h

veloman-yunkan · 2021-04-07T09:15:07Z

include/zim/search.h

+
+    Searcher& add_archive(const Archive& archive);
+
+    Search search(bool suggestionMode);


I wonder if having separate Searcher objects per search mode would be more appropriate (i.e. suggestionMode should be fixed at construction time).

mgautierfr · 2021-04-07

I've moved the suggestionMode into the Query in later commits.
But that doesn't change the point you've raised here.

Having a Searcher per mode means we would have to create a different searcher for each kind of query.
But the idea is that the Searcher is the wrapper around a database. Although we internally have two different InternalDataBase for now (it is an implementation detail), conceptually there is only one database, and we can do two (or three with georange) different kinds of search on the same searcher.
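A minimal sketch of that argument, under stated assumptions: the `Document` struct, the two in-memory indexes, and the substring matching are hypothetical and only illustrate how a single searcher can serve both kinds of query when the mode is carried by the `Query`.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// One indexed document: the text that is matched and the title that is returned.
struct Document {
  std::string searchableText;
  std::string title;
};

// The search mode travels with the query, not with the searcher.
struct Query {
  std::string text;
  bool suggestionMode;
};

// One searcher, conceptually one database, two internal indexes
// (an implementation detail hidden from the caller).
class Searcher {
 public:
  Searcher(std::vector<Document> titleIndex, std::vector<Document> fulltextIndex)
      : titleIndex_(std::move(titleIndex)), fulltextIndex_(std::move(fulltextIndex)) {}

  // Pick the index from the query's mode and return the matching titles.
  std::vector<std::string> search(const Query& query) const {
    const std::vector<Document>& index =
        query.suggestionMode ? titleIndex_ : fulltextIndex_;
    std::vector<std::string> titles;
    for (const Document& doc : index) {
      if (doc.searchableText.find(query.text) != std::string::npos) {
        titles.push_back(doc.title);
      }
    }
    return titles;
  }

 private:
  std::vector<Document> titleIndex_;
  std::vector<Document> fulltextIndex_;
};

}  // namespace sketch
```

The caller keeps a single `Searcher` and only changes the `Query` it passes, so adding a third kind of search would not require a third searcher type.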

src/search.cpp

maneeshpm

I am happy with the new structure this PR brings about. It should make future enhancement and maintenance of search more localized and manageable. Though I cannot comment on the technical details (you and veloman are the experts here), from a usage point of view I have a couple of suggestions.

src/search.cpp

mgautierfr · 2021-04-07T14:47:41Z

> It would be nice if the API change is summarized in the PR description.

Done

> It is important to state the thread-safety guarantees of the search API.

We need a document on the API (with an explanation of the thread-safety there).
But globally, there is no thread-safety. The caller must protect against race conditions.

> The changes are not sufficiently covered by the unit tests.

Agreed (we are still in WIP state).
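As an illustration of the thread-safety remark above, here is a minimal sketch of caller-side protection. `UnsafeSearcher` is a hypothetical stand-in for a search object with no internal locking, and `LockedSearcher` is one possible way for a caller to serialize access with a mutex. Neither name comes from libzim.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical search object that does no locking of its own.
class UnsafeSearcher {
 public:
  explicit UnsafeSearcher(std::vector<std::string> titles)
      : titles_(std::move(titles)) {}

  // Mutates internal state, so unsynchronized concurrent calls would race.
  std::size_t count(const std::string& text) {
    ++queriesServed_;
    std::size_t hits = 0;
    for (const auto& title : titles_) {
      if (title.find(text) != std::string::npos) ++hits;
    }
    return hits;
  }

  std::size_t queriesServed() const { return queriesServed_; }

 private:
  std::vector<std::string> titles_;
  std::size_t queriesServed_ = 0;
};

// Caller-side protection: every access goes through the same mutex.
class LockedSearcher {
 public:
  explicit LockedSearcher(UnsafeSearcher searcher) : searcher_(std::move(searcher)) {}

  std::size_t count(const std::string& text) {
    std::lock_guard<std::mutex> lock(mutex_);
    return searcher_.count(text);
  }

  std::size_t queriesServed() {
    std::lock_guard<std::mutex> lock(mutex_);
    return searcher_.queriesServed();
  }

 private:
  std::mutex mutex_;
  UnsafeSearcher searcher_;
};

}  // namespace sketch
```

Threads that share one `LockedSearcher` never run two calls on the wrapped object at the same time. A coarse lock like this is the simplest option; giving each thread its own search object is another way to avoid sharing.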


The InternalDataBase class encapsulates all the information we have about a xapian database. This commit concentrates on regrouping that information into a single class. The use of the class itself is done the right way. But it will be changed in the next commits.


mgautierfr · 2021-05-11T13:13:33Z

Rebased on master and resolved conflicts.

The new commits should address the comments raised in the previous review.

The last commits add unit tests for some edge cases of iterator usage. I've fixed the minimum needed to keep libzim from crashing.
We should make the behavior and API of search_iterator consistent (always throw exceptions, move to a camelCase API), but I prefer to do that in another PR.

We are not testing geoquery search, which is why we don't have 90% coverage on the patch (but that code doesn't really change here).

veloman-yunkan

A couple of minor issues.

src/search_iterator.cpp

test/search_iterator.cpp

src/search_internal.h

src/search_iterator.cpp

maneeshpm

Awesome PR! The tests demonstrating the usage are super useful. I don't have much to add to veloman-yunkan's comments. Just a minor adjustment that can be useful for the future.

src/search.cpp


As the QueryParser depends only on the information in the database, we can create it directly when we create the database. We don't need a specific method to configure a QueryParser every time. As m_language and m_stopwords were used only to configure the queryParser, we don't need to store them in the database.
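A minimal sketch of that idea, assuming a toy parser: the `QueryParser` below is a hypothetical stand-in (it is not Xapian's class), and the language and stopwords are passed to the constructor here, whereas in the PR they come from the database itself.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical parser: splits on whitespace and drops stopwords.
class QueryParser {
 public:
  QueryParser(std::string language, std::set<std::string> stopwords)
      : language_(std::move(language)), stopwords_(std::move(stopwords)) {}

  std::vector<std::string> parse(const std::string& text) const {
    std::vector<std::string> terms;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
      if (stopwords_.count(word) == 0) terms.push_back(word);
    }
    return terms;
  }

  const std::string& language() const { return language_; }

 private:
  std::string language_;
  std::set<std::string> stopwords_;
};

class InternalDatabase {
 public:
  // The parser is configured once, at construction. The database keeps
  // no separate copies of the language or the stopwords.
  InternalDatabase(std::string language, std::set<std::string> stopwords)
      : parser_(std::move(language), std::move(stopwords)) {}

  // Query parsing lives in the database, next to the parser it owns.
  std::vector<std::string> parseQuery(const std::string& text) const {
    return parser_.parse(text);
  }

 private:
  QueryParser parser_;
};

}  // namespace sketch
```

With this layout there is no separate "configure the parser" step that a caller could forget or repeat.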


mgautierfr requested review from veloman-yunkan and maneeshpm April 6, 2021 19:21

This was linked to issues Apr 6, 2021

Possibility of a memory leak in zim::Search::begin() #516

Closed

A simple unit-test for zim::Search doesn't work #471

Closed

veloman-yunkan requested changes Apr 7, 2021


maneeshpm reviewed Apr 7, 2021


src/search.cpp (outdated, resolved)

mgautierfr force-pushed the fix_search_api_memlink branch from 5ad6f89 to 74a79f3 April 7, 2021 14:33

mgautierfr mentioned this pull request Apr 16, 2021

Add zim files in the "new" format as testing data. #535

Merged

mgautierfr force-pushed the fix_search_api_memlink branch from 74a79f3 to 29fddde April 20, 2021 15:26

mgautierfr changed the base branch from master to better_tests_data April 20, 2021 15:27

mgautierfr mentioned this pull request Apr 27, 2021

Enhancement for title snippet generation #545

Merged

mgautierfr force-pushed the better_tests_data branch 3 times, most recently from 5f1f261 to 6bc2dab April 28, 2021 13:02

Base automatically changed from better_tests_data to master April 28, 2021 13:45

mgautierfr force-pushed the fix_search_api_memlink branch from 29fddde to f10f8f0 May 5, 2021 14:43

mgautierfr added 3 commits May 11, 2021 10:24

Move search's database initialization in a specific method.

9cbaab0

Obviously `initDatabase` should not be const. But this is mainly a code move. The const (and API) will be fixed in the next commit.

Move methods in src/search.cpp

5c1cf73

No code change.

mgautierfr force-pushed the fix_search_api_memlink branch from 8d48ee9 to 9fe5b6d May 11, 2021 09:40

Introduce the Searcher class.

8a7378a

The searcher is a wrapper around a xapian database. It allows creating searches. This commit concentrates on the introduction of the `Searcher` class. The `Search` is not modified.

mgautierfr force-pushed the fix_search_api_memlink branch 2 times, most recently from e991f23 to 85e2c31 May 11, 2021 10:13

mgautierfr requested review from veloman-yunkan and maneeshpm May 11, 2021 13:13

mgautierfr marked this pull request as ready for review May 11, 2021 13:13

veloman-yunkan requested changes May 11, 2021


src/search_iterator.cpp (outdated, resolved)

test/search_iterator.cpp (outdated, resolved)

src/search_internal.h (outdated, resolved)

src/search_iterator.cpp (outdated, resolved)

maneeshpm requested changes May 12, 2021


src/search.cpp (resolved)

mgautierfr requested review from maneeshpm and veloman-yunkan May 12, 2021 12:14

maneeshpm approved these changes May 12, 2021


veloman-yunkan approved these changes May 12, 2021


mgautierfr added 9 commits May 12, 2021 16:26

Introduce a SearchResultSet class.

483852f

A `SearchResultSet` represents a particular range of search results. It can be iterated. This allows the user to reuse a `Search` to get different ranges of results.

Introduce the Query object.

d3a1575

By using a `Query` object, we avoid making our `Search` configurable. This way, a user cannot call a potential `setQuery` on a `Search` after `getRange` has been called.

Move query parsing in internalDataBase.

2f4b51f

Parsing the query can be done entirely in the database, so let's move it there.

Add a bit of docstring to the search API.

8547e6c

Add unittest demonstrating reusing of Searcher and Search.

6724a89

Add unittest for empty search_iterator.

a18363b

Add unittest for end search_iterator.

d64d87b

Add unittest for copy of search_iterator.

ff16617

mgautierfr force-pushed the fix_search_api_memlink branch from b00e635 to ff16617 May 12, 2021 14:40

mgautierfr merged commit 1ffbfca into master May 12, 2021

mgautierfr deleted the fix_search_api_memlink branch May 12, 2021 14:55

veloman-yunkan mentioned this pull request May 13, 2021

Adapt libkiwix to the new libzim search API kiwix/libkiwix#523

Closed
