Duplicate entries in search results #980

BPerlakiH · 2024-09-15T11:50:41Z

Based on the findings from: #979

Search results can contain duplicates, because the indexed search and the title search can return the same entries.
Additionally, some entries have duplicate URLs, one with and one without a trailing slash, e.g.:
zim://B2A69C3D-7852-400F-9A07-8986875DF683/solar.lowtechmagazine.com/tags/speed/
zim://B2A69C3D-7852-400F-9A07-8986875DF683/solar.lowtechmagazine.com/tags/speed
Both produce the same search result and link to the same page.
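
To illustrate the trailing-slash case, here is a minimal sketch of one way such duplicates could be filtered. It is not the code from the app or the PR: `normalizeUrl` and `deduplicate` are hypothetical names, and it assumes results can be compared by their URL string alone.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: strip one trailing slash so that
// ".../tags/speed/" and ".../tags/speed" compare as equal.
std::string normalizeUrl(const std::string& url) {
    if (url.size() > 1 && url.back() == '/') {
        return url.substr(0, url.size() - 1);
    }
    return url;
}

// Hypothetical helper: keep the first result seen for each normalized URL,
// preserving the original order of the results.
std::vector<std::string> deduplicate(const std::vector<std::string>& urls) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> unique;
    for (const auto& url : urls) {
        // insert() reports whether the normalized URL was new
        if (seen.insert(normalizeUrl(url)).second) {
            unique.push_back(url);
        }
    }
    return unique;
}
```

Keeping the first occurrence means the ordering of the merged results decides which variant survives. The same set also covers the case where the indexed search and the title search return the identical URL.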

Example on macOS:


kelson42 · 2024-09-15T12:43:04Z

Here is a general remark. libzim provides two searches:

Title suggestions
Fulltext searches

Usually a reader uses one or the other. Only the Apple reader somehow mixes the two. Honestly, I'm not very fond of this approach, as it creates a lot of new challenges.

I'm not against fixing this one, obviously, but I want to share that this approach might disappear in the future.

BPerlakiH · 2024-09-15T17:49:15Z

@kelson42 I did some further investigation on this and found more issues. We have this:

kiwix-apple/SwiftUI/Model/SearchOperation/SearchOperation.mm

Lines 62 to 65 in 9093134

    
if (archive.hasFulltextIndex()) {
    indexSearchArchives.push_back(archive);
}
titleSearchArchives.push_back(archive);

Now this indeed means we do search in both ways, as you wrote.

Currently, with the Wikipedia copy I have, the indexed search throws an exception:
DatabaseCorruptError: dir_end invalid in block 28240
This has the following consequence if I change it as you suggested, to use either the indexed search or the title search:

if (archive.hasFulltextIndex()) {
    indexSearchArchives.push_back(archive);
} else {
    titleSearchArchives.push_back(archive);
}

With this change it won't give any results, since the indexed search fails and the title search is never run.

In addition, I found that we run the search on a set of archives, which is also not ideal:

if it throws an exception on one archive from the set, we lose the results from the whole set!

I am updating the PR to use one try / catch per archive. That way we can continue to get results even if one of the archives is "bad".
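
As a rough sketch of the per-archive try / catch idea (not the actual PR code): `Archive`, `searchArchive` and `searchAll` below are hypothetical stand-ins rather than libzim types or calls, and the failure is simulated with a `corrupt` flag.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an archive; this is not the libzim type.
struct Archive {
    std::string name;
    bool corrupt;
};

// Hypothetical search over one archive. A corrupt archive throws,
// similar in spirit to the DatabaseCorruptError mentioned above.
std::vector<std::string> searchArchive(const Archive& archive,
                                       const std::string& query) {
    if (archive.corrupt) {
        throw std::runtime_error("DatabaseCorruptError in " + archive.name);
    }
    return {archive.name + "/" + query};
}

// One try / catch per archive: a failing archive is skipped,
// and the results from the remaining archives are still returned.
std::vector<std::string> searchAll(const std::vector<Archive>& archives,
                                   const std::string& query) {
    std::vector<std::string> results;
    for (const auto& archive : archives) {
        try {
            auto hits = searchArchive(archive, query);
            results.insert(results.end(), hits.begin(), hits.end());
        } catch (const std::exception&) {
            // Only this archive's results are lost; continue with the next one.
        }
    }
    return results;
}
```

With one good archive, one corrupt archive and another good one, `searchAll` returns the hits from the two good archives instead of nothing. Whether a skipped archive should also be logged or reported to the user is left open here.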

BPerlakiH added bug macOS iOS labels Sep 15, 2024

BPerlakiH added this to the 3.6.0 milestone Sep 15, 2024

BPerlakiH self-assigned this Sep 15, 2024

BPerlakiH changed the title ~~Remove duplicate entry from search results~~ Duplicate entries in search results Sep 15, 2024

BPerlakiH linked a pull request Sep 15, 2024 that will close this issue

Remove duplicate search entries #981

Merged

BPerlakiH mentioned this issue Sep 15, 2024

Remove duplicate search entries #981

Merged

kelson42 closed this as completed in #981 Sep 18, 2024
