Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[library] Use Xapian for library search on all fields #484

Closed
veloman-yunkan opened this issue Apr 9, 2021 · 11 comments · Fixed by #488
Closed

[library] Use Xapian for library search on all fields #484

veloman-yunkan opened this issue Apr 9, 2021 · 11 comments · Fixed by #488
Assignees
Milestone

Comments

@veloman-yunkan
Copy link
Collaborator

#460 provided a fix for #106, targeting (as requested) only the title/description fields.

Now it's time to use Xapian for search over all fields of the library entries.

@veloman-yunkan veloman-yunkan self-assigned this Apr 9, 2021
@kelson42
Copy link
Collaborator

kelson42 commented Apr 9, 2021

@veloman-yunkan I guess somehow kiwix/kiwix-tools#440 is a duplicate of this ticket... or least will be fixed at the same time?

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan I guess somehow kiwix/kiwix-tools#440 is a duplicate of this ticket...

I can't agree. Though the description of that ticket contains ideas that allow it to consider it a duplicate of this one, its title doesn't suggest that at all.

or least will be fixed at the same time?

Both tickets can be fixed in the same PR, however that is optional.

@veloman-yunkan
Copy link
Collaborator Author

@kelson42 @mgautierfr

I am currently extending the library unit-tests so that this change doesn't break anything (see #488). Though the functional coverage of catalog search/filtering has to be enhanced further, I already one questions at this point:

  • Currently filtering by some fields (e.g. creator/author, tags) is case sensitive. How important is it to preserve that behaviour?

@kelson42
Copy link
Collaborator

@veloman-yunkan I appreciate to see you taking a TDD approach here. That said, I believe we should try to be coherent and like for other fields, take a diacritic/case insensitive approach.

@kelson42
Copy link
Collaborator

@veloman-yunkan Do you confirm that the description/title search is diacritics/case insensitive (like the title/ft search in libzim). Not doing this generates unexpected behaviour like reported here for example kiwix/kiwix-android#2638.

@gouri-panda
Copy link

gouri-panda commented Apr 13, 2021

@veloman-yunkan Do you confirm that the description/title search is diacritics/case insensitive (like the title/ft search in libzim). Not doing this generates unexpected behaviour like reported here for example kiwix/kiwix-android#2638.

@kelson42 Just a note in kiwix-ios it works well.

@mgautierfr
Copy link
Member

Currently filtering by some fields (e.g. creator/author, tags) is case sensitive. How important is it to preserve that behaviour?

I don't think we have to preserve this.
As @kelson42 said, we don't have to be totally coherent with current version.

As far as I know (@kelson42 please confirm), kiwix-lib is used by kiwix-tools, kiwix-desktop, kiwix-android (through the jni wrapper) and maybe kiwix-ios/macos (I don't know, maybe they directly use libzim).

As far as those project are not broken (or updated), we are "good".
Moving to case insensitive is probably a improvement here.

@kelson42
Copy link
Collaborator

@automactic Should confirm, but I believe Kiwix iOS downloads the whole OPDS stream (without any filtering) and parse/handle it on its own (without relying to the libkiwix).

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan Do you confirm that the description/title search is diacritics/case insensitive (like the title/ft search in libzim).

I just checked and it is so only partly. The titles/descriptions stored in the search DB are normalized to lower case with diacritics removed. However the query string is not normalized. As a result the query string is case insensitive but diacritics sensitive.

@kelson42
Copy link
Collaborator

@veloman-yunkan Thx for checking. This is something you can fix in this PR or should I open an other ticket?

@veloman-yunkan
Copy link
Collaborator Author

@kelson42 I already fixed it in this PR (see ded9d01, or its equivalent commit titled "Made catalog filtering by query diacritics insensitive" if I rebase the PR again).

@kelson42 kelson42 added this to the 10.0.0 milestone May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants