Handling of non-words in publisher query #537

Closed
kelson42 opened this issue Apr 16, 2021 · 3 comments

@kelson42
Contributor

The approach used in publisherQuery() for building a phrase query
that enforces the specified prefix for all terms fails if the input
phrase contains a token that Xapian's query parser does not accept as a
plain word (e.g. a standalone ampersand character, 1/2, a#1, etc.).

The issue can be demonstrated with the quest tool (shipped with
xapian-tools on Ubuntu) as follows:

$ quest -o phrase -d some_xapian_db "Energy & security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

$ quest -o phrase -d some_xapian_db 'Energy 1/2 security act'
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

$ quest -o phrase -d some_xapian_db "Energy a#1 security act"
UnimplementedError: OP_NEAR and OP_PHRASE only currently support leaf subqueries

The problem comes from parsing the query with the default operation set
to OP_PHRASE (exemplified by the -o phrase option in the quest
invocations above). A workaround is to parse the phrase with a
default operation of OP_OR and then combine all the resulting terms
with OP_PHRASE.
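
The error message and the workaround can be illustrated with a toy model. The sketch below is not Xapian code: ToyQuery, leaf(), orOf(), phraseOf(), collectTerms() and phraseFromParsed() are hypothetical names invented for this illustration. It only assumes what the error message states, namely that a phrase may contain leaf subqueries only, and shows that flattening an OR-parsed tree into its terms before building the phrase sidesteps that restriction.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy query tree used only for illustration; this is NOT Xapian's API.
struct ToyQuery {
    enum class Op { Leaf, Or, Phrase };
    Op op = Op::Leaf;
    std::string term;                // meaningful only when op == Op::Leaf
    std::vector<ToyQuery> children;  // meaningful only for Or / Phrase
};

// Build a single-term (leaf) query.
inline ToyQuery leaf(const std::string& term) {
    ToyQuery q;
    q.term = term;
    return q;
}

// Build an OR node; any kind of subquery is accepted.
inline ToyQuery orOf(std::vector<ToyQuery> children) {
    ToyQuery q;
    q.op = ToyQuery::Op::Or;
    q.children = std::move(children);
    return q;
}

// Build a phrase node. Mimics the restriction quoted in the error
// message: every subquery of a phrase must be a leaf.
inline ToyQuery phraseOf(std::vector<ToyQuery> children) {
    for (const ToyQuery& c : children) {
        if (c.op != ToyQuery::Op::Leaf) {
            throw std::logic_error("phrase only supports leaf subqueries");
        }
    }
    ToyQuery q;
    q.op = ToyQuery::Op::Phrase;
    q.children = std::move(children);
    return q;
}

// Collect leaf terms depth-first, left to right, so the original
// order of the terms is preserved.
inline void collectTerms(const ToyQuery& q, std::vector<std::string>& out) {
    if (q.op == ToyQuery::Op::Leaf) {
        out.push_back(q.term);
        return;
    }
    for (const ToyQuery& c : q.children) {
        collectTerms(c, out);
    }
}

// The workaround in miniature: take whatever tree the OR-parse produced,
// flatten it to its terms, and only then build the phrase.
inline ToyQuery phraseFromParsed(const ToyQuery& parsed) {
    std::vector<std::string> terms;
    collectTerms(parsed, terms);
    std::vector<ToyQuery> leaves;
    for (const std::string& t : terms) {
        leaves.push_back(leaf(t));
    }
    return phraseOf(std::move(leaves));
}
```

Calling phraseOf() directly on a child list that contains a nested group throws, whereas phraseFromParsed() on the same tree succeeds, because it only ever passes leaves to phraseOf(). In Xapian itself the analogous step would presumably iterate over the parsed query's terms (get_terms_begin() / get_terms_end()) and pass that range to the Query constructor together with OP_PHRASE; this should be checked against the Xapian API documentation.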

This issue is likely to be present in libzim/src/search.cpp too, where
set_default_op(Xapian::Query::op::OP_PHRASE) is used. It affects
suggestions in my local build of kiwix-serve (though the instance
running at library.kiwix.org seems to be unaffected; is it running an
older version?).

@kelson42
Contributor Author

Duplicate of #536

kelson42 marked this as a duplicate of #536 on Apr 16, 2021

@veloman-yunkan
Collaborator

veloman-yunkan commented Apr 16, 2021

@kelson42 I don't see why you had to open this issue in libzim and then close it as a duplicate of #536. This bug was introduced and then fixed in the as-yet unmerged PR kiwix/libkiwix#488. The only relation of that virtual issue to libzim is that an approach from the libzim source code was reused in new code added to kiwix-lib. The only thing left to do for that unmerged code in kiwix-lib is to incorporate the extra fix (disabling stemming) that @maneeshpm made in #534.

@kelson42
Contributor Author

@veloman-yunkan I got confused. To avoid this kind of confusion in the future, please always open a bug ticket before submitting a PR that fixes it.
