Add synonyms.bag option #30

Merged Oct 12, 2013 (2 commits)

Conversation

softwaredoug
Collaborator

Synonyms.bag is an alternative, simpler form of query expansion that uses the synonyms generated by the analysis chain. Instead of taking the query "dog bite" and expanding it to:

dog chomp
hound bite
hound chomp
canine familiaris chomp
canine familiaris bite

With synonyms.bag, the synonyms are simply appended to the dismax query, and the generated synonym queries become:

chomp
hound
bite
canine familiaris

When "constructPhraseQueries" is set to true, the generated synonym queries become:

"chomp"
"hound"
"bite"
"canine familiaris"

which is desirable when synonym expansion results in multiword phrases.
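
To make the difference concrete, here is a rough Python sketch of the two strategies. It is not the plugin's code: the `SYNONYMS` table and both function names are made up for illustration, and the bag here collects only the synonyms themselves, so it makes no claim about exactly which terms the plugin emits.

```python
from itertools import product

# Hypothetical synonym table; in the plugin these come from the analysis chain.
SYNONYMS = {
    "dog": ["hound", "canine familiaris"],
    "bite": ["chomp"],
}

def full_expansion(tokens, synonyms):
    """Every per-position choice of original term or synonym, minus the original query."""
    alternatives = [[t] + synonyms.get(t, []) for t in tokens]
    original = " ".join(tokens)
    combos = (" ".join(choice) for choice in product(*alternatives))
    return [c for c in combos if c != original]

def bag_expansion(tokens, synonyms, construct_phrase_queries=False):
    """Flat, de-duplicated list of synonyms; no combinations are built."""
    bag = []
    for t in tokens:
        for s in synonyms.get(t, []):
            if s not in bag:
                bag.append(s)
    if construct_phrase_queries:
        # Quote each entry so a multiword synonym is matched as a phrase.
        return ['"%s"' % s for s in bag]
    return bag

full = full_expansion(["dog", "bite"], SYNONYMS)
bag = bag_expansion(["dog", "bite"], SYNONYMS)
phrases = bag_expansion(["dog", "bite"], SYNONYMS, construct_phrase_queries=True)
```

The quoting step is what keeps "canine familiaris" together as one clause instead of two loose terms.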

The advantage of synonyms.bag is improved performance. In some cases, with large queries and many synonyms, the default query expansion can grow extremely complex, resulting in performance problems. Synonyms.bag simplifies this expansion dramatically.
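
A back-of-the-envelope count shows why. Assuming the default expansion tries every combination of per-term alternatives (as in the "dog bite" example above) and the bag adds one clause per synonym, the first grows as a product and the second as a sum. The function names below are illustrative only, and the sketch needs Python 3.8+ for `math.prod`.

```python
from math import prod

def full_expansion_count(synonyms_per_term):
    """Number of generated queries when every combination is tried (original excluded)."""
    return prod(n + 1 for n in synonyms_per_term) - 1

def bag_count(synonyms_per_term):
    """Number of generated clauses when synonyms are simply appended."""
    return sum(synonyms_per_term)

# "dog bite": dog has 2 synonyms, bite has 1.
small_full = full_expansion_count([2, 1])
small_bag = bag_count([2, 1])

# A six-term query where each term has three synonyms.
large_full = full_expansion_count([3] * 6)
large_bag = bag_count([3] * 6)
```

For the six-term case this works out to 4095 combined queries versus 18 bag clauses.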

The drawback is the loss of positional information in the query string. Features such as pf2 and pf3, which try parts of the query as phrases, won't function as expected. Because the generated synonyms are simply appended, positions within the query no longer indicate which phrases were actually searched for.
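
As a rough illustration of the pf2 problem, the sketch below forms word pairs the way a pf2-style feature would, over a query string with the synonyms tacked on the end. The exact layout of the appended string is an assumption made for the example, and `word_bigrams` is a hypothetical helper, not part of the plugin or of Solr.

```python
def word_bigrams(query):
    """Adjacent word pairs, roughly what a pf2-style feature would try as phrases."""
    words = query.split()
    return [" ".join(pair) for pair in zip(words, words[1:])]

original = "dog bite"
# Assumed layout: synonyms appended as plain terms after the user's query.
appended = original + " hound canine familiaris chomp"

typed_pairs = word_bigrams(original)
all_pairs = word_bigrams(appended)

# Pairs that only exist because of where the synonyms happened to be appended.
spurious_pairs = [p for p in all_pairs if p not in typed_pairs]
```

Pairs such as "bite hound" and "familiaris chomp" come purely from the order of appending, which is why phrase boosts over the bag are not meaningful.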

Synonyms.bag is ideal for identifying entities/tags within queries and searching against tag-like fields. Positional information is not meaningful in these contexts, so simply searching a "bag" of synonyms makes a lot of sense.


dsmiley commented Sep 27, 2013

By the way, shingling improves phrase query performance, at the expense of a larger index. That's another approach.


arcadius commented Oct 8, 2013

Synonyms.bag is an interesting approach that could be a config choice.

As you clearly mentioned, it will not play well with positional features such as proximity search.

nolanlawson merged commit d7c1c24 into healthonnet:master Oct 12, 2013