Hyphens shouldn't always be assumed to be query operators #32
Comments
Unfortunately, it appears that hyphenated synonyms like "e-commerce" can only be used if the synonym file is explicitly purged of all hyphens (which should be replaced with spaces). Commit 4913b62 demonstrates this. The synonym file contains

My impression is that this is a weakness of the default configuration we choose for the synonym analyzer. The combination of KeywordTokenizers at one step and a StandardTokenizer at the other causes hyphenated synonyms to be overlooked.

Unfortunately, I can't seem to find a combination that satisfies all the unit tests, so for now I'm just recommending that people manually purge their synonym files of hyphenated synonyms.
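For anyone applying the manual workaround described above, here is a minimal sketch of how the purge could be scripted. It assumes a Solr-style synonym file (comma-separated terms, optional `=>` mappings, `#` comment lines); the function name `purge_hyphens` is hypothetical and not part of this project.

```python
import re

# Matches a hyphen that has a word character on both sides,
# e.g. the one in "e-commerce". Hyphens elsewhere are left alone.
_INTRA_WORD_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def purge_hyphens(synonym_text: str) -> str:
    """Replace intra-word hyphens with spaces in synonym-file text.

    Assumption: lines starting with '#' are comments and are kept as-is.
    """
    cleaned = []
    for line in synonym_text.splitlines():
        if line.lstrip().startswith("#"):
            cleaned.append(line)  # leave comment lines untouched
        else:
            cleaned.append(_INTRA_WORD_HYPHEN.sub(" ", line))
    return "\n".join(cleaned)


if __name__ == "__main__":
    print(purge_hyphens("e-commerce, electronic commerce"))
```

Only hyphens between two word characters are rewritten, so the `=>` arrows and any free-standing hyphens stay as they are.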
I use |
How about inserting a PatternReplaceCharFilterFactory before the tokenizer to remove hyphens?
As far as I know, you cannot use CharFilters in
While investigating #26 and #9, it occurred to me that all of these issues are related. I also think they're really just configuration issues, related to the fact that, in our examples and unit tests, we configure the synonym analyzer to use the
My fix was just to replace the

Hopefully the

BTW, I also put all this configuration into a single file, so it's easier to modify. The same file that's used for the unit tests is referenced in the README; we can change that later if it becomes awkward.
Related to #28.
Words like "e-commerce" should be understood to be non-complex queries (whereas something like "e -commerce" is truly complex).
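The distinction drawn here can be sketched as a small heuristic: a hyphen counts as a query operator only when it starts a term (at the beginning of the query or after whitespace) and is immediately followed by a non-space character. This is an illustrative assumption, not the plugin's actual logic; the name `is_complex_query` is hypothetical, and other operators (such as `+` or quotes) are deliberately ignored.

```python
import re

# A hyphen at the start of the query or right after whitespace,
# followed directly by a non-space character, acts like a
# "prohibit" operator. A hyphen inside a word does not match.
_OPERATOR_HYPHEN = re.compile(r"(?:^|\s)-(?=\S)")


def is_complex_query(query: str) -> bool:
    """Return True if the query contains a hyphen used as an operator."""
    return bool(_OPERATOR_HYPHEN.search(query))


if __name__ == "__main__":
    for q in ["e-commerce", "e -commerce"]:
        print(repr(q), is_complex_query(q))
```

Under this rule "e-commerce" is treated as a plain term, while "e -commerce" is treated as complex because the hyphen begins a new term.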