Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch 1 Issue 41 #42

Closed
wants to merge 9 commits into from
Closed

Patch 1 Issue 41 #42

wants to merge 9 commits into from

Conversation

rpialum
Copy link
Contributor

@rpialum rpialum commented Jun 18, 2014

Issue #41, when original search term is quoted and synonym being expanded is item quoted don't re-quote when doing constructPhraseQueries. Removes issue where double quotes appear in result. Also when doing constructPhraseQueries, only quote phrases not single term synonyms.

Could not get Python to work on my machine (v3.4.1), and also wasn't quite sure how to verify my results via a python test case. Best way I'm thinking is to set debugQuery=on and then in the response check the 'expandedSynonyms' lists generated to confirm double quotes are not present

Stubbed out test case for issue #41. Additional logic needs to be implemented for testing the results, see TODO. I've been unable to get Python(v3.4.1) to run the top level file, but am assuming that debugQuery needs to be set on and the contents of the 'expandedSynonyms' in the response needs to be analyized. Currently I'm assuming a count of the number of quotes would work, though the exact expected string could also be passed in.

Issue healthonnet#41, when original search term is quoted and synonym being expanded is item quoted don't re-quote when doing constructPhraseQueries.  Removes issue where double quotes appear in result.  Also when doing constructPhraseQueries, only quote phrases not single term synonyms.
Stubbed out test case for issue healthonnet#41. Additional logic needs to be implemented for testing the results, see TODO. I've been unable to get Python(v3.4.1) to run the top level file, but am assuming that debugQuery needs to be set on and the contents of the 'expandedSynonyms' in the response needs to be analyized. Currently I'm assuming a count of the number of quotes would work, though the exact expected string could also be passed in.
@nolanlawson
Copy link
Member

I was testing with Python 2, so if you can get that to run, you should be able to verify your tests. But regardless this is awesome; I'll merge this once I verify the tests are passing. Thanks! :)

@rpialum
Copy link
Contributor Author

rpialum commented Jun 24, 2014

I've got python 2.7 working now and am in the process of trying to complete my test script. I'm still learning the functional logic of github, and am looking for how to go about adding additional file changes to this pull request? (Is there a way to do this or does the PR need to be re-opened as a branch?) I've been running on a windows environment so I have created a windows version of the run_solr_for_unit_tests.py script and would also like to amend the README.md 'Testing' section to reflect to procedure for obtaining the proper python utilities.

Please advise on the best method to use to do so, nothing jumped out at me on the website. (Note I've only been working via the website and don't have git or any of their utilities installed on my machine)

@nolanlawson
Copy link
Member

That's no problem. The git tools can be pretty daunting on the command line; we've all been there. :)

I'd recommend just continuing to push commits to this same branch. When I merge, I will probably combine your commits into a single one using git merge --squash (but giving you attribution, of course). If you're on Windows, I've been told that the GitHub app for Windows is pretty good, else the Git Bash in Git for Windows is my go-to when I need to use Git on the command line in Windows.

Unfortunately you may run into problems if you try to run the test script in Windows; I used bash scripts at some point, so it might only work in a *nix environment. Cygwin may help you out here. Also I'm typically on Freenode IRC as "nolanlawson" if you need live help.

Windows based python run script.  Primary issue is doing 'cd' inside a os.system() call does not actually change the working directory. Instances of this have been replaced with os.chdir() prior to the calls.
Use a LinkedHashSet to buildUpAlternateQueries() rather than an ArrayList to ensure that our list of alternate queries that are built are unique.
Added logic needed to validate quote change.  had to resort to using urllib logic and wt=pthon in the url inorder to gain access to the debug information generated in the SOLR response.  The solrpy Response object was not detailed enough.
@rpialum
Copy link
Contributor Author

rpialum commented Jun 24, 2014

Thanks Nolan. I was able to find my branch again and have added the unit test, along with a windows based python run file (may also work on Linux/unix/bash) and added instructions to help other windows users hit the ground running.

One other question related to phrase queries stemming from issue #5, which spawned the synonyms.constructPhrases=true logic. Have you investigated the ability to only have synonym replacement occur on quoted multi-term phrases in the query?

ie:
Synonyms: IRS, Internal Revenue Service

Current Behavior:
Query1: Internal Revenue Service takes money
evals to: Internal Revenue Service Takes money; IRS takes money

Query2: "Internal Revenue Service" takes money
evals to: "Internal Revenue Service" Takes money; IRS takes money

Desired Behavior:
Query1: Internal Revenue Service takes money
evals to: Internal Revenue Service Takes money

Query2: "Internal Revenue Service" takes money
evals to: "Internal Revenue Service" Takes money; IRS takes money

I'm thinking that tweaking the PatternTokenizer being used to drop quotes and maybe adding quotes to the phrases in the synonym file may achieve this, but I've yet to have the chance to play around with this in the debugger.

I'm thinking that this type of behavior may be best represented and implemented by quoting the phrases in the synonym file to cause only the complete phrase to be recognized as a synonym to be replaced. Thoughts on if you have encountered or attempted this scenario during your work on developing the original functionality?

@nolanlawson
Copy link
Member

Jeez, I am sorry for the radio silence on this. I promise I will get around to merging this soon. As for your other questions; I have no idea - I have not thought about this plugin in forever TBH and I've largely forgotten how it works.

@nolanlawson
Copy link
Member

AFAIK your quoting feature is already available in the synonyms.constructPhrases and synonyms. disablePhraseQueries options.

@nolanlawson
Copy link
Member

Squashed your commits and merged as 242d330 because I have OCD.

Also you had a little syntax error; fixed that up.

The Windows-related stuff have been moved to another PR since it's unrelated. I'd rather not have to maintain a completely separate python script if possible.

@nolanlawson nolanlawson closed this Oct 4, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants