Add "most_similar_to_given" method for KeyedVectors #1582
.spyproject/codestyle.ini
Outdated
@@ -0,0 +1,6 @@
[codestyle]
What is this? Please remove all non-relevant files (everything in the .spyproject folder).
gensim/models/keyedvectors.py
Outdated
@@ -617,6 +618,22 @@ def similarity(self, w1, w2):

        """
        return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2]))

    def most_similar_to_given(self, w1, word_list):
Please reformat your docstring to follow Google style.
gensim/models/keyedvectors.py
Outdated

        Example::

            >>> trained_model.most_similar_to_given('music', ['water', 'sound', 'backpack', 'mouse'])
What do you think, @gojomo, is this a useful feature?
If done efficiently it makes sense. If the common case is there's a stable subset against which some projects are doing all of their similarity-searches, the need might be best met by a new method for creating a subset KeyedVectors, with just the words-of-interest. Overlaps with closed-as-idle PR #1229.
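To illustrate the "subset of words-of-interest" idea from the comment above, here is a minimal pure-Python sketch. It does not use gensim: a plain dict of toy vectors stands in for KeyedVectors, and `make_subset` and `most_similar_in_subset` are hypothetical helper names invented for this example, not existing gensim methods.

```python
import math


def unit(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def make_subset(vectors, words_of_interest):
    """Hypothetical helper: keep only the requested words, normalized once up front."""
    return {w: unit(vectors[w]) for w in words_of_interest if w in vectors}


def most_similar_in_subset(subset, query_vec):
    """Return the subset word whose vector has the highest cosine similarity to the query."""
    q = unit(query_vec)
    return max(subset, key=lambda w: sum(a * b for a, b in zip(subset[w], q)))


# Toy 3-dimensional vectors (made-up values, for illustration only).
toy_vectors = {
    "music": [0.9, 0.1, 0.0],
    "sound": [0.8, 0.2, 0.1],
    "water": [0.1, 0.9, 0.2],
    "mouse": [0.0, 0.2, 0.9],
}

subset = make_subset(toy_vectors, ["water", "sound", "mouse"])
best = most_similar_in_subset(subset, toy_vectors["music"])
print(best)
```

The point of the design is that normalization happens once, when the subset is built, so repeated searches against the same stable word list only pay for the dot products.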
@TheMathMajor Also, please fix the PEP8 issues (see the Travis log).
There's another nearly-complete implementation of similar functionality by @shubhvachher in closed-as-idle PR #1229.
Ping @TheMathMajor, what's the status of this PR?
Hi, thanks for the feedback. I have committed the requested changes.
Thanks @TheMathMajor, LGTM.
There's no need for a "deprecated" forwarding method in Word2Vec if this is a brand-new feature on
Perhaps the method should have a test, but as a simple one-liner composed of other well-tested methods, maybe not. That highlights another difference from the earlier #1229: while that PR had a lot of code duplication, it did try to do the similarity calculations with array math, and thus might be noticeably faster with long word lists. If the main goal is performance, that approach may have been better; if the goal is simply to provide a convenience/clarity/example method, this idiomatic one-liner is better.
Agreed, I'll remove it from Word2Vec. For this method, I think a clean one-liner is better (IMO we don't need performance here).
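For readers following the "one-liner" discussion, the sketch below shows the general shape of such a method in plain Python. It is an illustration under assumptions, not the code merged in this PR: a dict of made-up toy vectors replaces the model, the functions take that dict as an explicit argument, and `max(..., key=...)` stands in for the `argmax` mentioned in the commit messages.

```python
import math


def similarity(vectors, w1, w2):
    """Cosine similarity between the vectors of two words."""
    a, b = vectors[w1], vectors[w2]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_to_given(vectors, w1, word_list):
    """Return the word from word_list that is most similar to w1 (the one-liner)."""
    return max(word_list, key=lambda w: similarity(vectors, w1, w))


# Toy vectors with made-up values; a real model would supply these.
toy_vectors = {
    "music": [0.9, 0.1, 0.0],
    "water": [0.1, 0.9, 0.2],
    "sound": [0.8, 0.2, 0.1],
    "backpack": [0.3, 0.3, 0.7],
    "mouse": [0.0, 0.2, 0.9],
}

result = most_similar_to_given(toy_vectors, "music", ["water", "sound", "backpack", "mouse"])
print(result)
```

Each candidate costs one separate similarity call here, which is the trade-off raised above: simple and readable, but slower than a single batched array operation when the word list is long.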
Thanks @TheMathMajor, congrats on your first contribution :1st_place_medal:
Thanks a lot for the suggestions and for guiding me through my first contribution!
* finished adding 2 new functions
* imported argmax to word2vec
* reformatted
* remove `most_similar_to_given` from w2v class
* Fix PEP8
Added a function to find the most similar word in a given list to a given word.