Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue #743, Updated word2vec.n_similarities and test_word2vec.testSimilarities methods #883

Merged
merged 17 commits into from
Sep 26, 2016
Merged
13 changes: 8 additions & 5 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
Changes
=======

0.13.3, 2016-09-26

* Fixed issue #743 , In word2vec's n_similarity method if atleast one empty list is passed ZeroDivisionError is raised, added test cases in test/test_word2vec.py(@pranay360, #883)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't delete from changelog of phrases fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't delete it, I had a previous build in which that entry was not in the changelog, anyway I added it. Thank You for marking out the changes.


0.13.2, 2016-08-19
* export_phrases in Phrases model changed. Fixed issue #794 and added test cases in test/test_phrases.py(@AadityaJ,
[#879](https://github.com/RaRe-Technologies/gensim/pull/879))
- bigram construction can now support multiple bigrams within one sentence
* wordtopics has changed to word_topics in ldamallet, and fixed issue #764. (@bhargavvader, [#771](https://github.com/RaRe-Technologies/gensim/pull/771))

* export_phrases in Phrases model changed. Fixed issue #794 and added test cases in test/test_phrases.py(@AadityaJ, #879) bigram construction can now support multiple bigrams within one sentence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add to future release 0.13.3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you talking about fixing the issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pr is about #743. There is no need to change other parts of changelog.md

* wordtopics has changed to word_topics in ldamallet, and fixed issue #764. (@bhargavvader, [#771](https://github.com/RaRe-Technologies/gensim/pull/771))
- assigning wordtopics value of word_topics to keep backward compatibility, for now
* topics, topn parameters changed to num_topics and num_words in show_topics() and print_topics()(@droudy, [#755](https://github.com/RaRe-Technologies/gensim/pull/755))
- In hdpmodel and dtmmodel
Expand Down Expand Up @@ -47,7 +50,7 @@ Changes
* Control whether to use lowercase for computing word2vec accuracy. (@alantian, #607)
* Easy import of GloVe vectors using Gensim (Manas Ranjan Kar, #625)
- Allow easy port of GloVe vectors into Gensim
- Standalone script with command line arguments, compatible with Python>=2.6
- Standalone script with command line arguments, compatible with Python>=2.6
- Usage: python -m gensim.scripts.glove2word2vec -i glove_vectors.txt -o output_word2vec_compatible.txt
* Add `similar_by_word()` and `similar_by_vector()` to word2vec (@isohyt, #381)
* Convenience method for similarity of two out of training sentences to doc2vec (@ellolo, #707)
Expand Down
6 changes: 5 additions & 1 deletion gensim/models/word2vec.py
Original file line number Diff line number Diff line change
Expand Up @@ -1539,9 +1539,13 @@ def n_similarity(self, ws1, ws2):
True

"""
if not(len(ws1) and len(ws2)):
raise ZeroDivisionError('Atleast one of the passed list is empty.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If input is a list, idiomatic Python is: if not ws1 or not ws2:.

v1 = [self[word] for word in ws1]
v2 = [self[word] for word in ws2]
return dot(matutils.unitvec(array(v1).mean(axis=0)), matutils.unitvec(array(v2).mean(axis=0)))
return dot(matutils.unitvec(array(v1).mean(axis=0)),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use vertical indent in gensim -- change to normal hanging indent (see PEP8: all statements on new lines, one level of indent).

matutils.unitvec(array(v2).mean(axis=0)))


def init_sims(self, replace=False):
"""
Expand Down
5 changes: 4 additions & 1 deletion gensim/test/test_word2vec.py
Original file line number Diff line number Diff line change
Expand Up @@ -370,9 +370,12 @@ def testSimilarities(self):
model = word2vec.Word2Vec(size=2, min_count=1, sg=0, hs=0, negative=2)
model.build_vocab(sentences)
model.train(sentences)

self.assertTrue(model.n_similarity(['graph', 'trees'], ['trees', 'graph']))
self.assertTrue(model.n_similarity(['graph'], ['trees']) == model.similarity('graph', 'trees'))
self.assertRaises(ZeroDivisionError, model.n_similarity, ['graph', 'trees'], [])
self.assertRaises(ZeroDivisionError, model.n_similarity, [], ['graph', 'trees'])
self.assertRaises(ZeroDivisionError, model.n_similarity, [], [])

def testSimilarBy(self):
"""Test word2vec similar_by_word and similar_by_vector."""
Expand Down