Fix #1196 - Gensim error when loading FastText #1214
The Travis tests were re-run after the smart_open update.
What is the purpose of adding a new attribute
Thanks @tmylk for the comment. To my understanding, you would use FastText when you want to load both the vec and bin files of a fastText model. If you just want to load the vectors, you can use FastTextKeyedVectors. As you pointed out, you can use By adding it to the FastText class, you meant the FastTextKeyedVectors class, right? EDIT: It seems like it should be Also, there is currently no override of Looking forward to your suggestions.
Furthermore, you cannot just use FastTextKeyedVectors without FastText initialization (which needs both vec and bin) as |
Just indent changes requested. Glad it became a comprehensive fix!
Please add a note in the changelog.md as well.
gensim/test/test_fasttext_wrapper.py
Outdated
@@ -169,10 +189,12 @@ def testMostSimilarCosmul(self):
"""Test most_similar_cosmul for in-vocab and out-of-vocab words"""
# In vocab, sanity check
self.assertEqual(len(self.test_model.most_similar_cosmul(positive=['the', 'and'], topn=5)), 5)
self.assertEqual(self.test_model.most_similar_cosmul('the'), self.test_model.most_similar_cosmul(positive=['the']))
self.assertEqual(self.test_model.most_similar_cosmul('the'),
Please use hanging indent
gensim/test/test_fasttext_wrapper.py
Outdated
# Out of vocab check
self.assertEqual(len(self.test_model.most_similar_cosmul(['night', 'nights'], topn=5)), 5)
self.assertEqual(self.test_model.most_similar_cosmul('nights'), self.test_model.most_similar_cosmul(positive=['nights']))
self.assertEqual(self.test_model.most_similar_cosmul('nights'),
Hanging indent preferred
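For reference, a hanging indent in the PEP 8 sense puts nothing after the opening parenthesis on the first line and indents the arguments one extra level. Below is a minimal sketch of how the long assertion could be wrapped that way. `StubModel` and `run_example` are hypothetical stand-ins added only so the snippet runs on its own; they are not part of gensim or of this PR.

```python
import unittest


class StubModel:
    """Hypothetical stand-in for the FastText wrapper; not gensim's API."""

    def most_similar_cosmul(self, positive=None, topn=10):
        # Accept a bare string or a list, mirroring the calls in the diff.
        if isinstance(positive, str):
            positive = [positive]
        # Return deterministic fake (word, score) pairs for illustration.
        return [("%s_%d" % (positive[0], i), 1.0 / (i + 1)) for i in range(topn)]


class TestHangingIndent(unittest.TestCase):
    def setUp(self):
        self.test_model = StubModel()

    def testMostSimilarCosmul(self):
        # Hanging indent: nothing follows the opening parenthesis on the
        # first line, and each argument sits one level deeper.
        self.assertEqual(
            self.test_model.most_similar_cosmul('the'),
            self.test_model.most_similar_cosmul(positive=['the']))


def run_example():
    """Run the test case and report whether it passed (assumed helper)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestHangingIndent)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The same layout would apply to the out-of-vocab assertion on 'nights'.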
Thanks for reviewing the code. It seems the build for Python 2 stalled. Could you rerun it, please, @tmylk?
These tests are known to fail occasionally, but this is the first time they have failed consistently. I will disable them in the main branch soon.
OK. Let me know if I can help with anything.
Gensim can load large fastText models on Mac
Loading of binary fastText models is faster
The fastText vector size is set correctly
vector_size is also saved when loading just the vector_file, which can be useful if you are not interested in training