save/load_word2vec_format fails for FastText models #1743

Closed
jayantj opened this issue Nov 28, 2017 · 5 comments
Labels: bug, difficulty medium

Comments

@jayantj
Contributor

jayantj commented Nov 28, 2017

Description

Saving and loading using save_word2vec_format and load_word2vec_format fails for both native FastText models and models loaded using the wrapper.

Steps/Code/Corpus to Reproduce

Example:

from gensim.models import fasttext as ft
from gensim.models.wrappers import fasttext as ft_wrapper
from gensim.models.word2vec import Text8Corpus

corpus = Text8Corpus('gensim/test/test_data/lee_background.cor')
native_model = ft.FastText()
native_model.build_vocab(corpus)

print(native_model.wv.most_similar('wars'))
# prints results

print(native_model.wv['wars'])
# prints results

native_model.wv.save_word2vec_format('test.wv')
wv = ft_wrapper.FastTextKeyedVectors.load_word2vec_format('test.wv')
print(wv.most_similar('wars'))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-43e12f136081> in <module>()
----> 1 print(wv.most_similar('wars'))

~/Projects/gensim/gensim/gensim/models/keyedvectors.py in most_similar(self, positive, negative, topn, restrict_vocab, indexer)
    318             negative = []
    319 
--> 320         self.init_sims()
    321 
    322         if isinstance(positive, string_types) and not negative:

~/Projects/gensim/gensim/gensim/models/wrappers/fasttext.py in init_sims(self, replace)
    125             else:
    126                 self.syn0_ngrams_norm = \
--> 127                     (self.syn0_ngrams / sqrt((self.syn0_ngrams ** 2).sum(-1))[..., newaxis]).astype(REAL)
    128 
    129     def __contains__(self, word):

TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'

print(wv['wars'])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-ce05f767b013> in <module>()
----> 1 print(wv['wars'])

~/Projects/gensim/gensim/gensim/models/keyedvectors.py in __getitem__(self, words)
    601         if isinstance(words, string_types):
    602             # allow calls like trained_model['office'], as a shorthand for trained_model[['office']]
--> 603             return self.word_vec(words)
    604 
    605         return vstack([self.word_vec(word) for word in words])

~/Projects/gensim/gensim/gensim/models/wrappers/fasttext.py in word_vec(self, word, use_norm)
     91             return super(FastTextKeyedVectors, self).word_vec(word, use_norm)
     92         else:
---> 93             word_vec = np.zeros(self.syn0_ngrams.shape[1], dtype=np.float32)
     94             ngrams = compute_ngrams(word, self.min_n, self.max_n)
     95             ngrams = [ng for ng in ngrams if ng in self.ngrams]

AttributeError: 'NoneType' object has no attribute 'shape'

At a quick glance, this looks like a result of the changes made to FastTextKeyedVectors during the native implementation of FastText, which introduced two separate matrices, syn0_vocab and syn0_ngrams. In both tracebacks, syn0_ngrams is None after loading.
That said, I'm not sure save_word2vec_format is even suitable for FastText, since the ngram vectors aren't stored to disk.
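
To illustrate the suspected failure mode without depending on gensim, here is a minimal sketch. Everything in it is an assumption made for illustration: ToyFastTextVectors, save_word2vec_text and load_word2vec_text are hypothetical stand-ins, not gensim's classes or functions. It only shows that the word2vec text format has room for whole-word vectors and nothing else, so an object rebuilt from such a file has no ngram matrix, and any code path that touches it fails in the same way as the two tracebacks above.

```python
import io


def save_word2vec_text(word_vectors, fout):
    """Write whole-word vectors in the word2vec text format.

    The header is "<vocab_size> <vector_size>"; each following line holds
    one word and its vector. The format has no place for ngram vectors.
    """
    vector_size = len(next(iter(word_vectors.values())))
    fout.write("%d %d\n" % (len(word_vectors), vector_size))
    for word, vec in word_vectors.items():
        fout.write("%s %s\n" % (word, " ".join("%f" % x for x in vec)))


def load_word2vec_text(fin):
    """Read back the whole-word vectors written by save_word2vec_text."""
    vocab_size, _vector_size = map(int, fin.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        parts = fin.readline().split()
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


class ToyFastTextVectors(object):
    """Hypothetical stand-in for FastTextKeyedVectors (not the gensim class)."""

    def __init__(self, syn0, syn0_ngrams=None):
        self.syn0 = syn0                # whole-word vectors
        self.syn0_ngrams = syn0_ngrams  # stays None when loaded from word2vec format

    def init_sims(self):
        # Same kind of expression as the failing line in the first traceback.
        return self.syn0_ngrams ** 2

    def oov_word_vec(self):
        # Same attribute access as the failing line in the second traceback.
        return [0.0] * self.syn0_ngrams.shape[1]


# Round-trip through the text format: only whole-word vectors survive.
buf = io.StringIO()
save_word2vec_text({"wars": [0.1, 0.2], "war": [0.3, 0.4]}, buf)
buf.seek(0)
wv = ToyFastTextVectors(load_word2vec_text(buf))

# Both code paths fail because syn0_ngrams was never populated.
errors = []
for method in (wv.init_sims, wv.oov_word_vec):
    try:
        method()
    except (TypeError, AttributeError) as err:
        errors.append(type(err).__name__)
print(errors)
```

If this reading is right, a fix would need to either reject save_word2vec_format for FastText vectors or make the loaded object work from whole-word vectors alone; which of these is intended is an open question here.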

jayantj changed the title from "save/load_word2vec_format fails for FastText native models" to "save/load_word2vec_format fails for FastText models" on Nov 28, 2017
menshikh-iv added the bug and difficulty medium labels on Nov 29, 2017
@piskvorky
Owner

piskvorky commented Nov 29, 2017

How could this happen? Aren't we testing save_word2vec_format/load_word2vec_format in unit tests?

@menshikh-iv
Contributor

ping @chinmayapancholi13, can you comment on this?

@chinmayapancholi13
Contributor

@menshikh-iv Hey Ivan! I don't think I made any changes to the load/save_word2vec_format methods in fastText. Could you please give me till tonight to take a look at this issue?

Thanks for your patience. :)

@chinmayapancholi13
Contributor

@menshikh-iv Hi Ivan. I was looking at the code of the two functions, save_word2vec_format and load_word2vec_format. I can see that a PR (#1755) has now been opened for this issue.

Before I take a shot at solving the problem, I wanted to confirm whether this has already been resolved (in case there was some discussion that I was not a part of). And if not, what behavior do we expect from these functions?

@menshikh-iv
Contributor

Thanks @chinmayapancholi13, @shiva already fixed it in #1755.
