`save/load_word2vec_format` fails for `FastText` models #1743

jayantj · 2017-11-28T20:01:41Z

Description

Saving and loading using save_word2vec_format and load_word2vec_format fails for both native FastText models and models loaded using the wrapper.

Steps/Code/Corpus to Reproduce

Example:

from gensim.models import fasttext as ft
from gensim.models.wrappers import fasttext as ft_wrapper
from gensim.models.word2vec import Text8Corpus

corpus = Text8Corpus('gensim/test/test_data/lee_background.cor')
native_model = ft.FastText()
native_model.build_vocab(corpus)

print(native_model.wv.most_similar('wars'))
>>> # prints results

print(native_model.wv['wars'])
>>> # prints results

native_model.wv.save_word2vec_format('test.wv')
wv = ft_wrapper.FastTextKeyedVectors.load_word2vec_format('test.wv')

print(wv.most_similar('wars'))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-43e12f136081> in <module>()
----> 1 print(wv.most_similar('wars'))

~/Projects/gensim/gensim/gensim/models/keyedvectors.py in most_similar(self, positive, negative, topn, restrict_vocab, indexer)
    318             negative = []
    319 
--> 320         self.init_sims()
    321 
    322         if isinstance(positive, string_types) and not negative:

~/Projects/gensim/gensim/gensim/models/wrappers/fasttext.py in init_sims(self, replace)
    125             else:
    126                 self.syn0_ngrams_norm = \
--> 127                     (self.syn0_ngrams / sqrt((self.syn0_ngrams ** 2).sum(-1))[..., newaxis]).astype(REAL)
    128 
    129     def __contains__(self, word):

TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'

print(wv['wars'])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-ce05f767b013> in <module>()
----> 1 print(wv['wars'])

~/Projects/gensim/gensim/gensim/models/keyedvectors.py in __getitem__(self, words)
    601         if isinstance(words, string_types):
    602             # allow calls like trained_model['office'], as a shorthand for trained_model[['office']]
--> 603             return self.word_vec(words)
    604 
    605         return vstack([self.word_vec(word) for word in words])

~/Projects/gensim/gensim/gensim/models/wrappers/fasttext.py in word_vec(self, word, use_norm)
     91             return super(FastTextKeyedVectors, self).word_vec(word, use_norm)
     92         else:
---> 93             word_vec = np.zeros(self.syn0_ngrams.shape[1], dtype=np.float32)
     94             ngrams = compute_ngrams(word, self.min_n, self.max_n)
     95             ngrams = [ng for ng in ngrams if ng in self.ngrams]

AttributeError: 'NoneType' object has no attribute 'shape'

From a quick glance, it looks like this resulted from the changes made to FastTextKeyedVectors during the native implementation of FastText, where two separate matrices, syn0_vocab and syn0_ngrams, were created. After load_word2vec_format, syn0_ngrams is still None, which is what both tracebacks above run into.
That said, I'm not sure save_word2vec_format is even suitable for FastText, since the ngram vectors aren't stored to disk.
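
To illustrate the last point, here is a minimal sketch, assuming the plain-text variant of the word2vec format (a header line with vocabulary size and dimensionality, followed by one line per word). The helper names `save_word2vec_text` and `load_word2vec_text` and the toy vectors are hypothetical and are not gensim's implementation; the sketch only shows that the format has a slot for whole-word vectors and none for ngram vectors.

```python
import io


def save_word2vec_text(vectors, stream):
    """Write {word: [floats]} in the plain-text word2vec layout (hypothetical helper)."""
    dim = len(next(iter(vectors.values())))
    # Header line: "<vocab_size> <vector_size>"
    stream.write("%d %d\n" % (len(vectors), dim))
    for word, vec in vectors.items():
        # One line per word: the word followed by its components
        stream.write(word + " " + " ".join(repr(x) for x in vec) + "\n")


def load_word2vec_text(stream):
    """Read the layout written above back into {word: [floats]} (hypothetical helper)."""
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        parts = stream.readline().rstrip("\n").split(" ")
        vec = [float(x) for x in parts[1:]]
        if len(vec) != dim:
            raise ValueError("unexpected vector length for %r" % parts[0])
        vectors[parts[0]] = vec
    return vectors


# Toy stand-ins for a FastText model's two matrices (made-up values).
word_vectors = {"wars": [0.1, 0.2], "peace": [0.3, 0.4]}
ngram_vectors = {"<wa": [0.5, 0.6], "ars": [0.7, 0.8]}  # never written to the file

buf = io.StringIO()
save_word2vec_text(word_vectors, buf)
buf.seek(0)
loaded = load_word2vec_text(buf)

# Only whole-word vectors survive the round trip; the ngram vectors are gone,
# so anything that relies on them after loading has nothing to work with.
print(sorted(loaded))
```

Under these assumptions, a loader that expects ngram vectors has to either rebuild them some other way or fall back to whole-word lookups only.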

piskvorky · 2017-11-29T13:33:55Z

How could this happen? Aren't we testing save_word2vec_format/load_word2vec_format in unit tests?

menshikh-iv · 2017-11-30T07:47:47Z

ping @chinmayapancholi13, can you comment on this?

chinmayapancholi13 · 2017-12-05T09:25:01Z

@menshikh-iv Hey Ivan! I don't think I made any changes to the load/save_word2vec_format methods in fastText. Could you please give me till tonight to take a look at this issue?

Thanks for your patience. :)

chinmayapancholi13 · 2017-12-06T10:29:35Z

@menshikh-iv Hi Ivan. I was looking at the code of the two functions, save_word2vec_format and load_word2vec_format. I can see that a PR (#1755) has now been opened for this issue.

Before I take a shot at solving the problem, I wanted to confirm whether this has already been resolved (in case there was some discussion that I was not a part of). And if not, what behavior do we expect from these functions?

menshikh-iv · 2017-12-06T11:47:30Z

Thanks @chinmayapancholi13, @shiva already fixed it in #1755

jayantj changed the title ~~save/load_word2vec_format fails for FastText native models~~ save/load_word2vec_format fails for FastText models Nov 28, 2017

menshikh-iv added the bug and difficulty medium labels Nov 29, 2017

manneshiva mentioned this issue Dec 4, 2017

Fix save/load_word2vec_format methods for FastText model. Fix #1743 #1755

Merged

menshikh-iv closed this as completed in 09a16d1 Dec 6, 2017
