FastText model gets error with typical methods #1343

Closed
beeva-enriqueotero opened this issue May 22, 2017 · 13 comments

@beeva-enriqueotero

Description

I get an error when calling doesnt_match or most_similar on the pretrained (Spanish) FastText model:

AttributeError: 'FastTextKeyedVectors' object has no attribute 'syn0_all'

Steps/Code/Corpus to Reproduce

Download the pretrained (Spanish) .vec file, then run:

from gensim.models.wrappers import FastText
model = FastText.load_word2vec_format('wiki.es.vec')
print model.doesnt_match("rey reina patata".split())
#print model.most_similar("rey")

Expected Results

patata

Actual Results

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-90dc8fd2cfb7> in <module>()
      2 PATH = '/home/milhouse/datasets/word2vec/wiki.es/'
      3 model = FastText.load_word2vec_format(PATH+'wiki.es.vec')
----> 4 print model.doesnt_match("rey reina patata".split())
      5 #print model.most_similar("rey")

/home/milhouse/tools/virtualenvs/gensim/local/lib/python2.7/site-packages/gensim/models/keyedvectors.pyc in doesnt_match(self, words)
    540 
    541         """
--> 542         self.init_sims()
    543 
    544         used_words = [word for word in words if word in self]

/home/milhouse/tools/virtualenvs/gensim/local/lib/python2.7/site-packages/gensim/models/wrappers/fasttext.pyc in init_sims(self, replace)
    111                 self.syn0_all_norm = self.syn0_all
    112             else:
--> 113                 self.syn0_all_norm = (self.syn0_all / sqrt((self.syn0_all ** 2).sum(-1))[..., newaxis]).astype(REAL)
    114 
    115     def __contains__(self, word):

AttributeError: 'FastTextKeyedVectors' object has no attribute 'syn0_all'

Versions

Linux-4.4.0-75-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.12.1')
('SciPy', '0.19.0')
('gensim', '2.1.0')
('FAST_VERSION', 1)
I've also tried gensim versions 2.0.0 and 1.0.1, with the same results.

@prakhar2b
Contributor

prakhar2b commented May 22, 2017

@beeva-enriqueotero For fastText-trained models, the syn0_all attribute is loaded from the .bin file. See this line for context.

The proper way to use doesnt_match or most_similar would be:

from gensim.models.wrappers import FastText
model = FastText.load_fasttext_format('wiki.es') 
print model.doesnt_match("rey reina patata".split())

@jayantj
Contributor

jayantj commented May 23, 2017

To load the full FastText model, you need the .bin file as well as the .vec file. If you want to use the full model, the correct usage is:

from gensim.models.wrappers import FastText
model = FastText.load_fasttext_format('wiki.es')  # Note that you don't specify .bin or .vec, both files are loaded
print model.doesnt_match("rey reina patata".split())
print model.most_similar("rey")

The full model allows you to use out-of-vocabulary words.

If you only want to use the .vec file, the correct usage is:

from gensim.models.keyedvectors import KeyedVectors
model = KeyedVectors.load_word2vec_format('wiki.es.vec')
print model.doesnt_match("rey reina patata".split())
print model.most_similar("rey")

Note that using only the .vec file prevents you from obtaining vectors for out-of-vocabulary words.
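
To illustrate why out-of-vocabulary words need the full model, here is a simplified, standard-library-only sketch of the subword idea behind fastText: a word is wrapped in `<` and `>`, split into character n-grams, and an unseen word gets a vector by averaging the vectors of its known n-grams. The function names `char_ngrams` and `oov_vector`, and the plain dict standing in for the n-gram table, are assumptions made for illustration. Real fastText hashes n-grams into buckets stored in the .bin file, and this is not gensim's implementation.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, with fastText-style boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of the n-grams of `word` found in `ngram_vectors`.

    `ngram_vectors` is a hypothetical dict {ngram: list of floats}; a .vec file
    holds only whole-word vectors, so it cannot supply such a table.
    """
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        raise KeyError("no known n-grams for %r" % word)
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

With only whole-word vectors available, as in the .vec case, there is nothing to average for an unseen word, which is why that route cannot produce out-of-vocabulary vectors.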

@gojomo
Collaborator

gojomo commented May 23, 2017

@jayantj - should the 2nd line of your 1st "use of the full model" code block really be load_word2vec_format()?

@jayantj
Contributor

jayantj commented May 23, 2017

Thanks - fixed.

@piskvorky
Owner

@jayantj I've seen several questions about this functionality now.

Worth adding to the gensim FAQ?

@beeva-enriqueotero
Author

Thanks for your answers!

However, when I tried model = FastText.load_fasttext_format('wiki.es'), I got AssertionError: mismatch between vocab sizes :(

I guess this issue is related to #1236

Has there been any progress on this error?

Regards

@prakhar2b
Contributor

prakhar2b commented May 25, 2017

@beeva-enriqueotero PR #1319 (which supports both old and new fastText models) resolves this mismatch error. It should be merged very soon. For now, you can load the model using only the .vec file, as mentioned in the comment above.
cc @tmylk

@jayantj
Contributor

jayantj commented May 25, 2017

@piskvorky Yes, that would be a good idea. Let's wait for #1319 to be merged though, as the version in master/develop right now doesn't have the latest changes required to load the newer FastText models.

@menshikh-iv
Contributor

Fixed in #1319

@beeva-enriqueotero
Author

Thank you very much for your quick and efficient answers! :)

@eduamf

eduamf commented Feb 10, 2021

I'm using fastText binary files that were trained earlier:

from gensim.models import FastText
# some stuff
# ...
# load fasttext model
model1 = FastText.load_fasttext_format(arq1 + ".bin")
# loop
for i, word in enumerate(lwords):
     # NOW, the error:
     neighbors = {w : d for d, w in model1.get_nearest_neighbors(word, k=neighborsNum)}
     # ...

The error is: AttributeError: 'FastText' object has no attribute 'get_nearest_neighbors'

@mpenkov
Collaborator

mpenkov commented Feb 11, 2021

@eduamf You're commenting on a ticket that was resolved over 3 years ago.

Please open a new ticket and fill out the template.

@menshikh-iv
Contributor

@eduamf The method get_nearest_neighbors doesn't exist in gensim; you need most_similar instead.
