Incorrect learning of word vectors during online training using FastText native implementation #1752

Closed
manneshiva opened this issue Dec 3, 2017 · 2 comments
Labels
bug, difficulty medium

manneshiva commented Dec 3, 2017

Description

A bug in the native FastText implementation causes syn0 to be equal to syn0_vocab at the end of training. As a result, word vectors are learned incorrectly during online training.
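To illustrate why identical arrays point to a problem, here is a minimal sketch that assumes the usual FastText composition, in which a word's final vector (syn0) is the average of its full-word vector (syn0_vocab) and the vectors of its character n-grams. The function name and the toy values are hypothetical, and plain lists stand in for the NumPy arrays; this is not gensim's actual code.

```python
def compose_word_vector(vocab_vec, ngram_vecs):
    """Average the full-word vector with the vectors of its n-grams."""
    total = list(vocab_vec)
    for ngram_vec in ngram_vecs:
        total = [t + g for t, g in zip(total, ngram_vec)]
    count = len(ngram_vecs) + 1  # n-grams plus the full word itself
    return [t / count for t in total]


# Toy values (hypothetical): one word with two n-grams, dimension 2.
vocab_vec = [1.0, 2.0]
ngram_vecs = [[3.0, 0.0], [2.0, 4.0]]

word_vec = compose_word_vector(vocab_vec, ngram_vecs)

# Under this assumption the composed vector generally differs from the
# full-word vector, which is why the report expects False.
vectors_match = (word_vec == vocab_vec)
```

Under this assumption, syn0 could equal syn0_vocab only if the n-gram contributions were never folded in, or happened to cancel out exactly.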

Steps/Code/Corpus to Reproduce

from gensim.models.word2vec import LineSentence
from gensim.models.fasttext import FastText as FT_gensim
import os
import gensim

data_dir = '{}'.format(os.sep).join([gensim.__path__[0], 'test', 'test_data']) + os.sep
data_file = '{}lee_background.cor'.format(data_dir)
sentences = LineSentence(data_file)

model = FT_gensim(sg=1, hs=0, window=2, negative=5, iter=1)
model.build_vocab(sentences)
model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)

print((model.wv.syn0 == model.wv.syn0_vocab).all())

Expected Results

False

Actual Results

True

Versions

Linux-4.10.0-40-generic-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.3')
('SciPy', '1.0.0')
('gensim', '3.1.0')
('FAST_VERSION', 1)

manneshiva changed the title from "Incorrect learning of word vectors during online training in FastText native implementation" to "Incorrect learning of word vectors during online training using FastText native implementation" on Dec 3, 2017
menshikh-iv added the bug and difficulty medium labels on Dec 4, 2017
menshikh-iv commented

@manneshiva good description, but if you have any information or ideas about where exactly the problem is or how to solve it, please add them to your report.


piskvorky commented Dec 4, 2017

(off topic: regarding these weird '{}'.format constructions, have a look at os.path.join)
