
[MRG] Fixes incorrect vectors learned during online training for FastText models #1756

Merged
merged 3 commits into from
Dec 5, 2017

Conversation

manneshiva
Contributor

This PR addresses #1752. This bug was caused by incorrect assignment of vectors to syn0 after training a FastText model.

@manneshiva manneshiva changed the title Fixes incorrect vectors learned during online training for FastText models [MRG] Fixes incorrect vectors learned during online training for FastText models Dec 4, 2017
@@ -419,6 +419,8 @@ def online_sanity(self, model):
self.assertTrue(all(['terrorism' not in l for l in others]))
model.build_vocab(others)
model.train(others, total_examples=model.corpus_count, epochs=model.iter)
# checks that `syn0` is different from `syn0_vocab`
self.assertFalse(np.all(np.equal(model.wv.syn0, model.wv.syn0_vocab)))
Contributor
This is a rather weird check. Nobody really cares whether syn0 and syn0_vocab differ. What we do care about is that syn0_vocab is incorrect because it gets overwritten by get_vocab_word_vecs(). A more meaningful check would be either asserting properties of syn0_vocab that should always hold, or checking that syn0_vocab is the same before and after calling get_vocab_word_vecs().
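The before/after check suggested here could be sketched roughly as follows. This is a minimal illustration using a mock object: the names syn0_vocab and get_vocab_word_vecs follow the gensim code under discussion, while the mock class and helper function are hypothetical stand-ins, not code from the PR.

```python
import numpy as np

class _MockFastText:
    """Hypothetical stand-in for a trained FastText model (illustration only)."""
    def __init__(self):
        rng = np.random.RandomState(0)
        # Pretend these are the trained vocab-word vectors.
        self.syn0_vocab = rng.rand(5, 4).astype(np.float32)

    def get_vocab_word_vecs(self):
        # With the bug fixed, recomputing vocab word vectors should leave
        # syn0_vocab unchanged (idempotent); the mock models that by doing
        # nothing to syn0_vocab.
        pass

def syn0_vocab_unchanged(model):
    """Check that calling get_vocab_word_vecs() does not alter syn0_vocab."""
    before = model.syn0_vocab.copy()
    model.get_vocab_word_vecs()
    return bool(np.all(np.equal(before, model.syn0_vocab)))

print(syn0_vocab_unchanged(_MockFastText()))  # True for the fixed behaviour
```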

Owner

What @janpom said + check out np.allclose :)

Contributor Author

I will add another test case but let us also retain the np.equal check for syn0 and syn0_vocab.

@menshikh-iv
Contributor

Typically we shouldn't use np.equal on anything with a floating-point dtype, but in this case the two matrices should be exactly identical, so I think np.equal is OK here.
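The distinction between exact and tolerance-based comparison can be shown with a short NumPy snippet (an illustrative aside, not code from the PR):

```python
import numpy as np

# Arithmetic on floats accumulates rounding error, so exact comparison
# with np.equal can fail even for mathematically identical values:
a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(np.all(np.equal(a, b)))  # False: 0.1 + 0.2 rounds to 0.30000000000000004
print(np.allclose(a, b))       # True: equal within default tolerances

# When one matrix is produced by copying the other (no arithmetic involved),
# bit-for-bit equality is well defined -- the situation in this PR:
c = b.copy()
print(np.all(np.equal(b, c)))  # True
```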

@menshikh-iv menshikh-iv merged commit 10bd7fc into piskvorky:develop Dec 5, 2017