Commit
[GSoC 2018] Multistream API for vocabulary building in *2vec (#2078)
* multistream scan vocab for doc2vec, word2vec & fastText
* fixes
* fix tags for doc2vec
* fix tests
* removed benchmark vocab
* addressing comments
* make interfaces and documentation more pretty
* add word2vec multistream tests
* fix pep8
* iteritems -> items
* more precise test
* add doc2vec tests
* add fasttext tests
* remove prints
* fix seed=42
* fixed tests
* add build_vocab test for fasttext
* fix
* change size from 10 to 5 in fasttext test because of appveyor memory limits
* another test with memory error
* fix py3 tests
* fix iteritems for py3
* fix functools reduce
* addressing comments
* addressing @jayantj comments
* fix language
* add final vocab pruning in multistream modes
* keys -> iterkeys
* use heapq.nlargest
* fix
* multistream flag to input_streams param
* fix tests
* fix flake 8
* fix doc2vec docstrings
* fix merging streams
* fix doc2vec
* max_vocab_size -> max_vocab_size / workers
* fixed
* / -> // (py3 division)
* fix
* fix docstring
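The commit message describes four steps. Each input stream is scanned on its own, each worker gets a budget of `max_vocab_size // workers`, the per-stream vocabularies are merged, and a final pruning pass uses `heapq.nlargest`. The following is a minimal sketch of that flow, assuming each stream is a re-iterable of tokenized sentences. `prune_vocab`, `scan_stream` and `build_vocab_multistream` are hypothetical names, not gensim's actual API. A `ThreadPool` keeps the demo self-contained; real CPU parallelism in Python would need a process pool because of the GIL.

```python
import heapq
import operator
from collections import Counter
from functools import reduce
from multiprocessing.pool import ThreadPool


def prune_vocab(vocab, min_reduce):
    """Delete (in place) every word whose count is <= min_reduce."""
    for word in [w for w, count in vocab.items() if count <= min_reduce]:
        del vocab[word]


def scan_stream(stream, max_vocab_size=None):
    """Count tokens in one stream, pruning rare words when the local vocab grows too large."""
    vocab = Counter()
    min_reduce = 1
    for sentence in stream:
        vocab.update(sentence)
        if max_vocab_size and len(vocab) > max_vocab_size:
            prune_vocab(vocab, min_reduce)
            min_reduce += 1  # prune more aggressively next time
    return vocab


def build_vocab_multistream(streams, workers=2, max_vocab_size=None):
    """Hypothetical sketch: scan streams in parallel, merge counts, then prune globally."""
    # Each stream gets an equal share of the budget; // keeps it an int on Python 3.
    per_stream_cap = max(1, max_vocab_size // workers) if max_vocab_size else None
    with ThreadPool(processes=workers) as pool:
        partial_vocabs = pool.starmap(
            scan_stream, [(stream, per_stream_cap) for stream in streams]
        )
    # Merge the per-stream counters by summing counts.
    merged = reduce(operator.add, partial_vocabs, Counter())
    # Final pruning: keep only the max_vocab_size most frequent words.
    if max_vocab_size and len(merged) > max_vocab_size:
        top = heapq.nlargest(max_vocab_size, merged.items(), key=lambda kv: kv[1])
        merged = Counter(dict(top))
    return merged
```

Splitting the budget per stream bounds the memory of each scan. Because per-stream pruning can still leave more words than the cap, the merged result gets one global `nlargest` pass so the final vocabulary never exceeds `max_vocab_size`.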
1 parent 19e725d, commit 408a714
Showing 9 changed files with 541 additions and 121 deletions.