
Using pre-trained word2vec models in doc2vec #1338

Closed · bgokden opened this issue May 19, 2017 · 3 comments

bgokden commented May 19, 2017

Is there a practical way of using pre-trained word2vec models in doc2vec?

There is a forked version of Gensim that does this, but it is quite old.
Referenced here: https://github.com/jhlau/doc2vec
Forked Gensim here: https://github.com/jhlau/gensim

Otherwise, I would like to add this feature, as jhlau did, and merge it back.

gojomo (Collaborator) commented May 19, 2017

You can manually patch up a model to insert word-vectors from elsewhere before training. The existing intersect_word2vec_format() may be useful, directly or as an example: it assumes you've already created a model with its own vocabulary (including the frequency info needed for negative sampling or frequent-word downsampling), but then want to use some external source to replace some or all of the word-vector values.
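For illustration, a minimal sketch of that flow. It assumes a gensim version where `intersect_word2vec_format()` is reachable from the `Doc2Vec` model (older releases inherited it from `Word2Vec`; in gensim 4.x it lives on `model.wv` instead), and the vector file name is a placeholder:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus; each document needs at least one unique tag.
corpus = [
    TaggedDocument(words="the quick brown fox".split(), tags=[0]),
    TaggedDocument(words="jumped over the lazy dog".split(), tags=[1]),
]

# vector_size must match the dimensionality of the external vectors.
model = Doc2Vec(vector_size=300, dm=1, min_count=1, epochs=20)

# Build the vocabulary (and the frequency info mentioned above) first...
model.build_vocab(corpus)

# ...then overwrite vectors for in-vocabulary words from the external file.
# lockf=1.0 lets training keep adjusting the imported vectors; 0.0 freezes them.
model.intersect_word2vec_format("external_word2vec.bin", binary=True, lockf=1.0)

model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
```

Words in your vocabulary that are missing from the external file simply keep their random initialization.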

I personally don't think the case for such re-use is yet strong; indeed, in some of the often top-performing Doc2Vec training modes (like pure PV-DBOW), input word-vectors aren't trained or used at all, so loading them would be completely superfluous. You can see some discussion of related issues, including links to messages elsewhere, in the GitHub issue thread: #1270 (comment)

maohbao commented Dec 17, 2019

This fork supports the latest gensim (3.8) and can train a doc2vec model with pre-trained word2vec vectors.

https://github.com/maohbao/gensim

gojomo (Collaborator) commented Dec 18, 2019

As per above, I think the evidence for the benefit of such a technique is muddled.

Also: it should be possible simply by poking/prodding a standard model at the right points between instantiation and training, without any major changes or new parameters to the relevant models, and without using a forked version of gensim (which will drift further away from other changes/fixes over time).
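As a rough illustration of that kind of poking/prodding, here is a sketch using gensim 4.x names (`index_to_key`, `get_index`; gensim 3.x used `index2word` and `vocab[word].index`), with a placeholder path for the pre-trained vectors:

```python
from gensim.models import KeyedVectors
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Load external word-vectors (placeholder path).
pretrained = KeyedVectors.load_word2vec_format("pretrained_vectors.bin", binary=True)

corpus = [
    TaggedDocument(words="the quick brown fox".split(), tags=[0]),
    TaggedDocument(words="jumped over the lazy dog".split(), tags=[1]),
]

# Instantiate a standard model and build its vocabulary as usual;
# the model's vector_size must match the pre-trained vectors.
model = Doc2Vec(vector_size=pretrained.vector_size, dm=1, min_count=1, epochs=20)
model.build_vocab(corpus)

# Poke the pre-trained values into the model's word-vectors before training;
# words absent from the external file keep their random initialization.
for word in model.wv.index_to_key:
    if word in pretrained:
        model.wv.vectors[model.wv.get_index(word)] = pretrained[word]

model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
```

No forked gensim or new parameters are needed; everything above is the stock API.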
