A common phenomenon in LDA training is that the first several iterations are very costly. This is largely because the uniformly random initialization makes the word-topic, and hence the doc-topic, distributions quite dense.
There are two approaches:
1. Sparse initialization: randomly constrain each word to only a small fraction (e.g. 1%) of all topics, and for each token of that word, sample its initial topic from that constrained subset rather than from all topics (see the sketch after this list).
2. First train for several iterations on a small part of the corpus (e.g. 1%) to initialize the word-topic distribution, which should be much sparser than a uniformly random initialization.
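A minimal sketch of the first approach, assuming a plain Gibbs-style setup where the corpus is a list of documents given as word-id lists; `sparse_init_assignments`, `topic_fraction`, and the count-matrix layout are illustrative names and choices, not this project's actual API:

```python
import numpy as np

def sparse_init_assignments(corpus, vocab_size, num_topics,
                            topic_fraction=0.01, seed=0):
    """Initialize token-topic assignments so each word only uses a small
    random subset of topics, keeping the initial word-topic matrix sparse.

    `corpus` is a list of documents, each a list of word ids;
    `topic_fraction` is the share of topics a word may use (e.g. 1%).
    """
    rng = np.random.default_rng(seed)
    k = max(1, int(num_topics * topic_fraction))

    # For each word, pick a small random set of allowed topics.
    allowed = {w: rng.choice(num_topics, size=k, replace=False)
               for w in range(vocab_size)}

    word_topic = np.zeros((vocab_size, num_topics), dtype=np.int64)
    doc_topic = np.zeros((len(corpus), num_topics), dtype=np.int64)
    assignments = []

    for d, doc in enumerate(corpus):
        z_doc = []
        for w in doc:
            # Sample only from this word's allowed topics, not all topics.
            z = int(rng.choice(allowed[w]))
            z_doc.append(z)
            word_topic[w, z] += 1
            doc_topic[d, z] += 1
        assignments.append(z_doc)

    return assignments, word_topic, doc_topic
```

With `topic_fraction=0.01`, each row of `word_topic` has at most 1% of its entries nonzero after initialization, so the first sampling iterations operate on a much sparser model than with uniform initialization.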
We can split the perplexity into two parts, e.g. word perplexity and doc perplexity, and then look into the impact of the different initialization strategies on each part.
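For reference, a hedged sketch of computing perplexity from the word-topic and doc-topic count matrices, keeping the doc factor (theta) and word factor (phi) separate so the two proposed parts can be instrumented; the exact word/doc split is left open here, and the function name, priors, and matrix layout are illustrative assumptions:

```python
import numpy as np

def lda_perplexity(corpus, doc_topic, word_topic, alpha=0.1, beta=0.01):
    """Perplexity from count matrices: doc_topic is (num_docs, K),
    word_topic is (vocab, K). theta and phi are exposed separately so
    the doc and word contributions can be analyzed on their own."""
    # Smoothed doc-topic (theta) and word-topic (phi) distributions.
    theta = doc_topic + alpha
    theta = theta / theta.sum(axis=1, keepdims=True)   # rows: p(k | d)
    phi = word_topic + beta
    phi = phi / phi.sum(axis=0, keepdims=True)         # columns: p(w | k)

    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(corpus):
        for w in doc:
            # p(w | d) = sum_k theta[d, k] * phi[w, k]
            log_lik += np.log(theta[d] @ phi[w])
            n_tokens += 1
    return np.exp(-log_lik / n_tokens)
```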