Enabling inference on held-out data in the author-topic model #1166

olavurmortensen · 2017-02-24T10:41:33Z

At the moment, it is not possible to perform inference on held-out data in the AuthorTopicModel, and as a result it is not possible to evaluate model fit (the bound) on new data either.

In LDA, we infer on held-out documents by calling gammad, _ = self.inference([doc]), learning the document's topic distribution gamma (local parameter), without updating the model (sstats, global parameter), by (implicitly) setting collect_sstats=False. This allows us to compute the bound on those documents.
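To make the LDA case concrete, here is a minimal pure-Python sketch of what inferring gamma without touching the global parameters looks like. It is not gensim's code: the names infer_gamma and digamma, the use of a fixed topic-word probability table (topics) in place of gensim's expected log beta, and the toy numbers are all assumptions made for illustration.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def infer_gamma(doc, topics, alpha=0.1, iterations=50):
    """Fit the local parameter gamma for one held-out document.

    doc    : list of (word_id, count) pairs (bag of words)
    topics : one list of word probabilities per topic; read only, never modified
    """
    num_topics = len(topics)
    gamma = [1.0] * num_topics
    for _ in range(iterations):
        total = digamma(sum(gamma))
        exp_elog_theta = [math.exp(digamma(g) - total) for g in gamma]
        new_gamma = [alpha] * num_topics
        for word_id, count in doc:
            # Responsibility of each topic for this word (phi), up to normalisation.
            weights = [exp_elog_theta[k] * topics[k][word_id] for k in range(num_topics)]
            norm = sum(weights) + 1e-100
            for k in range(num_topics):
                new_gamma[k] += count * weights[k] / norm
        gamma = new_gamma
    return gamma


# Toy model: two topics over a two-word vocabulary (made-up numbers).
topics = [[0.9, 0.1], [0.1, 0.9]]
held_out_doc = [(0, 5)]  # word 0 appears five times
gamma = infer_gamma(held_out_doc, topics)
```

Only gamma is learned here; the topic table plays the role of the global parameter and is left unchanged, which is the property that makes a held-out bound meaningful.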

It is not 100% clear what inference on held-out data means in the author-topic model. I suggest this definition, analogous to LDA: compute the topic distribution gamma for a new author with documents docs, without updating the model (i.e. no change to sstats), and then compute the bound on these held-out documents and authors.
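One way to write this definition down, as a sketch only: assume a single held-out author a whose documents D_a are attributed to a alone, and keep the topic-word variational parameters fixed. The notation below (phi, w_dn, alpha_k) follows the usual LDA-style variational updates and is an assumption, not taken from the implementation.

```latex
% Local updates for a held-out author a; the global (topic-word) parameters stay fixed.
\begin{align}
  \phi_{dnk} &\propto \exp\!\left\{ \mathbb{E}_q[\log \theta_{ak}]
                 + \mathbb{E}_q[\log \beta_{k, w_{dn}}] \right\},
  \qquad
  \mathbb{E}_q[\log \theta_{ak}] = \psi(\gamma_{ak}) - \psi\!\Big(\sum_{j} \gamma_{aj}\Big), \\
  \gamma_{ak} &= \alpha_k + \sum_{d \in D_a} \sum_{n} \phi_{dnk}.
\end{align}
```

Iterating these two updates to convergence gives gamma for the new author; the bound on the held-out documents would then be evaluated with this gamma and the unchanged global parameters.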

The inference algorithm used in the AuthorTopicModel, as well as the model class, is very similar to LdaModel. Therefore, anyone with experience with LdaModel should find it relatively easy to jump into the AuthorTopicModel.

A report detailing the theory as well as the implementation is available here. This is my master's thesis.

I will offer my assistance with anything related to the AuthorTopicModel to anyone willing to take on this issue, for example on a GitHub PR or on Gitter.


nickkimer · 2018-04-23T03:34:58Z

has this been implemented anywhere?

menshikh-iv · 2018-04-23T06:38:04Z

@nickkimer it is now possible to infer a vector for an unseen author, see #1766 (currently in develop). I think this addition resolves the current issue; you can install the current develop branch or wait for the next release, 3.5.0.

tmylk added the wishlist label Feb 25, 2017

menshikh-iv added the feature and difficulty medium labels and removed the wishlist label Oct 2, 2017

menshikh-iv closed this as completed Apr 23, 2018
