LDA show_topic() might show wordids, not words #354

pshvechikov · 2015-06-02T09:10:10Z

I mean this line:
https://github.com/piskvorky/gensim/blob/develop/gensim/models/ldamodel.py#L757
It should be:

    beststr = [(topic[id], id) for id in bestn]

Otherwise it is not possible to get output similar to the tfidf model, which returns a list of tuples such as this one:

    (98801, 0.008706198772450975),

containing the id of the word and its tfidf weight.
Maybe this behavior should be controlled by a function parameter, so that it doesn't break any other code that relies on the default word representation returned by show_topic.
I mean something like this:

    def show_topic(self, topicid, topn=10, show_ids=False):
        """
        Return a list of `(word_probability, word)` 2-tuples for the most probable
        words in topic `topicid`.

        Only return 2-tuples for the topn most probable words (ignore the rest).
        If `show_ids` is True, return the integer word id instead of the word string.

        """
        # assumes numpy is imported at module level, as elsewhere in ldamodel.py
        topic = self.state.get_lambda()[topicid]
        topic = topic / topic.sum()  # normalize to probability dist
        bestn = numpy.argsort(topic)[::-1][:topn]  # ids of the topn largest weights
        if show_ids:
            beststr = [(topic[word_id], word_id) for word_id in bestn]
        else:
            beststr = [(topic[word_id], self.id2word[word_id]) for word_id in bestn]
        return beststr
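A minimal standalone sketch of the same selection logic, runnable without a trained model. It assumes the topic's weights are given as a plain Python list indexed by word id (a stand-in for one row of the lambda matrix) and uses `sorted` in place of `numpy.argsort`; the function name `top_topic_terms` is hypothetical, not gensim API.

```python
def top_topic_terms(weights, id2word, topn=10, show_ids=False):
    """Return the topn most probable terms of one topic (hypothetical helper).

    `weights` is a list of unnormalized topic-word weights indexed by word id.
    Each result is a (probability, id) tuple when show_ids is True,
    otherwise a (probability, word) tuple looked up via id2word.
    """
    total = float(sum(weights))
    probs = [w / total for w in weights]  # normalize to a probability distribution
    # word ids sorted by descending probability, truncated to topn
    bestn = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:topn]
    if show_ids:
        return [(probs[i], i) for i in bestn]
    return [(probs[i], id2word[i]) for i in bestn]


id2word = {0: "cat", 1: "dog", 2: "fish"}
weights = [1.0, 3.0, 4.0]
print(top_topic_terms(weights, id2word, topn=2))
# [(0.5, 'fish'), (0.375, 'dog')]
print(top_topic_terms(weights, id2word, topn=2, show_ids=True))
# [(0.5, 2), (0.375, 1)]
```

With `show_ids=True` the `id2word` lookup is skipped entirely, so existing callers that rely on the default word output are unaffected.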

Even so, this approach is inconsistent about the order of id and metric inside each tuple: tfidf puts the id first, whereas lda.show_topics() puts it second.
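Until the order is unified, a caller can reorder the tuples itself. The sketch below assumes a word-to-id mapping is available (for example the `token2id` dict of a gensim `Dictionary`, passed here as a plain dict); `to_id_weight_pairs` is a hypothetical helper, not part of gensim.

```python
def to_id_weight_pairs(topic_terms, token2id):
    """Convert (probability, word) tuples into (word_id, probability) tuples.

    The output follows the (id, weight) order used by tfidf vectors.
    `token2id` maps each word string to its integer id.
    """
    return [(token2id[word], prob) for prob, word in topic_terms]


token2id = {"cat": 0, "dog": 1, "fish": 2}
lda_style = [(0.5, "fish"), (0.375, "dog")]
print(to_id_weight_pairs(lda_style, token2id))
# [(2, 0.5), (1, 0.375)]
```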


cscorley · 2015-06-27T19:55:09Z

I think that is desirable, especially making sure that the (id, metric) order is consistent across modules. Mind opening a PR?

BTW, there's no need to change the linked line: its result isn't what that method returns in the end, anyway. I think you meant a few lines up, in show_topic: https://github.com/piskvorky/gensim/blob/develop/gensim/models/ldamodel.py#L734

menshikh-iv · 2017-10-03T07:12:58Z

Resolved in #448.

cscorley mentioned this issue Jul 7, 2015

show_topics method is not consistent across HdpModel and LdaModel #389

Closed

piskvorky added the labels "feature" (Issue described a new feature) and "difficulty easy" (Easy issue: required small fix) Sep 11, 2015

cscorley added a commit that referenced this issue Sep 12, 2015

Issues #354, #389: consistent show_topics

bd755f4

cscorley mentioned this issue Sep 12, 2015

Make show_topics more consistent across models #448

Merged

cscorley added a commit that referenced this issue Sep 13, 2015

Issues #354, #389: consistent show_topics

ed97132

cscorley added a commit that referenced this issue Sep 13, 2015

Issues #354, #389: consistent show_topics

e6ac51c

cscorley added a commit that referenced this issue Sep 13, 2015

Issues #354, #389: consistent show_topics

1069477

cscorley added a commit that referenced this issue Sep 13, 2015

Issue #354: swap order of show_topic to (id, weight)

af415d5

menshikh-iv closed this as completed Oct 3, 2017
