
test the topic changing over time with CSV format #2527

Closed
maplejia opened this issue Jun 14, 2019 · 3 comments
Labels: need info (Not enough information to reproduce an issue; need more info from author)

Comments

@maplejia

I am trying to use the `DtmModel` wrapper (`from gensim.models.wrappers import DtmModel`) to track how topics change over time.

My test file is an Amazon review CSV, which includes reviews, ratings, dates, and titles.

I am trying to replicate https://radimrehurek.com/gensim/models/wrappers/dtmmodel.html to find how topics change over time, but I fail to create the time_slices. Can anyone help me? Thank you very much.

@maplejia (Author)

Problem description

I am trying to replicate https://radimrehurek.com/gensim/models/wrappers/dtmmodel.html to find how topics change over time, but I fail to create the time_slices.
I am using an Amazon review CSV file, where each line includes the review content, date, ID, and title.

Steps/code/corpus to reproduce

1. Download the NLTK stopwords:

```python
import nltk
from nltk import FreqDist
nltk.download('stopwords')  # run this one time
```

Output:

```
[nltk_data] Downloading package stopwords to
[nltk_data] C:\Users\qsu2\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!
True
```

2. Import the libraries:

```python
import pandas as pd
pd.set_option("display.max_colwidth", 200)
import numpy as np
import re
import spacy

import gensim
from gensim import corpora

# libraries for visualization
import pyLDAvis
import pyLDAvis.gensim
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
```

Output:

```
2019-06-14 06:14:01,351 : DEBUG : backend module://ipykernel.pylab.backend_inline version unknown
```

3. Read the CSV:

```python
import pandas as pd
df = pd.read_csv("Amazon_review.csv", encoding="ISO-8859-1")
```

4. Remove unwanted characters, numbers, and symbols:

```python
df['Content'] = df['Content'].str.replace("[^a-zA-Z#]", " ")
```

5. Load the NLTK stopword list:

```python
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
```

6. Remove stopwords and short words, and lowercase the text:

```python
# function to remove stopwords
def remove_stopwords(rev):
    rev_new = " ".join([i for i in rev if i not in stop_words])
    return rev_new

# remove short words (length < 3)
df['Content'] = df['Content'].apply(lambda x: ' '.join([w for w in x.split() if len(w) > 2]))

# remove stopwords from the text
reviews = [remove_stopwords(r.split()) for r in df['Content']]

# make entire text lowercase
reviews = [r.lower() for r in reviews]
```
7. Download the spaCy English model:

```python
import spacy
!python -m spacy download en
```

Output:

```
Requirement already satisfied: en_core_web_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm==2.1.0 in c:\users\qsu2\appdata\local\continuum\anaconda3\lib\site-packages (2.1.0)
[+] Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
symbolic link created for C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\spacy\data\en <<===>> C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\en_core_web_sm
[+] Linking successful
C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\en_core_web_sm
-->
C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\spacy\data\en
You can now load the model via spacy.load('en')
```

8. Load the model and define a lemmatization function that keeps only nouns and adjectives:

```python
nlp = spacy.load('en', disable=['parser', 'ner'])

def lemmatization(texts, tags=['NOUN', 'ADJ']):  # filter nouns and adjectives
    output = []
    for sent in texts:
        doc = nlp(" ".join(sent))
        output.append([token.lemma_ for token in doc if token.pos_ in tags])
    return output
```

9. Tokenize the reviews:

```python
tokenized_reviews = pd.Series(reviews).apply(lambda x: x.split())
print(tokenized_reviews[1])
```

Output:

```
['bought', 'samsung', 'reviewed', 'happy', 'new', 'version', 'released', 'resist', 'buying', 'married', 'geek', 'never', 'many', 'gadgets', 'around', 'house', 'here', 'love', 'laptop', 'incredibly', 'thin', 'light', 'however', 'use', 'feels', 'big', 'keyboard', 'full', 'size', 'one', 'keyboard', 'great', 'better', 'trackpad', 'huge', 'laptop', 'size', 'improvement', 'previous', 'version', 'trackpad', 'surface', 'smooth', 'almost', 'feels', 'like', 'glass', 'makes', 'scrolling', 'breeze', 'overall', 'hardware', 'great', 'looks', 'really', 'good', 'especially', 'price', 'tag', 'expecting', 'surprises', 'software', 'since', 'old', 'latest', 'features', 'thanks', 'automatic', 'updates', 'however', 'pleasantly', 'surprised', 'speed', 'new', 'boots', 'quickly', 'pages', 'load', 'really', 'fast', 'great', 'use', 'there', 'also', 'new', 'add', 'ons', 'improvement', 'well', 'apps', 'create', 'google', 'doc', 'slide', 'spreadsheet', 'one', 'click', 'well', 'new', 'camera', 'app', 'lot', 'fun', 'hope', 'google', 'releases', 'old', 'all', 'happy', 'laptop', 'highly', 'recommend', 'great', 'features', 'usability', 'amazing', 'price', 'great', 'christmas', 'gift', 'idea']

```

10. Lemmatize the tokenized reviews:

```python
reviews_2 = lemmatization(tokenized_reviews)
print(reviews_2[1])  # print the lemmatized review
```

Output:

```
['samsung', 'happy', 'new', 'version', 'resist', 'married', 'geek', 'many', 'gadget', 'house', 'laptop', 'thin', 'light', 'big', 'keyboard', 'full', 'size', 'keyboard', 'well', 'trackpad', 'huge', 'laptop', 'size', 'improvement', 'previous', 'version', 'trackpad', 'surface', 'smooth', 'glass', 'overall', 'hardware', 'great', 'good', 'price', 'tag', 'surprise', 'software', 'old', 'late', 'feature', 'thank', 'automatic', 'update', 'surprised', 'speed', 'new', 'boot', 'load', 'great', 'use', 'new', 'on', 'improvement', 'well', 'app', 'doc', 'slide', 'spreadsheet', 'click', 'new', 'camera', 'app', 'lot', 'fun', 'hope', 'release', 'old', 'happy', 'laptop', 'great', 'feature', 'usability', 'amazing', 'price', 'great', 'christmas', 'gift', 'idea']

```

11. Build the dictionary:

```python
dictionary = corpora.Dictionary(reviews_2)
```

Output:

```
2019-06-14 06:15:45,260 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2019-06-14 06:15:45,342 : INFO : built Dictionary(4778 unique tokens: ['amazed', 'app', 'background', 'bag', 'bookmark']...) from 1291 documents (total 60491 corpus positions)
```

12. Build the document-term matrix:

```python
doc_term_matrix = [dictionary.doc2bow(rev) for rev in reviews_2]
```

13. Build the LDA model:

```python
# Creating the object for the LDA model using the gensim library
LDA = gensim.models.ldamodel.LdaModel

# Build the LDA model
lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=10,
                random_state=100, chunksize=1000, passes=50)
```

Output (training log):

```
2019-06-14 07:33:26,336 : INFO : using symmetric alpha at 0.1
2019-06-14 07:33:26,341 : INFO : using symmetric eta at 0.1
2019-06-14 07:33:26,367 : INFO : using serial LDA version on this node
2019-06-14 07:33:26,623 : INFO : running online (multi-pass) LDA training, 10 topics, 50 passes over the supplied corpus of 1291 documents, updating model once every 1000 documents, evaluating perplexity every 1291 documents, iterating 50x with a convergence threshold of 0.001000
2019-06-14 07:33:26,628 : INFO : PROGRESS: pass 0, at document #1000/1291
2019-06-14 07:33:26,631 : DEBUG : performing inference on a chunk of 1000 documents
2019-06-14 07:33:28,961 : DEBUG : 413/1000 documents converged within 50 iterations
2019-06-14 07:33:28,975 : DEBUG : updating topics
2019-06-14 07:33:28,997 : INFO : merging changes from 1000 documents into a model of 1291 documents
2019-06-14 07:33:29,052 : INFO : topic #6 (0.100): 0.013*"good" + 0.012*"chrome" + 0.012*"laptop" + 0.010*"computer" + 0.010*"thing" + 0.010*"price" + 0.009*"great" + 0.008*"love" + 0.008*"chromebook" + 0.008*"book"
2019-06-14 07:33:29,055 : INFO : topic #1 (0.100): 0.017*"chromebook" + 0.015*"use" + 0.014*"laptop" + 0.013*"samsung" + 0.010*"device" + 0.009*"keyboard" + 0.009*"battery" + 0.009*"work" + 0.008*"chrome" + 0.008*"computer"
2019-06-14 07:33:29,056 : INFO : topic #3 (0.100): 0.028*"computer" + 0.025*"chromebook" + 0.016*"laptop" + 0.015*"great" + 0.012*"good" + 0.011*"use" + 0.011*"work" + 0.010*"chrome" + 0.009*"keyboard" + 0.009*"battery"
2019-06-14 07:33:29,058 : INFO : topic #0 (0.100): 0.027*"chromebook" + 0.017*"time" + 0.016*"computer" + 0.010*"laptop" + 0.008*"use" + 0.008*"great" + 0.008*"device" + 0.008*"machine" + 0.008*"little" + 0.007*"day"
2019-06-14 07:33:29,059 : INFO : topic #8 (0.100): 0.022*"chromebook" + 0.018*"use" + 0.014*"great" + 0.014*"laptop" + 0.012*"good" + 0.012*"price" + 0.010*"keyboard" + 0.010*"computer" + 0.008*"web" + 0.007*"screen"
2019-06-14 07:33:29,061 : INFO : topic diff=5.212306, rho=1.000000
2019-06-14 07:33:29,065 : DEBUG : bound: at document #0
2019-06-14 07:33:29,933 : INFO : -8.262 per-word bound, 307.1 perplexity estimate based on a held-out corpus of 291 documents with 9145 words
2019-06-14 07:33:29,935 : INFO : PROGRESS: pass 0, at document #1291/1291
2019-06-14 07:33:29,936 : DEBUG : performing inference on a chunk of 291 documents
2019-06-14 07:33:30,525 : DEBUG : 167/291 documents converged within 50 iterations
2019-06-14 07:33:30,527 : DEBUG : updating topics
2019-06-14 07:33:30,531 : INFO : merging changes from 291 documents into a model of 1291 documents
2019-06-14 07:33:30,536 : INFO : topic #7 (0.100): 0.018*"laptop" + 0.018*"thing" + 0.016*"great" + 0.015*"computer" + 0.013*"chromebook" + 0.011*"problem" + 0.011*"chrome" + 0.010*"time" + 0.008*"device" + 0.008*"use"
2019-06-14 07:33:30,538 : INFO : topic #4 (0.100): 0.023*"easy" + 0.015*"chromebook" + 0.014*"screen" + 0.013*"laptop" + 0.013*"computer" + 0.013*"use" + 0.010*"great" + 0.009*"size" + 0.007*"daughter" + 0.007*"work"
2019-06-14 07:33:30,540 : INFO : topic #3 (0.100): 0.029*"computer" + 0.023*"chromebook" + 0.019*"laptop" + 0.016*"great" + 0.013*"work" + 0.012*"good" + 0.012*"internet" + 0.011*"use" + 0.011*"chrome" + 0.010*"easy"
2019-06-14 07:33:30,541 : INFO : topic #8 (0.100): 0.023*"chromebook" + 0.016*"laptop" + 0.015*"use" + 0.014*"great" + 0.013*"printer" + 0.013*"good" + 0.011*"price" + 0.010*"computer" + 0.009*"keyboard" + 0.007*"video"
2019-06-14 07:33:30,542 : INFO : topic #9 (0.100): 0.020*"laptop" + 0.019*"use" + 0.018*"chromebook" + 0.016*"great" + 0.016*"computer" + 0.012*"light" + 0.012*"love" + 0.010*"work" + 0.010*"key" + 0.010*"easy"
2019-06-14 07:33:30,544 : INFO : topic diff=1.408125, rho=0.707107
2019-06-14 07:33:30,545 : INFO : PROGRESS: pass 1, at document #1000/1291
2019-06-14 07:33:30,546 : DEBUG : performing inference on a chunk of 1000 documents
2019-06-14 07:33:32,312 : DEBUG : 694/1000 documents converged within 50 iterations
2019-06-14 07:33:32,314 : DEBUG : updating topics
2019-06-14 07:33:32,318 : INFO : merging changes from 1000 documents into a model of 1291 documents
2019-06-14 07:33:32,322 : INFO : topic #0 (0.100): 0.028*"chromebook" + 0.020*"time" + 0.012*"computer" + 0.009*"device" + 0.009*"machine" + 0.009*"laptop" + 0.008*"little" + 0.008*"new" + 0.008*"screen" + 0.007*"great"
2019-06-14 07:33:32,324 : INFO : topic #3 (0.100): 0.031*"computer" + 0.024*"chromebook" + 0.019*"laptop" + 0.015*"great" + 0.012*"use" + 0.012*"good" + 0.011*"work" + 0.011*"chrome" + 0.011*"internet" + 0.010*"thing"
2019-06-14 07:33:32,326 : INFO : topic #2 (0.100): 0.034*"chromebook" + 0.015*"computer" + 0.014*"screen" + 0.013*"laptop" + 0.008*"app" + 0.008*"web" + 0.008*"use" + 0.008*"samsung" + 0.007*"time" + 0.007*"little"
2019-06-14 07:33:32,327 : INFO : topic #4 (0.100): 0.019*"easy" + 0.015*"chromebook" + 0.015*"screen" + 0.012*"laptop" + 0.011*"size" + 0.011*"use" + 0.011*"computer" + 0.009*"great" + 0.009*"light" + 0.009*"month"
2019-06-14 07:33:32,329 : INFO : topic #7 (0.100): 0.018*"thing" + 0.017*"laptop" + 0.014*"computer" + 0.014*"great" + 0.012*"chromebook" + 0.012*"chrome" + 0.011*"problem" + 0.010*"device" + 0.010*"time" + 0.010*"issue"
2019-06-14 07:33:32,330 : INFO : topic diff=0.850735, rho=0.551234
2019-06-14 07:33:32,334 : DEBUG : bound: at document #0
2019-06-14 07:33:32,947 : INFO : -7.668 per-word bound, 203.4 perplexity estimate based on a held-out corpus of 291 documents with 9145 words
2019-06-14 07:33:32,948 : INFO : PROGRESS: pass 1, at document #1291/1291
2019-06-14 07:33:32,950 : DEBUG : performing inference on a chunk of 291 documents
2019-06-14 07:33:33,315 : DEBUG : 262/291 documents converged within 50 iterations
2019-06-14 07:33:33,317 : DEBUG : updating topics
2019-06-14 07:33:33,321 : INFO : merging changes from 291 documents into a model of 1291 documents
2019-06-14 07:33:33,325 : INFO : topic #8 (0.100): 0.023*"chromebook" + 0.018*"printer" + 0.016*"laptop" + 0.015*"great" + 0.014*"use" + 0.014*"good" + 0.012*"price" + 0.009*"computer" + 0.009*"keyboard" + 0.009*"video"
2019-06-14 07:33:33,327 : INFO : topic #1 (0.100): 0.024*"chromebook" + 0.024*"samsung" + 0.014*"use" + 0.012*"product" + 0.011*"laptop" + 0.011*"battery" + 0.010*"acer" + 0.010*"work" + 0.009*"screen" + 0.009*"keyboard"
2019-06-14 07:33:33,329 : INFO : topic #0 (0.100): 0.028*"chromebook" + 0.021*"time" + 0.009*"new" + 0.009*"laptop" + 0.009*"computer" + 0.009*"machine" + 0.009*"device" + 0.008*"great" + 0.008*"little" + 0.008*"window"
2019-06-14 07:33:33,331 : INFO : topic #3 (0.100): 0.031*"computer" + 0.022*"chromebook" + 0.020*"laptop" + 0.015*"great" + 0.012*"work" + 0.012*"use" + 0.012*"internet" + 0.011*"good" + 0.011*"chrome" + 0.011*"thing"
2019-06-14 07:33:33,332 : INFO : topic #7 (0.100): 0.019*"thing" + 0.017*"laptop" + 0.016*"great" + 0.013*"power" + 0.012*"problem" + 0.012*"computer" + 0.012*"chrome" + 0.010*"chromebook" + 0.009*"time" + 0.008*"issue"
2019-06-14 07:33:33,333 : INFO : topic diff=0.732475, rho=0.551234
2019-06-14 07:33:33,335 : INFO : PROGRESS: pass 2, at document #1000/1291
2019-06-14 07:33:33,336 : DEBUG : performing inference on a chunk of 1000 documents
2019-06-14 07:33:34,974 : DEBUG : 803/1000 documents converged within 50 iterations
2019-06-14 07:33:34,976 : DEBUG : updating topics
2019-06-14 07:33:34,980 : INFO : merging changes from 1000 documents into a model of 1291 documents
2019-06-14 07:33:34,984 : INFO : topic #4 (0.100): 0.023*"easy" + 0.015*"screen" + 0.014*"size" + 0.012*"repair" + 0.010*"daughter" + 0.010*"month" + 0.010*"laptop" + 0.010*"use" + 0.010*"chromebook" + 0.009*"great"
2019-06-14 07:33:34,986 : INFO : topic #9 (0.100): 0.023*"use" + 0.022*"laptop" + 0.018*"great" + 0.015*"computer" + 0.015*"chromebook" + 0.015*"key" + 0.015*"light" + 0.014*"love" + 0.013*"easy" + 0.012*"keyboard"
2019-06-14 07:33:34,987 : INFO : topic #2 (0.100): 0.035*"chromebook" + 0.016*"screen" + 0.012*"computer" + 0.012*"laptop" + 0.008*"samsung" + 0.008*"problem" + 0.008*"app" + 0.008*"little" + 0.008*"machine" + 0.007*"video"
2019-06-14 07:33:34,989 : INFO : topic #0 (0.100): 0.027*"chromebook" + 0.021*"time" + 0.010*"machine" + 0.010*"new" + 0.009*"device" + 0.009*"computer" + 0.008*"little" + 0.008*"window" + 0.008*"laptop" + 0.008*"system"
2019-06-14 07:33:34,991 : INFO : topic #1 (0.100): 0.024*"chromebook" + 0.021*"samsung" + 0.013*"use" + 0.011*"device" + 0.011*"battery" + 0.010*"chrome" + 0.010*"product" + 0.010*"laptop" + 0.010*"keyboard" + 0.010*"screen"
2019-06-14 07:33:34,992 : INFO : topic diff=0.545615, rho=0.482748
2019-06-14 07:33:34,996 : DEBUG : bound: at document #0
2019-06-14 07:33:35,532 : INFO : -7.344 per-word bound, 162.5 perplexity estimate based on a held-out corpus of 291 documents with 9145 words
2019-06-14 07:33:35,534 : INFO : PROGRESS: pass 2, at document #1291/1291
2019-06-14 07:33:35,535 : DEBUG : performing inference on a chunk of 291 documents
2019-06-14 07:33:35,849 : DEBUG : 279/291 documents converged within 50 iterations
```

14. Print the topics:

```python
lda_model.print_topics()
```

Output:

```
[(0,
'0.024*"chromebook" + 0.021*"time" + 0.013*"new" + 0.011*"machine" + 0.011*"window" + 0.010*"system" + 0.010*"processor" + 0.010*"month" + 0.009*"review" + 0.008*"thing"'),
(1,
'0.033*"chromebook" + 0.025*"samsung" + 0.017*"screen" + 0.014*"device" + 0.013*"keyboard" + 0.012*"use" + 0.012*"battery" + 0.011*"chrome" + 0.010*"product" + 0.010*"acer"'),
(2,
'0.034*"chromebook" + 0.023*"screen" + 0.017*"computer" + 0.010*"little" + 0.010*"samsung" + 0.009*"case" + 0.009*"problem" + 0.008*"new" + 0.008*"shell" + 0.007*"lot"'),
(3,
'0.029*"computer" + 0.022*"chromebook" + 0.021*"laptop" + 0.014*"great" + 0.013*"use" + 0.012*"thing" + 0.012*"web" + 0.011*"internet" + 0.011*"app" + 0.011*"chrome"'),
(4,
'0.056*"repair" + 0.018*"screen" + 0.018*"daughter" + 0.017*"warranty" + 0.014*"customer" + 0.009*"company" + 0.009*"month" + 0.008*"money" + 0.008*"easy" + 0.008*"service"'),
(5,
'0.021*"time" + 0.015*"file" + 0.014*"work" + 0.013*"chromebook" + 0.011*"laptop" + 0.009*"note" + 0.009*"drive" + 0.009*"great" + 0.007*"everything" + 0.007*"student"'),
(6,
'0.015*"system" + 0.015*"real" + 0.012*"verizon" + 0.007*"glitch" + 0.007*"operating" + 0.006*"notebook" + 0.005*"datum" + 0.005*"part" + 0.005*"net" + 0.005*"straight"'),
(7,
'0.022*"power" + 0.020*"problem" + 0.019*"network" + 0.015*"supply" + 0.013*"laptop" + 0.013*"month" + 0.012*"thing" + 0.012*"issue" + 0.011*"support" + 0.011*"chrome"'),
(8,
'0.029*"printer" + 0.027*"chromebook" + 0.018*"video" + 0.014*"cloud" + 0.013*"amazon" + 0.012*"good" + 0.012*"skype" + 0.012*"use" + 0.010*"laptop" + 0.010*"work"'),
(9,
'0.035*"great" + 0.030*"laptop" + 0.030*"easy" + 0.027*"light" + 0.025*"use" + 0.020*"love" + 0.016*"good" + 0.015*"screen" + 0.015*"key" + 0.014*"small"')]


```

15. Import the DTM wrapper and point it at the compiled DTM binary:

```python
from gensim.test.utils import common_corpus, common_dictionary
from gensim.models.wrappers import DtmModel

path_to_dtm_binary = "C:/Users/qsu2/DTM/dtm-win64.exe"
```

16. Train the DTM model:

```python
model = DtmModel(
    path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
    time_slices=[1] * len(doc_term_matrix)
)
```

Output:

```
OSError                                   Traceback (most recent call last)
<ipython-input> in <module>()
      1 model = DtmModel(
      2     path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
----> 3     time_slices=[1] * len(doc_term_matrix)
      4 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\dtmmodel.py in __init__(self, dtm_path, corpus, time_slices, mode, model, num_topics, id2word, prefix, lda_sequence_min_iter, lda_sequence_max_iter, lda_max_em_iter, alpha, top_chain_var, rng_seed, initialize_lda)
    162
    163         if corpus is not None:
--> 164             self.train(corpus, time_slices, mode, model)
    165
    166     def fout_liklihoods(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\dtmmodel.py in train(self, corpus, time_slices, mode, model)
    367         check_output(args=cmd, stderr=PIPE)
    368
--> 369         self.em_steps = np.loadtxt(self.fem_steps())
    370         self.init_ss = np.loadtxt(self.flda_ss())
    371

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
    960         fname = os_fspath(fname)
    961     if _is_string_like(fname):
--> 962         fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
    963     fencoding = getattr(fh, 'encoding', 'latin1')
    964     fh = iter(fh)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(path, mode, destpath, encoding, newline)
    264
    265     ds = DataSource(destpath)
--> 266     return ds.open(path, mode, encoding=encoding, newline=newline)
    267
    268

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(self, path, mode, encoding, newline)
    622                               encoding=encoding, newline=newline)
    623         else:
--> 624             raise IOError("%s not found." % path)
    625
    626

OSError: C:\Users\qsu2\AppData\Local\Temp\f909e7_train_out/em_log.dat not found.
```
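
A note on the likely failure mode, as a best guess: in this wrapper, time_slices is a list of document counts per contiguous time period, and the counts must sum to the corpus size. Passing [1] * len(doc_term_matrix) asks the DTM binary to fit 1291 periods of one document each, which appears to make the binary exit before writing its output files; the wrapper then surfaces only the downstream OSError about em_log.dat. A minimal sketch of the expected shape:

```python
# time_slices[i] = number of consecutive documents that belong to period i.
# The corpus must already be ordered chronologically, and the counts must
# sum to the total number of documents.
time_slices = [300, 300, 300, 391]               # four periods
assert sum(time_slices) == len(doc_term_matrix)  # 1291 reviews
```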

If I change step 16 to:

```python
model = DtmModel(
    path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
    time_slices=[300, 300, 300, 391]  # 1291 reviews in total
)
topics = model.show_topic(topicid=1, time=1, num_words=10)
topics
```

Output:

```
[(0.4989328070379685, 'hour'),
 (0.36703322592000454, 'want'),
 (2.806406345101065e-05, 'sharper'),
 (2.806406345101065e-05, 'tekkie'),
 (2.806406345101065e-05, 'suitcase'),
 (2.806406345101065e-05, 'stunning'),
 (2.806406345101065e-05, 'sticking'),
 (2.806406345101065e-05, 'staying'),
 (2.806406345101065e-05, 'stapler'),
 (2.806406345101065e-05, 'thingy')]
```

It seems to work, although it takes a long time.

But I need help on how to see topics change over time (by day, month, and year).
Thanks.
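
One possible way to get per-period topics, sketched under some assumptions (the CSV's date column is named 'Date' and is parseable by pandas; adjust both to the real data): sort the reviews chronologically before building doc_term_matrix, derive time_slices from per-period document counts, then query each slice index with show_topic.

```python
import pandas as pd

# DTM assumes documents in the same time slice are contiguous, so sort by
# date first and rebuild the preprocessing pipeline on the sorted frame.
df = pd.read_csv("Amazon_review.csv", encoding="ISO-8859-1")
df['Date'] = pd.to_datetime(df['Date'])  # 'Date' column name is an assumption
df = df.sort_values('Date').reset_index(drop=True)

# ... rerun steps 4-12 above on the sorted df to rebuild doc_term_matrix ...

# One slice per calendar month; use 'Y' for years or 'D' for days instead
counts = df.groupby(df['Date'].dt.to_period('M')).size()
time_slices = counts.tolist()
assert sum(time_slices) == len(df)  # slices must cover the whole corpus

model = DtmModel(
    path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
    time_slices=time_slices
)

# Print topic 1's top words in every period to watch the topic drift
for t, period in enumerate(counts.index):
    print(period, model.show_topic(topicid=1, time=t, num_words=10))
```

With monthly slices, each call to show_topic(topicid=..., time=t, ...) returns that topic's word distribution for month t, so comparing successive months shows how a topic evolves.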

Versions

Python 3.6.6 on Windows 10.

Please provide the output of:

```python
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import gensim; print("gensim", gensim.__version__)
from gensim.models import word2vec; print("FAST_VERSION", word2vec.FAST_VERSION)
```


mpenkov commented Jun 21, 2019

Please edit your comment and fix the markdown formatting. It's a bit difficult to see what you're trying to do because the formatting is so messed up.

mpenkov added the need info label on Jun 21, 2019

mpenkov commented Sep 28, 2019

Closing due to inactivity.

mpenkov closed this as completed on Sep 28, 2019