test the topic changing over time with CSV format #2527
Comments
Problem description
I am trying to replicate the page https://radimrehurek.com/gensim/models/wrappers/dtmmodel.html to find how topics change over time, but I fail to create time_slices.
Steps/code/corpus to reproduce
# import gensim libraries for visualization
import pyLDAvis
output: 2019-06-14 06:14:01,351 : DEBUG : backend module://ipykernel.pylab.backend_inline version unknown
def remove_stopwords(rev):
# remove short words (length < 3)
df['Content'] = df['Content'].apply(lambda x: ' '.join([w for w in x.split() if len(w)>2]))
# remove stopwords from the text
reviews = [remove_stopwords(r.split()) for r in df['Content']]
# make entire text lowercase
reviews = [r.lower() for r in reviews]
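The cleaning steps in this cell (drop short words, drop stop words, lowercase) can be sketched with the standard library alone. This is only an illustration under assumptions: the stop-word set `STOP_WORDS` and the helper `preprocess` are hypothetical, since the post does not show the body of `remove_stopwords` or the stop-word list it used. Unlike the cell above, the sketch lowercases first so that capitalized stop words are also caught.

```python
# Hypothetical stop-word list; the list used in the original post is not shown.
STOP_WORDS = {"the", "and", "this", "that", "with", "have"}


def remove_stopwords(tokens, stop_words=STOP_WORDS):
    """Join the tokens that are not stop words back into one string."""
    return " ".join(t for t in tokens if t not in stop_words)


def preprocess(text, min_len=3):
    """Lowercase, drop words shorter than min_len, then drop stop words."""
    tokens = [w for w in text.lower().split() if len(w) >= min_len]
    return remove_stopwords(tokens)


# Two made-up review strings standing in for df['Content'].
reviews = [preprocess(r) for r in ["I love THIS laptop", "The keyboard is great"]]
```

With a pandas DataFrame, the same function could presumably be applied with `df['Content'].apply(preprocess)`.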
def lemmatization(texts, tags=['NOUN', 'ADJ']): # filter noun and adjective
output: ['bought', 'samsung', 'reviewed', 'happy', 'new', 'version', 'released', 'resist', 'buying', 'married', 'geek', 'never', 'many', 'gadgets', 'around', 'house', 'here', 'love', 'laptop', 'incredibly', 'thin', 'light', 'however', 'use', 'feels', 'big', 'keyboard', 'full', 'size', 'one', 'keyboard', 'great', 'better', 'trackpad', 'huge', 'laptop', 'size', 'improvement', 'previous', 'version', 'trackpad', 'surface', 'smooth', 'almost', 'feels', 'like', 'glass', 'makes', 'scrolling', 'breeze', 'overall', 'hardware', 'great', 'looks', 'really', 'good', 'especially', 'price', 'tag', 'expecting', 'surprises', 'software', 'since', 'old', 'latest', 'features', 'thanks', 'automatic', 'updates', 'however', 'pleasantly', 'surprised', 'speed', 'new', 'boots', 'quickly', 'pages', 'load', 'really', 'fast', 'great', 'use', 'there', 'also', 'new', 'add', 'ons', 'improvement', 'well', 'apps', 'create', 'google', 'doc', 'slide', 'spreadsheet', 'one', 'click', 'well', 'new', 'camera', 'app', 'lot', 'fun', 'hope', 'google', 'releases', 'old', 'all', 'happy', 'laptop', 'highly', 'recommend', 'great', 'features', 'usability', 'amazing', 'price', 'great', 'christmas', 'gift', 'idea']
output: ['samsung', 'happy', 'new', 'version', 'resist', 'married', 'geek', 'many', 'gadget', 'house', 'laptop', 'thin', 'light', 'big', 'keyboard', 'full', 'size', 'keyboard', 'well', 'trackpad', 'huge', 'laptop', 'size', 'improvement', 'previous', 'version', 'trackpad', 'surface', 'smooth', 'glass', 'overall', 'hardware', 'great', 'good', 'price', 'tag', 'surprise', 'software', 'old', 'late', 'feature', 'thank', 'automatic', 'update', 'surprised', 'speed', 'new', 'boot', 'load', 'great', 'use', 'new', 'on', 'improvement', 'well', 'app', 'doc', 'slide', 'spreadsheet', 'click', 'new', 'camera', 'app', 'lot', 'fun', 'hope', 'release', 'old', 'happy', 'laptop', 'great', 'feature', 'usability', 'amazing', 'price', 'great', 'christmas', 'gift', 'idea']
output: 2019-06-14 06:15:45,260 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
13. LDA = gensim.models.ldamodel.LdaModel
corpus=doc_term_matrix
# Creating the object for LDA model using gensim library
LDA = gensim.models.ldamodel.LdaModel
# Build LDA model
lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=7, random_state=100,
lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=10, random_state=100,
output:
16. model = DtmModel(
output:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\dtmmodel.py in __init__(self, dtm_path, corpus, time_slices, mode, model, num_topics, id2word, prefix, lda_sequence_min_iter, lda_sequence_max_iter, lda_max_em_iter, alpha, top_chain_var, rng_seed, initialize_lda)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\dtmmodel.py in train(self, corpus, time_slices, mode, model)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(path, mode, destpath, encoding, newline)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(self, path, mode, encoding, newline)
OSError: C:\Users\qsu2\AppData\Local\Temp\f909e7_train_out/em_log.dat not found.
If I change 16 as:
output:
But I need help on how to see topic change over time (e.g., by day, month, and year).
Versions
Out import platform; print(platform.platform())
Please edit your comment and fix the markdown formatting. It's a bit difficult to see what you're trying to do because the formatting is so messed up.
Closing due to inactivity.
I am trying to use DtmModel from gensim.models.wrappers to test how topics change over time.
My test file is an Amazon review file in CSV format, which includes reviews, ratings, dates, and titles.
I am trying to replicate the page https://radimrehurek.com/gensim/models/wrappers/dtmmodel.html to find how topics change over time, but I fail to create time_slices. Can anyone help me? Thank you very much.
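For reference, time_slices is a list giving the number of documents in each consecutive time period, with the corpus ordered chronologically so that the counts line up with the documents. A minimal sketch of deriving it from a date column follows, using only the standard library. The sample dates, the `%Y-%m-%d` format, and the helper name `build_time_slices` are assumptions for illustration, not part of gensim; the real CSV may need a different date format.

```python
from collections import Counter
from datetime import datetime


def build_time_slices(dates, granularity="month", fmt="%Y-%m-%d"):
    """Return (order, time_slices) for a list of date strings.

    order       -- document indices sorted chronologically
    time_slices -- number of documents in each consecutive period
                   (periods that contain no documents are skipped)
    """
    parsed = [datetime.strptime(d, fmt) for d in dates]
    # Map each granularity to a sortable key identifying the period.
    keys = {
        "day": lambda d: (d.year, d.month, d.day),
        "month": lambda d: (d.year, d.month),
        "year": lambda d: (d.year,),
    }
    key = keys[granularity]
    order = sorted(range(len(parsed)), key=lambda i: parsed[i])
    counts = Counter(key(parsed[i]) for i in order)
    time_slices = [counts[k] for k in sorted(counts)]
    return order, time_slices


# Made-up dates standing in for the CSV's date column.
dates = ["2019-02-03", "2019-01-15", "2019-01-20", "2020-05-01"]
order, time_slices = build_time_slices(dates, granularity="month")

# The slice sizes must add up to the number of documents.
assert sum(time_slices) == len(dates)
```

The corpus would then be reordered with `order` (for example `[corpus[i] for i in order]`) before being passed together with `time_slices`, so that the first `time_slices[0]` documents belong to the earliest period. Switching `granularity` to `"day"` or `"year"` changes only how the periods are grouped.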