[WIP] LDA tutorial, tips and tricks #779
Conversation
Could you link to it in
Will do @tmylk. Does that mean you think it's ok as it is? In that case I'll just clean it up one of these days to prepare it for merging.
@olavurmortensen When do you think this would be finished?
@tmylk Well, I thought you would have some comments. If you do not, then I think I can finish it tomorrow.
"\n",
"> **Note:**\n",
">\n",
"> This tutorial uses the scikit-learn and nltk libraries, although you can replace them with others if you want. Python 3 is used, although Python 2.7 can be used as well.\n",
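As a rough illustration of the kind of pre-processing the note refers to, here is a minimal, library-agnostic sketch using only the Python standard library; the nltk or scikit-learn tokenizers mentioned in the note would slot in at the same point. The stopword list here is a tiny made-up sample, not a real one:

```python
import re

# Illustrative stopword list only; use a real one (e.g. from nltk) in practice.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}

def preprocess(doc):
    """Lowercase, tokenize on alphabetic runs, drop stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

docs = ["The quick brown fox", "A survey of user interfaces"]
texts = [preprocess(d) for d in docs]
# texts == [['quick', 'brown', 'fox'], ['survey', 'user', 'interfaces']]
```

The resulting lists of tokens are the shape of input that gensim's `Dictionary.doc2bow` expects.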
Where is sklearn used? I cannot find it.
Briefly looked through it this morning; all I can add right now would be an explanation as to why 5 models trained with exactly the same input would have different output, e.g., perplexity, and perhaps an aside on how to achieve 1:1 training with the random state parameter. It seems like a common question.
@cscorley The 5 models are different because of random initialization (specifically, random initialization of some hyperparameters, e.g. gamma). But you bring up an important point that this should be explained in the tutorial, and maybe even set the random state just to make it more explicit.
I have updated the tutorial according to the comments, thanks @piskvorky and @cscorley. Also added a link in
I also changed the name because, bizarrely, someone posted a tutorial with the same name I was using just a week ago.
@tmylk and @piskvorky Will the tutorial appear on the RaRe blog?
@tmylk @piskvorky Ready for merging. The conflict is because I changed the
When it's merged I'll submit a blog post on WordPress for review as well.
The title should be changed to 'Pre-processing and training LDA'. The value of this tutorial is in explaining the pre-processing steps and the meaning of the LDA parameters. The model selection is not really covered here as deeply as in Topic Coherence or the 'America's Next Topic Model' blog post.
"source": [
"# LDA: training tips\n",
"\n",
"LDA is a probabilistic hierarchical Bayesian model that is a mixture model as well as a mixed membership model... but we won't be getting into any of that.\n",
The first sentence should say what this tutorial is. Here it is done in the Nth sentence - please move it to the very first line. It is ok to discuss what this tutorial is not, but it should be later
True. The first sentence is also a tad snarky. I just removed the first sentence. Does anything else need to be changed in that regard?
"In this tutorial I will show how to pre-process text and train LDA on it"
Better now?
"cell_type": "markdown",
"metadata": {},
"source": [
"We select the model with the lowest perplexity."
Why do you do that? You talk about topic coherence later, so it is confusing.
Yes, I agree this is not a good way of selecting a model at all.
I removed the sections about model selection.
Thanks
"cell_type": "markdown",
"metadata": {},
"source": [
"[pyLDAvis](https://pyldavis.readthedocs.io/en/latest/index.html) can be fun and useful. Include the code below in your notebook to visualize your topics with pyLDAvis.\n",
Please add actual pyLDAvis output to the notebook.
Rendering pyLDAvis output in the notebook completely messes up the scale of the notebook, so I'd rather not include it.
Either include the actual picture, or remove the code and link to a pyLDAvis tutorial. Code alone serves no purpose.
I removed the text about it. Come to think of it, since it's mentioned in other RaRe blogs, there isn't much need for it in this one.
…ntence, removed model selection based on perplexity.
… stuff about pyLDAvis.
@tmylk @piskvorky A tutorial on LDA sharing some of my experience, as requested.
@tmylk I'm sure you have some comments on it. Thought it would be easiest with a PR. It's still a work in progress, as reflected by the "TODO" list at the start of the tutorial.
Not exactly sure what you would like the tutorial to be, but I tried to explain what the goal of it was in the introduction.