-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document LDA-related models #2026
Conversation
@@ -18,63 +18,84 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing module docstring with an example of callback usage
gensim/models/callbacks.py
Outdated
|
||
Parameters | ||
---------- | ||
**parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type missing (for all kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do this on purpose because the type can be anything in this function.
viz_env : object, optional | ||
Visdom environment to use for plotting the graph. Unused. | ||
title : str, optional | ||
Title of the graph plot in case `logger == 'visdom'`. Unused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing empty line at the end of docstring (here and everywhere)
gensim/models/callbacks.py
Outdated
""" | ||
Args: | ||
model : Trained topic model | ||
"""""Get the coherence score. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""""
- too much, should be 3
gensim/models/callbacks.py
Outdated
@@ -354,7 +477,9 @@ class CallbackAny2Vec(object): | |||
... model.save(output_path) | |||
... self.epoch += 1 | |||
... | |||
>>> | |||
|
|||
#. Create a callback to print logging information to the console: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better move this part to module docstring.
gensim/models/ldamodel.py
Outdated
@@ -82,53 +98,70 @@ def update_dir_prior(prior, N, logphat, rho): | |||
|
|||
|
|||
class LdaState(utils.SaveLoad): | |||
""" | |||
Encapsulate information for distributed computation of LdaModel objects. | |||
"""Encapsulate information for distributed computation of LdaModel objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed :class:...LdaModel
gensim/models/ldamodel.py
Outdated
|
||
Objects of this class are sent over the network, so try to keep them lean to | ||
reduce traffic. | ||
|
||
""" | ||
|
||
def __init__(self, eta, shape, dtype=np.float32): | ||
"""Initialize the state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to write "Initialize the state." always, this is obvious (here and everywhere)
gensim/models/ldamodel.py
Outdated
|
||
>>> doc_lda = lda[doc_bow] | ||
|
||
The model can be updated (trained) with new documents via | ||
The model can be updated (trained) with new documents. | ||
|
||
>>> lda.update(other_corpus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other_corpus
not defined
gensim/models/ldamodel.py
Outdated
---------- | ||
prior : {str, list of float, np.ndarray of float, float} | ||
A-priori belief on word probability. If `name` == 'eta' then the prior can be: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check please that this rendered exactly as you expect
…error when binded to the model so I removed it
gensim/models/callbacks.py
Outdated
|
||
Examples | ||
-------- | ||
To implement a Callback, inherit from this base class and override one or more of its methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add example with LDA + perplexity
gensim/models/callbacks.py
Outdated
""" | ||
for parameter, value in parameters.items(): | ||
setattr(self, parameter, value) | ||
|
||
def get_value(self): | ||
"""TODO: According to my understanding this method MUST be override. I propose to make ti abstract, or at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warnings
--------
you must override this method
gensim/models/callbacks.py
Outdated
""" | ||
Metric class for coherence evaluation | ||
""" | ||
"""Metric class for coherence evaluation.s """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also -> CoherenceModel
gensim/models/ldamodel.py
Outdated
@@ -50,6 +50,7 @@ | |||
|
|||
logger = logging.getLogger('gensim.models.ldamodel') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update module docstring (+ example, update current description)
gensim/models/ldamodel.py
Outdated
The number of documents is strected in both state objects, so that they are of comparable magnitude. | ||
This procedure corresponds to the stochastic gradient update from | ||
`Hoffman et al. :"Online Learning for Latent Dirichlet Allocation" | ||
<https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
algorithm number (eq?)
gensim/models/ldamodel.py
Outdated
Examples | ||
-------- | ||
|
||
#. Train an LDA model using a gensim corpus: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update example of class docstring, remove it from here
@@ -62,21 +62,37 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update module docstring (here and everywhere)
------- | ||
list of float | ||
Probabilities for each topic in the mixture. This is essentially a point in the `num_topics - 1` simplex. | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update other classes please
gensim/models/ldaseqmodel.py
Outdated
|
||
#. Persist model to disk: | ||
|
||
>>> ldaseq.save("ldaseq") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add also "how to get an vectors" to example, this is important always (for all examples)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean how to get the vector representation of a single document right? I will add an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to resolve PEP8 issues (see TravisCI) and documentation build issues (see CircleCI)
gensim/models/ldamodel.py
Outdated
... ['survey', 'response', 'eps'], | ||
... ['human', 'system', 'computer'] | ||
... ] | ||
>>> other_dictionary = Dictionary(other_texts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a serious mistake, you should use only one dictionary for vectorization: here you should re-use dictionary that was used to create common_corpus
or (better) create it yourself from start, convert common_texts
, etc.
Same mistake in other files.
In this PR I will add documentation for the LDA model as well as all other related modules such as:
ldamodel.py
ldaseqmodel.py
ldamulticore.py
callbacks.py