Fix docstrings for gensim.models.hdpmodel, gensim.models.lda_worker & gensim.models.lda_dispatcher (#1667) #1912
Good start! Please address my comments and make similar changes for lda_dispatcher.py too.
gensim/models/lda_worker.py
Outdated
on every node in your cluster. If you wish, you may even run it multiple times \
on a single machine, to make better use of multiple cores (just beware that \
memory footprint increases accordingly).
"""Worker ("slave") process used in computing distributed LDA.
First of all, please fix the PEP8 problems (mostly leading spaces); see the Travis log: https://travis-ci.org/RaRe-Technologies/gensim/jobs/342495787#L511
Ok. Also, should I add a section for module-level attributes such as HUGE_TIMEOUT, MAX_JOBS_QUEUE, etc. in lda_dispatcher.py?
gensim/models/lda_worker.py
Outdated
Run this script on every node in your cluster. If you wish, you may even
run it multiple times on a single machine, to make better use of multiple
cores (just beware that memory footprint increases accordingly).
Please look at #1892; it is a really good example of how to document distributed stuff (instructions for running, showing the script's arguments automatically, etc.).
gensim/models/lda_worker.py
Outdated
Attributes
----------
model : :obj: of :class:`~gensim.models.ldamodel.LdaModel`
no need to write :obj: (here and everywhere)
@gyanesh-m documentation build failed, please have a look https://circleci.com/gh/RaRe-Technologies/gensim/399?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link, you also can build documentation locally with |
@menshikh-iv Is there a need to mention module-level attributes such as HUGE_TIMEOUT, MAX_JOBS_QUEUE, etc. in the docstrings of lda_dispatcher.py and lda_worker.py? I didn't do it because I couldn't find them in the already documented files.
@menshikh-iv Also, if hdpmodel.py is not taken, I would like to add documentation for it.
Looks good
gensim/models/lda_worker.py
Outdated
default=True, const=False
)
parser.add_argument("--hmac", help="Nameserver hmac key (default: %(default)s)", default=None)
"--no-broadcast", help="Disable broadcast \
Why the reformatting? We use a 120-character limit.
Oh, I ran flake8 and it was giving errors for lines above 79 chars. Anyway, I will change it then.
you should use our flake config: tox -e flake8
gensim/models/lda_worker.py
Outdated
@@ -141,7 +260,8 @@ def main():
"port": args.port,
"hmac_key": args.hmac
}
utils.pyro_daemon(LDA_WORKER_PREFIX, Worker(), random_suffix=True, ns_conf=ns_conf)
utils.pyro_daemon(LDA_WORKER_PREFIX, Worker(),
no vertical indents (only hanging), here and everywhere.
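To illustrate the difference between the two layouts, here is a minimal sketch. `pyro_daemon_stub` is a hypothetical stand-in for `utils.pyro_daemon` so the snippet runs without Pyro, and the values of `LDA_WORKER_PREFIX` and `ns_conf` are assumed for illustration; only the call layout matters.

```python
def pyro_daemon_stub(name, obj, random_suffix=False, ns_conf=None):
    """Hypothetical stand-in for utils.pyro_daemon; it only echoes its arguments."""
    return name, obj, random_suffix, ns_conf


LDA_WORKER_PREFIX = "gensim.lda_worker"  # assumed value, for illustration only
ns_conf = {"port": 9090}  # assumed value, for illustration only

# Vertical alignment: continuation lines line up with the opening bracket.
vertical = pyro_daemon_stub(LDA_WORKER_PREFIX, object,
                            random_suffix=True, ns_conf=ns_conf)

# Hanging indent: break right after the opening bracket, indent one level.
hanging = pyro_daemon_stub(
    LDA_WORKER_PREFIX, object, random_suffix=True, ns_conf=ns_conf
)
```

Both calls behave identically; the hanging form is the one requested here. Under a 120-character limit, a call as short as the original one can also simply stay on a single line.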
@gyanesh-m OK, please ping me when you have finished with HDP (and don't forget to address my comments).
@menshikh-iv Hi, I am done with hdpmodel.py. Please review it.
@menshikh-iv Hi, this is a reminder, please review hdpmodel.py soon.
@gyanesh-m don't worry, I remember, but you will have to wait, sorry.
@menshikh-iv Ok, np. So is it fine if I start solving another issue?
@gyanesh-m yeah, please help the guys with #1901; it is not really hard, but it is critical now.
@gyanesh-m I fixed all distributed stuff, please fix |
Hello @gyanesh-m, please look at my comments & changes and address the suggested comments for hdpmodel.py.
gensim/models/hdpmodel.py
Outdated
kappa : float, optional
Learning rate
tau : float, optional
Slow down parameter
What does this mean? Can you describe it in more detail? If something isn't clear, it is a bad description.
Is this fine?
kappa : float, optional
Learning parameter which acts as an exponential decay factor to influence the extent of learning from each batch.
tau : float, optional
Learning parameter which down-weights early iterations of documents.
@gyanesh-m sounds better than current description 👍
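A minimal sketch of how the two parameters interact, assuming the usual online variational Bayes step size rho_t = (tau + t)^(-kappa). The function name `step_size` and the default values are illustrative assumptions, and the exact expression used in hdpmodel.py may include additional scaling.

```python
def step_size(t, kappa=1.0, tau=64.0):
    """Weight given to the batch processed at update number `t` (t >= 0).

    Assumed schedule: rho_t = (tau + t) ** (-kappa).
    """
    return (tau + t) ** (-kappa)


# A larger tau shrinks the weight of the earliest updates (down-weighting them);
# a larger kappa makes the weights decay faster as t grows.
early = step_size(0, kappa=1.0, tau=64.0)
late = step_size(1000, kappa=1.0, tau=64.0)
faster_decay = step_size(1000, kappa=2.0, tau=64.0)
```

Under this assumption, `kappa` controls how quickly later batches lose influence, while `tau` controls how cautiously the first batches are trusted.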
gensim/models/hdpmodel.py
Outdated
Parameters
----------
bow : sequence of list of tuple of ints; [ (int,int) ]
iterable of list of (int, float), here and everywhere for a corpus in BoW format
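For readers unfamiliar with the notation, the sketch below shows what "iterable of list of (int, float)" looks like in practice. The helper `is_bow_document` is hypothetical (it is not part of gensim) and the sample corpus is made up for illustration.

```python
def is_bow_document(doc):
    """Return True if `doc` is a list of (token_id, weight) pairs."""
    return isinstance(doc, list) and all(
        isinstance(pair, tuple)
        and len(pair) == 2
        and isinstance(pair[0], int)
        # Integer counts are accepted as weights too, since they are valid numbers.
        and isinstance(pair[1], (int, float))
        for pair in doc
    )


# A corpus is any iterable of such documents; an empty document is allowed.
corpus = [
    [(0, 1.0), (3, 2.0)],
    [(1, 1.0)],
    [],
]
```

Each inner pair maps a token id to its weight in that document, so the first document above contains token 0 once and token 3 twice.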
gensim/models/hdpmodel.py
Outdated
Returns
-------
topic distribution for the given document `bow`, as a list of `(topic_id, topic_probability)` 2-tuples.
missing type, should be list of (int, float)
gensim/models/hdpmodel.py
Outdated
Returns
-------
numpy.ndarray
Gamma value.
What is Gamma in this case?
This is the first level concentration. It is mentioned under the parameters section. Do I need to mention it here again?
@gyanesh-m I think yes
gensim/models/hdpmodel.py
Outdated
single document.
outputdir : str, optional
Stores topic and options information in the specified directory.
random_state : :class:`~np.random.RandomState`, optional
are you sure about type?
Actually, the parameter's type is {None, int, array_like}, but the attribute's type is the one I mentioned. I got it from here. Should I go with the parameter's type?
you can mention all of them (the 3 you mentioned + the current one)
gensim/models/hdpmodel.py
Outdated
topn : int, optional
Number of most probable words to show from given `topic_id`.
log : bool, optional
Logs a message with level INFO on the logger object.
If True ...
gensim/models/hdpmodel.py
Outdated
Returns:
np.ndarray: `num_topics` x `vocabulary_size` array of floats which represents
the term topic matrix learned during inference.
"""Returns the term topic matrix learned during inference.
Better to use Get instead of Return in the first line
gensim/models/hdpmodel.py
Outdated
"""legacy method; use `self.save()` instead"""
"""Saves all the topics discovered.

.. note:: This is a legacy method; use `self.save()` instead.
In numpy-style, this should look like
Notes
-----
.....
here and everywhere
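As a concrete illustration of the numpy-style layout, here is a small sketch with `Parameters`, a typed `Returns` entry and a `Notes` section. The function `get_topic_distribution` is a made-up toy, not the HDP inference routine.

```python
def get_topic_distribution(bow, num_topics=2):
    """Get a uniform topic distribution for a document.

    Parameters
    ----------
    bow : list of (int, float)
        Document in BoW format.
    num_topics : int, optional
        Number of topics to spread the probability mass over.

    Returns
    -------
    list of (int, float)
        Topic distribution as `(topic_id, topic_probability)` 2-tuples.

    Notes
    -----
    This is a toy placeholder: it ignores `bow` and returns equal probabilities.

    """
    return [(topic_id, 1.0 / num_topics) for topic_id in range(num_topics)]
```

The section title is underlined with dashes of the same length, and the return type gets its own line above the description.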
gensim/models/hdpmodel.py
Outdated
@@ -571,9 +850,34 @@ def evaluate_test_corpus(self, corpus):

class HdpTopicFormatter(object):
"""Helper class to format the output of topics and most probable words for display."""
Helper for which class? (missing reference)
gensim/models/hdpmodel.py
Outdated
return self.show_topics(num_topics, num_words, True)

def show_topics(self, num_topics=10, num_words=10, log=False, formatted=True):
"""Gives the most probable `num_words` words from `num_topics` topics.
Give, Print instead of Gives, Prints in the first line of the docstring (here and everywhere).
@gyanesh-m when do you plan to finish this? I can already merge the distributed stuff; I also see that you still have a lot of work to do on the HDP model. We have two variants
What do you think?
@menshikh-iv Thanks for the minor fixes. I think I will be able to fix hdpmodel.py completely in around 3 hours. I will get started on it right away.
@gyanesh-m 3 hours, including the general description of how the model works? Wow, sounds fantastic, good luck!
Hey @gyanesh-m, how is it going?
@menshikh-iv Hi, currently I am on page 3. I was having some trouble understanding it, so I thought of going through the basics first. So far, I have gone through the following tutorials
@gyanesh-m nice work 🥇 I need to clean up & merge this, thanks for your work!
@menshikh-iv You're welcome! Happy to help. Also, thank you for your support and guidance.
This PR fixes the docstrings for lda_worker.py in accordance with numpy-style. Some files still need to be fixed; that will be done in later PRs.
(Fixes #1667)