Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docstrings forgensim.models.hdpmodel, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) #1912

Merged
merged 17 commits into from
Apr 2, 2018

Conversation

gyanesh-m
Copy link
Contributor

This PR fixes the docstrings for lda_worker.py in accordance with numpy-style. There are still some files which need to be fixed and that will be done in later PRs.

(Fixes #1667 )

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! Please fix my comments + made similar changes for lda_dispatcher.py too

on every node in your cluster. If you wish, you may even run it multiple times \
on a single machine, to make better use of multiple cores (just beware that \
memory footprint increases accordingly).
"""Worker ("slave") process used in computing distributed LDA.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, please fix PEP8 problems (almost lead spaces), look at travis log https://travis-ci.org/RaRe-Technologies/gensim/jobs/342495787#L511

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Also, should I add a section for module level attributes such as HUGE_TIMEOUT ,MAX_JOBS_QUEUE,etc in lda_dispatcher.py ?


Run this script on every node in your cluster. If you wish, you may even
run it multiple times on a single machine, to make better use of multiple
cores (just beware that memory footprint increases accordingly).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at #1892, this is really good way how to document distributed stuff (instruction of running, showing arguments of script in automatic way, etc)


Attributes
----------
model : :obj: of :class:`~gensim.models.ldamodel.LdaModel`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to write :obj: (here and everywhere)

@menshikh-iv
Copy link
Contributor

@gyanesh-m documentation build failed, please have a look https://circleci.com/gh/RaRe-Technologies/gensim/399?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link, you also can build documentation locally with tox -e docs for reproducing the error.

@gyanesh-m
Copy link
Contributor Author

@menshikh-iv Is there a need to mention module level attributes such as HUGE_TIMEOUT ,MAX_JOBS_QUEUE,etc in docstrings of lda_dispatcher.py and lda_worker ? I didn't do it as I couldn't find it in already documented files.

@gyanesh-m gyanesh-m changed the title Fix docstrings for gensim.models.lda_worker (#1667) Fix docstrings for gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Feb 20, 2018
@gyanesh-m
Copy link
Contributor Author

gyanesh-m commented Feb 21, 2018

@menshikh-iv Also if hdpmodel.py is not taken, I would like to add documentation for it .

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

default=True, const=False
)
parser.add_argument("--hmac", help="Nameserver hmac key (default: %(default)s)", default=None)
"--no-broadcast", help="Disable broadcast \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why reformatting? we using 120 characters limit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I ran flake8 and it was giving error for lines above 79 chars. Anyways ,I will change it then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should use our flake config: tox -e flake8

@@ -141,7 +260,8 @@ def main():
"port": args.port,
"hmac_key": args.hmac
}
utils.pyro_daemon(LDA_WORKER_PREFIX, Worker(), random_suffix=True, ns_conf=ns_conf)
utils.pyro_daemon(LDA_WORKER_PREFIX, Worker(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no vertical indents (only hanging), here and everywhere.

@menshikh-iv
Copy link
Contributor

@gyanesh-m OK, please ping me when you finished with HDP (and don't forget to fix my comments).
After - I'll cleanup PR and merge (and we'll continue the process in distinct PR).

@gyanesh-m gyanesh-m changed the title Fix docstrings for gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Fix docstrings forgensim.models.hdpworker, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Feb 23, 2018
@gyanesh-m
Copy link
Contributor Author

@menshikh-iv Hi, I am done with hdpmodel.py. Please review it.

@gyanesh-m gyanesh-m changed the title Fix docstrings forgensim.models.hdpworker, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Fix docstrings forgensim.models.hdpmodel, gensim.models.lda_worker & gensim.models.lda_dispatcher(#1667) Feb 25, 2018
@gyanesh-m
Copy link
Contributor Author

@menshikh-iv Hi, this is a reminder, please review the hdpmodel.py soon.

@menshikh-iv
Copy link
Contributor

@gyanesh-m don't worry, I remember, but you will have to wait, sorry.

@gyanesh-m
Copy link
Contributor Author

@menshikh-iv Ok, np. So is it fine if I start solving another issue ?

@menshikh-iv
Copy link
Contributor

@gyanesh-m yeah, helps guys with #1901, this is not really hard, but critical now,

@menshikh-iv menshikh-iv added the RFM label Mar 5, 2018
@menshikh-iv
Copy link
Contributor

@gyanesh-m I fixed all distributed stuff, please fix hdpmodel.py too

@menshikh-iv menshikh-iv removed the RFM label Mar 12, 2018
Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @gyanesh-m, please look at my comments & changes and fix suggested comment for hdpmodel.py.

kappa : float, optional
Learning rate
tau : float, optional
Slow down parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean, can you describe it in more details? If something isn't clear - this is a bad description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this fine -

kappa: float,optional
 Learning parameter which acts as exponential decay factor to influence extent of learning from each batch.
tau: float, optional
  Learning parameter which down-weights early iterations of documents.```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gyanesh-m sounds better than current description 👍


Parameters
----------
bow : sequence of list of tuple of ints; [ (int,int) ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterable of list of (int, float) here and everywhere for Corpus in BoW format


Returns
-------
topic distribution for the given document `bow`, as a list of `(topic_id, topic_probability)` 2-tuples.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing type, should be list of (int, float)

Returns
-------
numpy.ndarray
Gamma value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's is Gamma in this case?

Copy link
Contributor Author

@gyanesh-m gyanesh-m Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first level concentration. It is mentioned under the parameters section. Do I need to mention it here again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gyanesh-m I think yes

single document.
outputdir : str, optional
Stores topic and options information in the specified directory.
random_state : :class:`~np.random.RandomState`, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure about type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the parameter's type is {None, int, array_like} but the attribute type is the one I mentioned.I got it from here. Should I go with the parameter's type ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can mention all of this (mentioned 3 + current)

topn : int, optional
Number of most probable words to show from given `topic_id`.
log : bool, optional
Logs a message with level INFO on the logger object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If True ...

Returns:
np.ndarray: `num_topics` x `vocabulary_size` array of floats which represents
the term topic matrix learned during inference.
"""Returns the term topic matrix learned during inference.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use Get instead of Return in first line

"""legacy method; use `self.save()` instead"""
"""Saves all the topics discovered.

.. note:: This is a legacy method; use `self.save()` instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In numpy-style, this should look like

Notes
-----
.....

here and everywhere

@@ -571,9 +850,34 @@ def evaluate_test_corpus(self, corpus):


class HdpTopicFormatter(object):
"""Helper class to format the output of topics and most probable words for display."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper for what class (missed reference)

return self.show_topics(num_topics, num_words, True)

def show_topics(self, num_topics=10, num_words=10, log=False, formatted=True):
"""Gives the most probable `num_words` words from `num_topics` topics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give, Print instead of Gives, Prints in the first line of docstring (here and everywhere).

@menshikh-iv
Copy link
Contributor

@gyanesh-m when you plan to finish this? I can already merge distributed stuff, I also see that you need to make a lot of work with HDP model.

We have to variants

  1. Revert hdp change, merge distributed and you'll continue with HDP in new PR
  2. Fix HDP in current PR (if you make it fast).

What do you think?

@gyanesh-m
Copy link
Contributor Author

@menshikh-iv Thanks for the minor fixes. I think I will be able to fix the hdpmodel.py completely in around 3 hours. I will get started with it right away.

@menshikh-iv
Copy link
Contributor

@gyanesh-m 3 hours with the general description, how the model works? Wow, sounds fantastic, good luck!

@menshikh-iv
Copy link
Contributor

Hey @gyanesh-m, how is going?

@gyanesh-m
Copy link
Contributor Author

@menshikh-iv Hi, currently I am on page 3. I was having some trouble in understanding it so I thought of going through the basics first. Currently, I have gone through the following tutorials as of now

@menshikh-iv
Copy link
Contributor

@gyanesh-m nice work 🥇 I need to clean up & merge this, thanks for your work!

@gyanesh-m
Copy link
Contributor Author

@menshikh-iv You're welcome! Happy to help. Also, thank you for your support and guidance too .

@menshikh-iv menshikh-iv merged commit 1611f3a into piskvorky:develop Apr 2, 2018
@piskvorky piskvorky mentioned this pull request Jun 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor API reference gensim.sklearn_api
2 participants