Commit

Fix docstrings for lsi-related code (#1892)
* Added numpy style docstrings to all functions, methods and classes in the LsiModel module. Types need to be checked before merging

* Fixed generic container type (stream/list -> iterable) and added expected shape for sparse matrix arguments

* Fix PEP-8

* Applied corrections mentioned in code review.

* General class level remarks moved to class docstring from `__init__`

* References to `gensim` classes are now using `sphinx` notation.

* Numpy parameters annotated with `np.type` instead of `type`.

* Added docstrings for `lsi_worker` and `lsi_dispatcher`

* added argument parsing and fixed __doc__

* update configs with new extension sphinxcontrib.programoutput

* added blank link in __doc__

* sphinx indentation fix

* chmod revert

* fix lsimodel[1]

* fix lsimodel[2]

* fix lsimodel[3]

* fix lsimodel[4]

* fix basemodel

* fixes

* fix lsi_worker & missing fields in .rst

* last fixes for worker & dispatcher

* add missing link
steremma authored and menshikh-iv committed Feb 16, 2018
1 parent 0659c10 commit 0db8796
Showing 6 changed files with 908 additions and 484 deletions.
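
The docstring conventions named in the commit message (numpy-style sections, `iterable` instead of stream/list, Sphinx notation for references to gensim classes) can be sketched on a small function. `term_weights` is a hypothetical helper written for illustration and is not part of gensim; only the docstring layout is the point here.

```python
from collections import defaultdict


def term_weights(corpus):
    """Sum the weight of every term over a corpus (hypothetical helper, not gensim code).

    The input uses the same bag-of-words format that
    :class:`~gensim.models.lsimodel.LsiModel` accepts.

    Parameters
    ----------
    corpus : iterable of list of (int, float)
        Documents in bag-of-words format. Any iterable works, including a generator.

    Returns
    -------
    dict of (int, float)
        Mapping from term id to its total weight over all documents.

    """
    totals = defaultdict(float)
    for document in corpus:
        for term_id, weight in document:
            totals[term_id] += weight
    return dict(totals)


# A generator is enough: the function iterates once and never indexes the corpus.
stream = (doc for doc in [[(0, 1.0), (2, 0.5)], [(2, 1.5)]])
result = term_weights(stream)
```

Declaring the argument as an iterable documents that a one-pass stream is acceptable, which is the distinction the "stream/list -> iterable" item in the commit message is drawing.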
3 changes: 2 additions & 1 deletion docs/src/models/lsi_dispatcher.rst
@@ -5,4 +5,5 @@
     :synopsis: Dispatcher for distributed LSI
     :members:
     :inherited-members:
-
+    :undoc-members:
+    :show-inheritance:
3 changes: 2 additions & 1 deletion docs/src/models/lsi_worker.rst
@@ -5,4 +5,5 @@
     :synopsis: Worker for distributed LSI
     :members:
     :inherited-members:
-
+    :undoc-members:
+    :show-inheritance:
49 changes: 38 additions & 11 deletions gensim/models/basemodel.py
@@ -1,24 +1,51 @@
 class BaseTopicModel(object):
     def print_topic(self, topicno, topn=10):
-        """
-        Return a single topic as a formatted string. See `show_topic()` for parameters.
+        """Get a single topic as a formatted string.
+
+        Parameters
+        ----------
+        topicno : int
+            Topic id.
+        topn : int
+            Number of words from topic that will be used.

-        >>> lsimodel.print_topic(10, topn=5)
-        '-0.340 * "category" + 0.298 * "$M$" + 0.183 * "algebra" + -0.174 * "functor" + -0.168 * "operator"'
+        Returns
+        -------
+        str
+            String representation of topic, like '-0.340 * "category" + 0.298 * "$M$" + 0.183 * "algebra" + ... '.

         """
         return ' + '.join(['%.3f*"%s"' % (v, k) for k, v in self.show_topic(topicno, topn)])

     def print_topics(self, num_topics=20, num_words=10):
-        """Alias for `show_topics()` that prints the `num_words` most
-        probable words for `topics` number of topics to log.
-        Set `topics=-1` to print all topics."""
+        """Get the most significant topics (alias for `show_topics()` method).
+
+        Parameters
+        ----------
+        num_topics : int, optional
+            The number of topics to be selected, if -1 - all topics will be in result (ordered by significance).
+        num_words : int, optional
+            The number of words to be included per topics (ordered by significance).
+
+        Returns
+        -------
+        list of (int, list of (str, float))
+            Sequence with (topic_id, [(word, value), ... ]).
+
+        """
         return self.show_topics(num_topics=num_topics, num_words=num_words, log=True)

     def get_topics(self):
-        """
-        Returns:
-            np.ndarray: `num_topics` x `vocabulary_size` array of floats which represents
-            the term topic matrix learned during inference.
+        """Get words X topics matrix.
+
+        Returns
+        --------
+        numpy.ndarray:
+            The term topic matrix learned during inference, shape (`num_topics`, `vocabulary_size`).
+
+        Raises
+        ------
+        NotImplementedError
+
         """
         raise NotImplementedError
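
To see how the three `BaseTopicModel` methods in this diff behave, here is a minimal self-contained sketch. The base class below is a stand-in that copies only the method bodies from the diff so the example runs without gensim. `ToyTopicModel`, its hard-coded topics, and its `show_topic` / `show_topics` implementations are assumptions made for illustration and are not gensim code.

```python
class BaseTopicModel(object):
    """Stand-in with the method bodies copied from the diff (docstrings omitted)."""

    def print_topic(self, topicno, topn=10):
        return ' + '.join(['%.3f*"%s"' % (v, k) for k, v in self.show_topic(topicno, topn)])

    def print_topics(self, num_topics=20, num_words=10):
        return self.show_topics(num_topics=num_topics, num_words=num_words, log=True)

    def get_topics(self):
        raise NotImplementedError


class ToyTopicModel(BaseTopicModel):
    """Hypothetical subclass with hard-coded topics, for illustration only."""

    def __init__(self):
        # Each topic is a list of (word, value) pairs, already ordered by significance.
        self.topics = [
            [("category", -0.340), ("algebra", 0.183), ("functor", -0.174)],
            [("graph", 0.5), ("tree", 0.25)],
        ]

    def show_topic(self, topicno, topn=10):
        return self.topics[topicno][:topn]

    def show_topics(self, num_topics=20, num_words=10, log=False):
        # -1 selects every topic, following the docstring in the diff.
        # The `log` flag is accepted but ignored in this toy version.
        if num_topics < 0 or num_topics > len(self.topics):
            num_topics = len(self.topics)
        return [(i, self.show_topic(i, num_words)) for i in range(num_topics)]


model = ToyTopicModel()
formatted = model.print_topic(0, topn=2)
all_topics = model.print_topics(num_topics=-1, num_words=1)
```

With the format string shown in the diff, `formatted` is `-0.340*"category" + 0.183*"algebra"`, without the spaces around `*` that appear in the docstring example. Calling `get_topics()` on the toy model raises `NotImplementedError`, because the subclass does not override it.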