Fix docstrings for gensim.models.rpmodel
#1802
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,41 +5,54 @@ | |
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html | ||
|
||
|
||
import logging | ||
""" | ||
Objects of this class allow building and maintaining a model for Random Projections | ||
(also known as Random Indexing). | ||
|
||
import numpy as np | ||
For theoretical background on RP, see: Kanerva et al.: "Random indexing of text samples for Latent Semantic Analysis." | ||
|
||
from gensim import interfaces, matutils, utils | ||
The main methods are: | ||
Review comment: Better to add an example here (sometimes code tells us more than text).
||
|
||
1. constructor, which creates the random projection matrix | ||
2. the [] method, which transforms a simple count representation into the TfIdf | ||
space. | ||
|
||
logger = logging.getLogger('gensim.models.rpmodel') | ||
Model persistency is achieved via its load/save methods. | ||
|
||
|
||
class RpModel(interfaces.TransformationABC): | ||
""" | ||
Objects of this class allow building and maintaining a model for Random Projections | ||
(also known as Random Indexing). For theoretical background on RP, see: | ||
Examples: | ||
Review comment: Redundant.
||
--------- | ||
>>> from gensim.models import RpModel | ||
>>> rp = RpModel(corpus) | ||
Review comment: This example isn't executable (i.e. corpus isn't defined); "executable" means that I can copy-paste it into a console and it runs successfully.
||
>>> print(rp[some_doc]) | ||
>>> rp.save('/tmp/foo.rp_model') | ||
""" | ||
|
||
Kanerva et al.: "Random indexing of text samples for Latent Semantic Analysis." | ||
import logging | ||
|
||
The main methods are: | ||
import numpy as np | ||
|
||
1. constructor, which creates the random projection matrix | ||
2. the [] method, which transforms a simple count representation into the TfIdf | ||
space. | ||
from gensim import interfaces, matutils, utils | ||
|
||
>>> rp = RpModel(corpus) | ||
>>> print(rp[some_doc]) | ||
>>> rp.save('/tmp/foo.rp_model') | ||
|
||
Model persistency is achieved via its load/save methods. | ||
""" | ||
logger = logging.getLogger('gensim.models.rpmodel') | ||
|
||
|
||
class RpModel(interfaces.TransformationABC): | ||
|
||
def __init__(self, corpus, id2word=None, num_topics=300): | ||
""" | ||
`id2word` is a mapping from word ids (integers) to words (strings). It is | ||
used to determine the vocabulary size, as well as for debugging and topic | ||
printing. If not set, it will be determined from the corpus. | ||
|
||
|
||
Parameters | ||
---------- | ||
corpus : interfaces.CorpusABC | ||
Review comment: please use
when you reference a class from anywhere (gensim or something else)
Review comment: Add |
||
id2word : dict of int to string | ||
|
||
num_topics : int | ||
|
||
""" | ||
self.id2word = id2word | ||
self.num_topics = num_topics | ||
|
@@ -52,6 +65,12 @@ def __str__(self): | |
def initialize(self, corpus): | ||
""" | ||
Initialize the random projection matrix. | ||
|
||
|
||
Parameters | ||
---------- | ||
corpus : :class:`~interfaces.CorpusABC` | ||
|
||
""" | ||
if self.id2word is None: | ||
logger.info("no word id mapping provided; initializing from corpus, assuming identity") | ||
|
@@ -75,6 +94,16 @@ def initialize(self, corpus): | |
def __getitem__(self, bow): | ||
""" | ||
Return RP representation of the input vector and/or corpus. | ||
|
||
Parameters | ||
---------- | ||
bow : :class:`~interfaces.CorpusABC` (iterable of documents) or list of (int, int). | ||
|
||
Examples: | ||
Review comment: Redundant.
||
------------- | ||
>>> rp = RpModel(corpus) | ||
|
||
>>> print(rp[some_doc]) | ||
|
||
""" | ||
# if the input vector is in fact a corpus, return a transformed corpus as result | ||
is_corpus, bow = utils.is_corpus(bow) | ||
|
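Note on the `__getitem__` docstring above: a minimal pure-Python sketch of what a random projection of a bag-of-words vector looks like. This is not gensim's implementation. The helper names `make_projection` and `project`, the choice of entries from {-1, +1}, and the 1/sqrt(num_topics) scaling are assumptions made for illustration only.

```python
import math
import random


def make_projection(num_topics, num_terms, seed=0):
    """Build a num_topics x num_terms matrix with entries drawn from {-1, +1}.

    Hypothetical helper: the entry distribution is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(num_terms)]
            for _ in range(num_topics)]


def project(projection, bow):
    """Project a sparse vector [(term_id, count), ...] into the reduced space."""
    num_topics = len(projection)
    scale = 1.0 / math.sqrt(num_topics)  # assumed normalisation
    result = []
    for topic_id, row in enumerate(projection):
        # Only the terms present in the document contribute to the dot product.
        value = scale * sum(row[term_id] * count for term_id, count in bow)
        if value != 0.0:  # keep the output sparse, like the input
            result.append((topic_id, value))
    return result


projection = make_projection(num_topics=4, num_terms=6)
doc = [(0, 2), (3, 1), (5, 1)]  # (term_id, count) pairs
print(project(projection, doc))
```

The input and output share the same sparse `(id, value)` format, which is why the docstring can describe `bow` as either a single document or an iterable of documents.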
@@ -96,5 +125,12 @@ def __getitem__(self, bow): | |
] | ||
|
||
def __setstate__(self, state): | ||
""" | ||
Set the internal state and set freshly_loaded to True. Called when unpickled. | ||
|
||
Parameters | ||
---------- | ||
state : state of the class | ||
|
||
""" | ||
self.__dict__ = state | ||
self.freshly_loaded = True |
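Note on `__setstate__`: a small standalone sketch of the pickling hook documented above, using only the standard library. The class name `Model` is hypothetical and stands in for any class that follows the same pattern; it is not the gensim class.

```python
import pickle


class Model(object):
    """Hypothetical stand-in for a model that flags freshly loaded instances."""

    def __init__(self, num_topics):
        self.num_topics = num_topics
        self.freshly_loaded = False

    def __setstate__(self, state):
        # pickle calls this with the saved attribute dict when restoring the object.
        self.__dict__ = state
        self.freshly_loaded = True


original = Model(num_topics=300)
restored = pickle.loads(pickle.dumps(original))
print(original.freshly_loaded, restored.freshly_loaded)  # prints: False True
```

Given this, the `state` parameter could be documented as the dict of instance attributes that pickle passes back on load.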
Review comment: Need to add a concrete link to the paper, something like this: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/parsing/porter.py#L6