Fix docstrings for `gensim.sklearn_api`. Fix #1667 #1895

steremma · 2018-02-10T19:30:00Z

In this PR I am working on the source documentation for the sklearn API.
More wrappers will be submitted in subsequent pushes to the same PR until they are all complete.

menshikh-iv · 2018-02-12T06:24:15Z

gensim/sklearn_api/lsimodel.py

@@ -5,11 +5,6 @@
 # Copyright (C) 2017 Radim Rehurek <[email protected]>
 # Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html

-"""
-Scikit learn interface for gensim for easy use of gensim with scikit-learn


Some `__doc__` is definitely needed here.

menshikh-iv · 2018-02-12T06:25:37Z

gensim/sklearn_api/lsimodel.py

-    Base LSI module
+    """Base LSI module.
+
+    Scikit learn interface for `gensim.models.lsimodel` for easy use of gensim with scikit-learn.


Please use links, e.g.

:class:`~gensim.models.lsimodel.LsiModel`

here and everywhere.

Also, explicitly mention something like "if you want to read more about it, please look at the original class :class:..."
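
As a rough illustration of the linking style requested above, a wrapper docstring might be laid out as in the sketch below. The class body is a hypothetical stand-in (it does not import gensim and takes only one parameter); only the docstring layout is the point.

```python
class LsiTransformer(object):
    """Base LSI module, wraps :class:`~gensim.models.lsimodel.LsiModel`.

    For more information, please have a look at the original class:
    :class:`~gensim.models.lsimodel.LsiModel`.

    """

    def __init__(self, num_topics=200):
        """

        Parameters
        ----------
        num_topics : int, optional
            Number of requested factors (latent dimensions).

        """
        # Hypothetical stand-in: the real wrapper accepts more parameters.
        self.num_topics = num_topics
```

Sphinx resolves the `:class:` role into a hyperlink when the HTML documentation is built; the leading `~` shortens the displayed text to the last component of the dotted path.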

menshikh-iv · 2018-02-12T06:45:22Z

gensim/sklearn_api/lsimodel.py

+
+        Parameters
+        ----------
+        num_topics : int, optional


Wdyt about linking to the original method only (to avoid duplication)?

I also thought about that and I am not sure which is better. On one hand we now have duplication, but on the other hand it's easier for the developer and user to see the documentation in one tab. Because not all parameters are propagated to the inner model, some of the parameters will be visible in the wrapper and some in the original model (you would need 2 tabs open). I am a bit in favor of duplicating but not 100% sure, so if you prefer I will remove the duplication.

So, maybe combine both approaches: mention the parameter and type here, but for the description send the user to the parameter in the original class?

OK, that sounds reasonable, I will apply it asap.


menshikh-iv · 2018-02-13T14:15:19Z

gensim/sklearn_api/lsimodel.py

+
+        Parameters
+        ----------
+        num_topics : int, optional


@steremma we discussed this question again and this isn't a good idea. It's fine if the user reads the documentation online (and has a link), but a user working in Python/Jupyter will call something like `help(model)` or `model?`, and in that case links don't work :( (this is the main problem). For this reason, can you bring back the parameter descriptions? Copy-paste is a lesser evil than a docstring that exists but is useless because you can't read it in your interpreter.

Also, the link to the original class must be there in any case.
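
To make the concern concrete, here is a small sketch of what the interpreter sees. `WrapperSketch` is a hypothetical class invented for this illustration; `inspect.getdoc` returns the cleaned docstring text, which is the raw material an interactive `help()` call has to work with.

```python
import inspect


class WrapperSketch(object):
    """Hypothetical wrapper used only to illustrate the point.

    Parameters
    ----------
    num_topics : int, optional
        Number of requested factors (latent dimensions).
        See also :class:`~gensim.models.lsimodel.LsiModel`.

    """


# The docstring as plain text, with common indentation stripped.
text = inspect.getdoc(WrapperSketch)

# The parameter description is readable directly in the interpreter...
has_description = "Number of requested factors" in text
# ...while the Sphinx role stays as raw markup rather than a clickable link.
has_raw_role = ":class:`~gensim.models.lsimodel.LsiModel`" in text
```

In other words, a docstring that contains only a link gives an interpreter user nothing to read, whereas a duplicated description remains useful there.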

menshikh-iv · 2018-02-13T14:16:52Z

gensim/sklearn_api/lsimodel.py

+--------
+Integrate with sklearn Pipelines:
+
+    >>> model = LsiTransformer(num_topics=15, id2word=id2word)


Great (examples are a really nice idea), but please make sure the example is executable (i.e. you write all needed imports and define the data).

You can easily check that everything works with `python -m doctest path/to/file/with/examples.py`
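
For reference, a minimal sketch of what "executable" means here. The helper `count_tokens` is hypothetical and unrelated to gensim; the snippet runs the docstring example programmatically, which is roughly what the `python -m doctest` command does for a whole file.

```python
import doctest


def count_tokens(text):
    """Count whitespace-separated tokens (hypothetical helper).

    Examples
    --------
    >>> count_tokens("human interface computer")
    3

    """
    return len(text.split())


# Collect every ">>>" example from the docstring above and run it.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for found in finder.find(count_tokens, name="count_tokens",
                         globs={"count_tokens": count_tokens}):
    runner.run(found)

# TestResults(failed, attempted): a non-zero `failed` means an example is broken.
results = runner.summarize(verbose=False)
```

An example that relies on a name it never imports or defines would fail here with a `NameError`, which is exactly the class of problem the check is meant to catch.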


menshikh-iv · 2018-02-20T05:56:24Z

@steremma please be careful with PEP8 checks: look at https://travis-ci.org/RaRe-Technologies/gensim/jobs/343473932#L509 and resolve it.
You can easily check it locally before pushing, e.g. with `tox -e flake8`

steremma · 2018-02-20T09:48:29Z

@menshikh-iv this PR is a work in progress; I only push to sync between different workstations (I opened it so that you can easily track the status). Sometimes I have to leave one workstation quickly, so I just push to continue from another one. In the end the commits will be squashed, so I hope it won't be a problem. We can close the PR for now and re-open it when it's ready for review and merge.


menshikh-iv · 2018-02-22T00:19:15Z

gensim/sklearn_api/d2vmodel.py

+    >>> # Lets represent each document using a 50 dimensional vector
+    >>> model = D2VTransformer(min_count=1, size=50)
+    >>> docvecs = model.fit_transform(common_texts)
+    >>> assert docvecs.shape == (len(common_texts), 50)


Maybe something more interesting (like calculating the similarity between documents)?
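
One possible shape for such an example is a cosine similarity between two document vectors. The sketch below uses hand-made vectors as stand-ins for rows of `docvecs`, since real vectors would come from `fit_transform`; the helper `cosine_similarity` is written here for illustration and is not a gensim function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made stand-ins for rows of `docvecs` (assumed, not real model output).
doc_a = [1.0, 0.0, 1.0]
doc_b = [2.0, 0.0, 2.0]  # same direction as doc_a
doc_c = [0.0, 3.0, 0.0]  # orthogonal to doc_a

sim_ab = cosine_similarity(doc_a, doc_b)  # close to 1: same direction
sim_ac = cosine_similarity(doc_a, doc_c)  # zero dot product: unrelated
```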

menshikh-iv · 2018-02-22T00:20:12Z

gensim/sklearn_api/lsimodel.py

+
+    >>> # Fit our pipeline to some corpus
+    >>> corpus = [id2word.doc2bow(i.split()) for i in data.data]
+    >>> fitted_pipeline = pipe.fit(corpus, data.target)


Add some evaluation here (what's the score?)

menshikh-iv · 2018-02-22T00:20:22Z

gensim/sklearn_api/lsimodel.py

+    >>>
+    >>> # Create an ID to word mapping using some corpus included in sklearn.
+    >>> cats = ['rec.sport.baseball', 'sci.crypt']
+    >>> data = fetch_20newsgroups(subset='train', categories=cats, shuffle=True)


Use gensim-data here (20-newsgroups is available)

menshikh-iv · 2018-02-22T00:21:18Z

gensim/sklearn_api/w2vmodel.py

+    >>>
+    >>> # What is the vector representation of the word 'graph'?
+    >>> wordvecs = model.fit(common_texts).transform(['graph', 'system'])
+    >>> assert wordvecs.shape == (2, 10)


maybe call most_similar?


menshikh-iv

Also, you missed `ldaseqmodel`, please add it too.

menshikh-iv · 2018-02-27T08:55:50Z

gensim/sklearn_api/atmodel.py


+    For more information on the inner workings please take a look at the original class. The model's internal workings
+    are heavily based on `"The Author-Topic Model for Authors and Documents", Osen-Zvi et. al 2004
+    <https://mimno.infosci.cornell.edu/info6150/readings/398.pdf>`_.


Maybe mention the paper only in the original class (not here), wdyt?

I will replace the references with `some text <http://...>`_

menshikh-iv · 2018-02-27T08:57:15Z

gensim/sklearn_api/hdp.py

+Examples
+--------
+
+    >>> from gensim.test.utils import common_dictionary, common_corpus


unindent example please (here and everywhere)


menshikh-iv · 2018-03-09T07:29:23Z

Current PR will fix #1895

steremma added 4 commits February 10, 2018 20:26

fixed docstring for sklearn_api.lsimodel

4cee8fa

removed duplicated comment

ab0303c

Fixed docstring for sklearn_api.text2bow

4dc001f

Fixed docstrings for sklearn_api.phrases

69faf41

menshikh-iv suggested changes Feb 12, 2018


steremma added 2 commits February 12, 2018 13:02

Applied code review corrections in sklearn wrappers for:

- `lsimodel`
- `text2bow`
- `phrases`

* Added `doc` in every file
* Provided sphinx style links to parameter types referencing gensim classes.
* Propagated arguments are still duplicated for readability - maybe remove?

5052dfb

constructor docstrings now only mention the type of each argument. For explanation of their meaning the reader is redirected to the original models documentation

c027203

menshikh-iv suggested changes Feb 13, 2018


steremma added 3 commits February 13, 2018 17:07

Brought back parameter explanation in the wrappers for easier lookup

3815605

added examples to __doc__, work still in progress

c1e05df

added simple and executable examples to __doc__

4cfbf5c

menshikh-iv mentioned this pull request Feb 16, 2018

Refactor API reference gensim.sklearn_api #1667

Closed

12 tasks

steremma and others added 3 commits February 19, 2018 12:10

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into sklearn-api-docs

f2615ef

temp work on some more wrappers

3581a46

finished docstrings for LDA wrapper, examples pending

8ef1105

steremma and others added 7 commits February 20, 2018 18:54

finished doc2vec wrapper with example

add7420

completed LDA wrapper including example

38a610f

finished the tfidf wrapper including example

5f00f34

PEP-8 corrections

1d8c63c

w2v documentation - example result pending

f8fffd6

Merge branch 'sklearn-api-docs' of https://github.com/steremma/gensim into sklearn-api-docs

c866af0

fixed w2v example

3cf28a3

menshikh-iv suggested changes Feb 22, 2018


steremma and others added 5 commits February 22, 2018 17:11

added documentation for the lda sequential model - examples pending

b55a2a2

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim into sklearn-api-docs

6c1aeb8

added documentation for the author topic sklearn wrapper including example

b0600cd

improved example by presenting a way to get a pipeline score

e2ca72f

improved example using similarities

f66abbb

menshikh-iv suggested changes Feb 27, 2018


steremma and others added 4 commits February 27, 2018 10:14

Merge branch 'sklearn-api-docs' of https://github.com/steremma/gensim into sklearn-api-docs

ef5d7ab

unidented examples and fixed paper references

4285741

Merge branch 'sklearn-api-docs' of https://github.com/steremma/gensim into sklearn-api-docs

2f02cfe

finalized ldaseq wrapper

0c56ae9

menshikh-iv added the RFM and incubator project labels Mar 5, 2018

menshikh-iv changed the title ~~Sklearn API docstrings~~ Fix docstrings for gensim.sklearn_api. Fix #1667 Mar 9, 2018

menshikh-iv added 19 commits March 13, 2018 17:01

fix __init__

64f8d4f

Merge remote-tracking branch 'upstream/develop' into sklearn-api-docs

9b4c375

resolve merge-conflict with pivot norm

7a204e1

fix atmodel

39bbe31

fix atmodel[2]

20ea33e

fix d2vmodel

31fb94e

fix hdp + small fixes

4432b77

fix ldamodel + small fixes

e729a26

small fixes

14fcf22

fix ldaseqmodel

07a8cba

small fixes (again)

5325d05

fix lsimodel

b250ca4

fix phrases

3fc3bef

fix rpmodel

dc9f659

fix text2bow

4ec4619

fix tfidf

36a263a

fix word2vec

ae4a5b4

cleanup

0ad6580

cleanup[2]

8a45bef

menshikh-iv merged commit 75a2309 into piskvorky:develop Mar 15, 2018

steremma deleted the sklearn-api-docs branch March 15, 2018 09:39
