Fix documentation for `gensim.corpora`. Partial fix #1671 #1729

anotherbugmaster · 2017-11-21T08:29:30Z

Fix #1671

Docs formally comply with numpy style now but not all type annotations and descriptions are there.

menshikh-iv

Please continue your work, what a voluminous PR 👍

menshikh-iv · 2017-12-12T14:14:13Z

gensim/corpora/bleicorpus.py

+        Parameters
+        ----------
+        fname : str
+            Serialized corpus's filename


Add a dot at the end of each sentence (everywhere).

menshikh-iv · 2017-12-12T14:14:38Z

gensim/corpora/bleicorpus.py

+        corpus : iterable
+            Iterable of documents
+        id2word : dict of (str, str), optional
+            Transforms id to word (Default value = None)


No default values in docstrings (everywhere).
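A minimal sketch of a docstring that follows both remarks above: every sentence ends with a dot, and the optional argument is marked `optional` without repeating its default value. `save_corpus_sketch` is a hypothetical stand-in, not gensim's actual function, and the types shown are assumptions made for illustration.

```python
def save_corpus_sketch(fname, corpus, id2word=None):
    """Summarize a corpus that would be saved to `fname`.

    Hypothetical function for illustration only; nothing is written to disk.

    Parameters
    ----------
    fname : str
        Path to output file.
    corpus : iterable of iterable of (int, float)
        Input corpus in bag-of-words format.
    id2word : dict of (int, str), optional
        Mapping from word id to word.

    Returns
    -------
    str
        Short human-readable summary.

    """
    # `id2word` is accepted only to show how an optional argument is documented.
    num_docs = sum(1 for _ in corpus)
    return "%s: %d documents" % (fname, num_docs)
```

Note that the signature still carries `id2word=None`; the docstring only says `optional`.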

menshikh-iv · 2017-12-25T13:25:22Z

gensim/corpora/bleicorpus.py

+        ----------
+        fname : str
+            Serialized corpus's filename
+        fname_vocab : str or None, optional


We need to understand how to:

Document an argument that accepts multiple types (i.e. when the parameter can be of type X or Y)

Document multiple types in the "Returns" section

Correctly specify the parent class (if there are many subclasses)
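One common convention for the first two questions, shown as a sketch rather than as the answer this thread settled on, is to join the alternative types with `or` in both `Parameters` and `Returns`. `load_vocab_sketch` is a hypothetical function. The parent-class question is not covered here.

```python
def load_vocab_sketch(source=None):
    """Build an id-to-word mapping from `source`.

    Hypothetical function for illustration only.

    Parameters
    ----------
    source : str or list of str, optional
        Whitespace-separated words, or a list of words.

    Returns
    -------
    dict of (int, str) or None
        Mapping from word id to word, None if `source` is not given.

    """
    if source is None:
        return None
    # Accept either a single string or an already split list of words.
    words = source.split() if isinstance(source, str) else list(source)
    return dict(enumerate(words))
```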

menshikh-iv · 2017-12-25T13:25:40Z

gensim/corpora/bleicorpus.py

+        ----------
+        fname : str
+            Filename
+        corpus : iterable


iterable of ... ? (here and everywhere)
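To illustrate what a fully spelled-out element type buys the reader, here is a sketch that uses the `iterable of iterable of (int, float)` form that appears later in this PR. The helper `is_bow_corpus` is hypothetical and not part of gensim.

```python
def is_bow_corpus(corpus):
    """Check that `corpus` is an iterable of iterable of (int, float).

    Parameters
    ----------
    corpus : iterable of iterable of (int, float)
        Documents, each a sequence of (word id, weight) pairs.

    Returns
    -------
    bool
        True if every document consists of (int, float) pairs.

    """
    for document in corpus:
        for item in document:
            # Every item must be a 2-tuple of (word id, weight).
            if not (isinstance(item, tuple) and len(item) == 2):
                return False
            word_id, weight = item
            if not isinstance(word_id, int) or not isinstance(weight, float):
                return False
    return True
```

With the type written out, a reader can tell that `[[(1, 0.5)], [(0, 1.0), (1, 2.0)]]` is valid input, while a list of token lists is not.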

menshikh-iv · 2017-12-25T13:26:15Z

gensim/corpora/bleicorpus.py

+
+        Returns
+        -------
+        list of (int, float)


Missing parameter description (here and everywhere)

menshikh-iv · 2017-12-25T13:30:20Z

gensim/corpora/indexedcorpus.py

-        >>> corpus_with_random_access = gensim.corpora.SvmLightCorpus('tstfile.svmlight')
-        >>> print(corpus_with_random_access[1])
-        [(0, 1.0), (1, 2.0)]
+            >>> corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]


Examples should be executable and split into 3 sections: imports, data preparation, direct functionality

>>> from .. import ...
>>> import ...
>>>
>>> data = ...
>>> makesomething(data)
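A sketch of an executable example laid out in those three sections and checked with the standard `doctest` module. `count_tokens` and `run_examples` are hypothetical names. Comment lines are used here to label the sections, which is an assumption of this sketch; the template above separates them with a bare `>>>` instead.

```python
import doctest
from collections import Counter


def count_tokens(docs):
    """Count how often each token occurs across `docs`.

    Parameters
    ----------
    docs : iterable of list of str
        Tokenized documents.

    Returns
    -------
    collections.Counter
        Token frequencies.

    Examples
    --------
    >>> # imports
    >>> from pprint import pprint
    >>> # data preparation
    >>> docs = [["human", "interface"], ["human", "system"]]
    >>> # direct functionality
    >>> counts = count_tokens(docs)
    >>> pprint(dict(counts))
    {'human': 2, 'interface': 1, 'system': 1}

    """
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def run_examples(func):
    """Run the doctest examples of `func` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The function itself must be visible inside the example namespace.
    test = parser.get_doctest(func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted
```

Calling `run_examples(count_tokens)` returns the number of failed and attempted examples, so a docstring whose example has drifted from the code shows up as a non-zero failure count.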

menshikh-iv · 2017-12-25T13:31:25Z

gensim/corpora/lowcorpus.py

    return [word for word in utils.to_unicode(s).strip().split(' ') if word]


 class LowCorpus(IndexedCorpus):
-    """
-    List_Of_Words corpus handles input in GibbsLda++ format.
+    """List_Of_Words corpus handles input in GibbsLda++ format.

    Quoting http://gibbslda.sourceforge.net/#3.2_Input_Data_Format::


Put the link in another format.

menshikh-iv · 2017-12-25T13:31:30Z

gensim/corpora/lowcorpus.py

+
+    Parameters
+    ----------
+    s :


???? (empty descriptions here and everywhere)

Docs formally comply with numpy style now but not all type annotations and descriptions are there.

:)

anotherbugmaster

You're right. It was one of the first files in corpora, and I didn't know about some of the specification features yet.

anotherbugmaster · 2018-01-18T11:23:13Z

gensim/corpora/bleicorpus.py

-            vocab/vocab.txt file.
+            File path to Serialized corpus.
+        fname_vocab : str, optional
+            Vocabulary file. If `fname_vocab` is None, searching for the vocab.txt or `fname_vocab`.vocab file.


Are you sure it's `fname_vocab`.vocab? `fname_vocab` is None, isn't it?

Not quite, I added the correct description.

I still don't get it. It should be `fname`.vocab; `fname_vocab`.vocab is undefined!

Not quite :) I went through the code with ipdb for this case; it is significantly "wider" than what we discuss here (I already fixed it).

anotherbugmaster · 2018-01-18T11:27:57Z

gensim/corpora/bleicorpus.py

-            Filename.
-        corpus : iterable
-            Iterable of documents.
+            Path to output filename.


"Path to output file", not "output filename".

anotherbugmaster · 2018-01-18T11:29:35Z

gensim/corpora/bleicorpus.py

-            Iterable of documents.
+            Path to output filename.
+        corpus : iterable of iterable of (int, float)
+            Input corpus


Obvious, no additional information provided. There's no need to have descriptions for all arguments. :)

anotherbugmaster · 2018-01-18T11:31:47Z

gensim/corpora/bleicorpus.py

@@ -153,16 +160,18 @@ def save_corpus(fname, corpus, id2word=None, metadata=False):
        return offsets

    def docbyoffset(self, offset):
-        """Return document corresponding to `offset`.
+        """Get document corresponding to `offset`,


First line of docstring should always end with a dot.
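This rule is easy to check mechanically. The sketch below uses only the standard `inspect` module; `first_line_ends_with_dot` and the two sample functions are hypothetical, with the docstring wording borrowed from the diff above.

```python
import inspect


def first_line_ends_with_dot(obj):
    """Check whether the first docstring line of `obj` ends with a dot.

    Parameters
    ----------
    obj : object
        Any object that may carry a docstring.

    Returns
    -------
    bool
        True if a docstring exists and its first line ends with a dot.

    """
    doc = inspect.getdoc(obj)
    if not doc:
        return False
    return doc.splitlines()[0].rstrip().endswith(".")


def good(offset):
    """Get document corresponding to `offset`."""


def bad(offset):
    """Get document corresponding to `offset`,"""
```

Here `first_line_ends_with_dot(good)` is true and `first_line_ends_with_dot(bad)` is false, because the second summary line ends with a comma.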

anotherbugmaster · 2018-01-21T16:19:04Z

gensim/corpora/bleicorpus.py

+        Parameters
+        ----------
+        fname : str
+            File path to Serialized corpus.


Maybe "Path to corpus" here and in other corpora?

anotherbugmaster · 2018-01-21T16:21:02Z

gensim/corpora/bleicorpus.py

+        fname : str
+            File path to Serialized corpus.
+        fname_vocab : str, optional
+            Vocabulary file. If `fname_vocab` is None, searching for the vocab.txt or `fname_vocab`.vocab file.


Vocabulary file. If fname_vocab is None, searching for the vocab.txt or fname.vocab file.
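A simplified sketch of the fallback that this wording describes. `find_vocab_file` is hypothetical; the candidate list and its order are assumptions, and the earlier reply in this thread notes that the real lookup is "wider" than this.

```python
import os


def find_vocab_file(fname, fname_vocab=None):
    """Locate the vocabulary file that belongs to the corpus `fname`.

    Simplified, hypothetical sketch; not gensim's actual implementation.

    Parameters
    ----------
    fname : str
        Path to corpus.
    fname_vocab : str, optional
        Vocabulary file. If not given, `fname`.vocab and vocab.txt next to `fname` are tried.

    Returns
    -------
    str
        Path to the vocabulary file.

    Raises
    ------
    IOError
        If `fname_vocab` is not given and no candidate exists.

    """
    if fname_vocab is not None:
        return fname_vocab
    # Assumed candidates, tried in order: `fname`.vocab, then vocab.txt in the same directory.
    candidates = [
        fname + ".vocab",
        os.path.join(os.path.dirname(fname), "vocab.txt"),
    ]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    raise IOError("no vocabulary file found for %s" % fname)
```

The point of the sketch is the naming: the fallback is derived from `fname`, since `fname_vocab` is None on that path.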

anotherbugmaster · 2018-01-21T16:25:45Z

gensim/corpora/bleicorpus.py

+        fname : str
+            Path to output filename.
+        corpus : iterable of iterable of (int, float)
+            Input corpus


Still think that it's not necessary. Also, there's a dot missing at the end of the line.

anotherbugmaster · 2018-01-21T16:26:56Z

gensim/corpora/bleicorpus.py

@@ -121,8 +160,19 @@ def save_corpus(fname, corpus, id2word=None, metadata=False):
        return offsets

    def docbyoffset(self, offset):
-        """
-        Return the document stored at file position `offset`.
+        """Get document corresponding to `offset`,


The first line should end with a dot.

anotherbugmaster · 2018-01-21T16:29:08Z

gensim/corpora/indexedcorpus.py

+        Parameters
+        ----------
+        fname : str
+            Path to output filename


Dots at the end of the line. Did I miss these? O_o

anotherbugmaster · 2018-01-21T16:32:42Z

gensim/corpora/svmlightcorpus.py


    def line2doc(self, line):
-        """
-        Create a document from a single line (string) in SVMlight format
+        """Get a document from a single line in SVMlight format,


The first line should end with a dot.

anotherbugmaster · 2018-01-21T16:33:59Z

gensim/corpora/wikicorpus.py

+    Parameters
+    ----------
+    s : str
+        String containing markup template


A dot at the EOL.

anotherbugmaster · 2018-01-21T16:34:23Z

gensim/corpora/wikicorpus.py

+    token_min_len : int
+        Minimal token length.
+    token_max_len : int
+        Maximal token length


anotherbugmaster · 2018-01-21T16:35:37Z

gensim/corpora/wikicorpus.py

+    f : file
+        File-like object.
+    filter_namespaces : list of str or bool
+         Namespaces that will be extracted


anotherbugmaster · 2018-01-21T16:36:48Z

gensim/corpora/wikicorpus.py

-        the standard corpus interface instead of this function::
+        Notes
+        -----
+        This iterates over the **texts**. If you want vectors, just use the standard corpus interface


…iskvorky#1729)
* Fix typo
* Make `save_corpus` private
* Annotate `bleicorpus.py`
* Make __save_corpus weakly private
* Fix _save_corpus in tests
* Fix _save_corpus[2]
* Document bleicorpus in Numpy style
* Document indexedcorpus
* Annotate csvcorpus
* Add "Yields" section
* Make `_save_corpus` public
* Annotate bleicorpus
* Fix indentation in bleicorpus
* `_save_corpus` -> `save_corpus`
* Annotate bleicorpus
* Convert dictionary docs to numpy style
* Convert hashdictionary docs to numpy style
* Convert indexedcorpus docs to numpy style
* Convert lowcorpus docs to numpy style
* Convert malletcorpus docs to numpy style
* Convert mmcorpus docs to numpy style
* Convert sharded_corpus docs to numpy style
* Convert svmlightcorpus docs to numpy style
* Convert textcorpus docs to numpy style
* Convert ucicorpus docs to numpy style
* Convert wikicorpus docs to numpy style
* Add sphinx tweaks
* Remove trailing whitespaces
* Annotate wikicorpus
* SVMLight Corpus annotated
* Fix TODO
* Fix grammar mistake
* Undo changes to dictionary
* Undo changes to hashdictionary
* Document indexedcorpus
* Document indexedcorpus[2] Fix identation
* Remove redundant files
* Add more dots. :)
* Fix monospace
* remove useless method
* fix bleicorpus
* fix csvcorpus
* fix indexedcorpus
* fix svmlightcorpus
* fix wikicorpus[1]
* fix wikicorpus[2]
* fix wikicorpus[3]
* fix review comments

anotherbugmaster added 30 commits September 30, 2017 15:39

Fix typo

b260d4b

Make save_corpus private

36d98d1

Annotate bleicorpus.py

981ebbb

Make __save_corpus weakly private

3428113

Fix _save_corpus in tests

69fc7e0

Fix _save_corpus[2]

b65a69a

Merge remote-tracking branch 'upstream/develop' into develop

6fa92f3

Merged C D C D C Merge D C C kk C C

Document bleicorpus in Numpy style

78e207d

Document indexedcorpus

7519382

Annotate csvcorpus

ae69867

Add "Yields" section

c2765ed

Make _save_corpus public

40add21

Annotate bleicorpus

e044c3a

Fix indentation in bleicorpus

123327d

_save_corpus -> save_corpus

2382d01

Annotate bleicorpus

42409bf

Convert dictionary docs to numpy style

7cb5bbf

Convert hashdictionary docs to numpy style

56f19e6

Convert indexedcorpus docs to numpy style

9162a7e

Convert lowcorpus docs to numpy style

5eaaac4

Convert malletcorpus docs to numpy style

3b6b076

Convert mmcorpus docs to numpy style

d7f3fc8

Convert sharded_corpus docs to numpy style

c46bff4

Convert svmlightcorpus docs to numpy style

7823546

Convert textcorpus docs to numpy style

9878133

Convert ucicorpus docs to numpy style

dba4429

Convert wikicorpus docs to numpy style

6a95c94

Add sphinx tweaks

6dcfb07

Merge remote-tracking branch 'upstream/develop' into develop

2f61fc3

Merge

Merge branch 'develop' into fix_1605

ac01abb

Merge develop

anotherbugmaster added 5 commits December 6, 2017 19:29

Fix grammar mistake

9eeea21

Undo changes to dictionary

2b6aeaf

Undo changes to hashdictionary

9b17057

Document indexedcorpus

de3ea0f

Document indexedcorpus[2]

dafc373

Fix identation

menshikh-iv suggested changes Dec 25, 2017

anotherbugmaster added 5 commits January 9, 2018 22:22

Merge upstream

ff980bc

Remove redundant files

0189d8d

Merge upstream

943406c

Add more dots. :)

57cb5a3

Fix monospace

08ca492

menshikh-iv changed the title from "Convert corpora docs to numpy style" to "Fix documentation for gensim.corpora. Partial fix #1671" on Jan 18, 2018

menshikh-iv added 8 commits January 18, 2018 13:17

remove useless method

381fb97

fix bleicorpus

5b5701a

fix csvcorpus

0e5c0cf

fix indexedcorpus

627c0e5

fix svmlightcorpus

b771bb5

fix wikicorpus[1]

d76af8d

fix wikicorpus[2]

7fe753f

fix wikicorpus[3]

a9eb1a3

anotherbugmaster commented Jan 18, 2018

anotherbugmaster commented Jan 21, 2018

fix review comments

e3a8ebf

menshikh-iv merged commit c5f487d into piskvorky:develop Jan 22, 2018

pyup-bot mentioned this pull request Feb 5, 2018

Scheduled weekly dependency update for week 05 workforce-data-initiative/skills-ml#120

Closed

menshikh-iv mentioned this pull request Mar 9, 2018

Refactor API reference gensim.corpora #1671

Closed

14 tasks

piskvorky mentioned this pull request Apr 30, 2018

Documentation fixes #2037

Open
