Fix param parsing. #286

Carreau · 2020-07-18T01:24:40Z

Closes #285

This fixes two things:

- When the first sentence of the docstring is on the first line, the Parameters section is not parsed properly, which, for example, misparsed the `numpy.array` docstring.
- Many projects have parameter description lists with ` :` after the name, even if no type is present. If there is no space after the `:`, the parameter name includes the ` :`, which is most likely wrong.
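A minimal sketch of the second fix, assuming the `name : type` convention in which a space on both sides of the colon separates name from type. `split_param_header` is a hypothetical helper for illustration, not numpydoc's `_parse_param_list`:

```python
from typing import Tuple


def split_param_header(header: str) -> Tuple[str, str]:
    """Split a parameter header line such as ``"x : int"`` into (name, type).

    Hypothetical helper illustrating the rule described above; it is not
    numpydoc's ``_parse_param_list``.
    """
    header = header.strip()
    if " : " in header:
        # Split only on the first separator so types such as
        # "dict of {str : int}" stay intact.
        name, type_ = header.split(" : ", 1)
        return name, type_
    # A header written as "x : " with no type becomes "x :" once stripped, so
    # the " : " test above misses it and the name would keep the trailing " :".
    if header.endswith(" :"):
        header = header[:-2]
    return header, ""
```

With this rule, `"x :"` and `"x"` both yield the name `"x"` with an empty type, while `"x : int"` still splits into `("x", "int")`.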

codecov-commenter · 2020-07-18T01:27:28Z

Codecov Report

Merging #286 into master will decrease coverage by 1.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
- Coverage   93.10%   91.99%   -1.12%     
==========================================
  Files           7        7              
  Lines        1261     1261              
==========================================
- Hits         1174     1160      -14     
- Misses         87      101      +14

rossbar

It seems like the indenting/parsing gets screwed up for more than just the Parameters section. Using `np.array.__doc__` as an example:

>>> print(NumpyDocString(np.array.__doc__))
array(object, dtype=None, \*, copy=True, order='K', subok=False, ndmin=0)

    Create an array.

Parameters
----------
object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array, the
        newly created array will be in C order (row major) unless 'F' is
        specified, in which case it will be in Fortran order (column major).
        If object is an array the following holds.
    
        ===== ========= ===================================================
        order  no copy                     copy=True
        ===== ========= ===================================================
        'K'   unchanged F & C order preserved, otherwise most similar order
        'A'   unchanged F order if input is F and not C, otherwise C order
        'C'   C order   C order
        'F'   F order   F order
        ===== ========= ===================================================
    
        When ``copy=False`` and a copy is made for other reasons, the result is
        the same as if ``copy=True``, with some exceptions for `A`, see the
        Notes section. The default order is 'K'.
    subok : bool, optional
        If True, then sub-classes will be passed-through, otherwise
        the returned array will be forced to be a base-class array (default).
    ndmin : int, optional
        Specifies the minimum number of dimensions that the resulting
        array should have.  Ones will be pre-pended to the shape as
        needed to meet this requirement.

Returns
-------
out : ndarray
    An array object satisfying the specified requirements.

See Also
--------

`empty_like`_
    Return an empty array with shape and type of input.
`ones_like`_
    Return an array of ones with shape and type of input.
`zeros_like`_
    Return an array of zeros with shape and type of input.
`full_like`_
    Return a new array with shape of input filled with value.
`empty`_
    Return a new uninitialized array.
`ones`_
    Return a new array setting values to one.
`zeros`_
    Return a new array setting values to zero.
`full`_
    Return a new array of given shape filled with value.


Notes
-----
    When order is 'A' and `object` is an array in neither 'C' nor 'F' order,
    and a copy is forced by a change in dtype, then the order of the result is
    not necessarily 'C' as expected. This is likely a bug.

Examples
--------
    >>> np.array([1, 2, 3])
    array([1, 2, 3])

    Upcasting:

    >>> np.array([1, 2, 3.0])
    array([ 1.,  2.,  3.])

    More than one dimension:

    >>> np.array([[1, 2], [3, 4]])
    array([[1, 2],
           [3, 4]])

    Minimum dimensions 2:

    >>> np.array([1, 2, 3], ndmin=2)
    array([[1, 2, 3]])

    Type provided:

    >>> np.array([1, 2, 3], dtype=complex)
    array([ 1.+0.j,  2.+0.j,  3.+0.j])

    Data-type consisting of more than one element:

    >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
    >>> x['a']
    array([1, 3])

    Creating an array from sub-classes:

    >>> np.array(np.mat('1 2; 3 4'))
    array([[1, 2],
           [3, 4]])

    >>> np.array(np.mat('1 2; 3 4'), subok=True)
    matrix([[1, 2],
            [3, 4]])

Note that some sections are fully dedented (e.g. See Also), others are partially dedented (Parameters), and others have only their headings dedented (Notes, Examples). Maybe there's a more general fix than the change to `_parse_param_list`: getting the dedenting right for the entire docstring when the first line doesn't match the indentation of the rest?
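To make the first-line problem concrete, here is a small standalone sketch using only the standard library (not numpydoc code). It assumes the docstring is prepared with a plain `textwrap.dedent`, which strips only the indentation common to every non-blank line, and contrasts it with `inspect.cleandoc`, the function #287 proposes. The sample docstring is an illustrative stand-in for `np.array.__doc__`:

```python
import inspect
import textwrap

# Summary starts right after the opening quotes, as in numpy.array:
# the first line has no indentation, every following line has four spaces.
doc = """Create an array.

    Parameters
    ----------
    object : array_like
        An array-like input.
    """

# The flush first line has no leading whitespace, so the common margin is
# empty and textwrap.dedent removes nothing.
naive = textwrap.dedent(doc)

# inspect.cleandoc ignores the first line when computing the margin and
# drops leading/trailing blank lines.
clean = inspect.cleandoc(doc)

print(repr(naive.splitlines()[2]))  # '    Parameters'
print(repr(clean.splitlines()[2]))  # 'Parameters'
```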

Carreau · 2020-07-20T22:03:52Z

> to get the dedenting correct for the entire docstring when the first line doesn't match the indenting scheme of the rest of the docstring?

That was the first fix I tried, but it makes `validate` fail, as well as other checks that try to tell you when the docstring is invalid or has extra spaces in places it does not like.

Personally, I always end up calling it as `Numpydoc(dedent_but_first(docstring))` to work around some other inconsistencies.
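The thread doesn't show `dedent_but_first`. A plausible reconstruction, hypothetical and assuming "keep the first line as-is, dedent everything after it" is the intent:

```python
import textwrap


def dedent_but_first(docstring):
    """Hypothetical helper: keep the first line, dedent all the others.

    Unlike ``textwrap.dedent(docstring)``, the flush first line does not
    prevent the remaining lines from losing their common indentation.
    """
    first, sep, rest = docstring.partition("\n")
    return first.strip() + sep + textwrap.dedent(rest)


doc = """Create an array.

    Parameters
    ----------
    object : array_like
        An array-like input.
    """

print(dedent_but_first(doc).splitlines()[2])  # Parameters
```

The result would then be handed to the parser, e.g. `NumpyDocString(dedent_but_first(np.array.__doc__))`.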

rossbar · 2020-07-20T22:19:55Z

> That was the first fix I tried, but it makes `validate` fail, as well as other checks that try to tell you when the docstring is invalid or has extra spaces in places it does not like.

Yes, I noticed this too when I was messing around with alternatives (see #287, which will fail one validation test). It seems to me much more straightforward to catch issues due to dedenting errors as far upstream as possible, rather than modifying the downstream functions to handle the corner cases of less-robust docstring cleaning.

However, if there are strong backwards compatibility concerns with the validation module, then the approach in #287 is a non-starter.

Carreau · 2020-07-20T22:24:44Z

Yeah, I went with not validating and just re-emitting the docstring from the Numpydoc `_parsed_data` data structure, since `__repr__` also does some weird things; I ended up with a docstring reformatter.
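Carreau's reformatter isn't shown here. As a rough sketch of the re-emitting idea, assume parsed data shaped like numpydoc's: a section title mapped to a list of `(name, type, desc)` entries, where `desc` is a list of lines. `Parameter` below is a local stand-in rather than an import from numpydoc, and `emit_param_section` is a hypothetical name:

```python
from collections import namedtuple

# Local stand-in for the (name, type, desc) triples assumed above.
Parameter = namedtuple("Parameter", ["name", "type", "desc"])


def emit_param_section(title, params, indent="    "):
    """Re-emit a parameter-like section with a canonical layout."""
    lines = [title, "-" * len(title)]
    for p in params:
        # Only add " : type" when a type exists, so no dangling colon.
        lines.append(f"{p.name} : {p.type}" if p.type else p.name)
        # Indent description lines uniformly; keep blank lines empty.
        lines.extend(indent + d if d else "" for d in p.desc)
    return "\n".join(lines)


print(emit_param_section("Parameters", [Parameter("x", "int", ["The x value."])]))
```

Because the output is regenerated from the parsed structure, its indentation no longer depends on how the source docstring was indented.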

rossbar · 2020-07-20T22:56:49Z

Hmm, on the one hand the validation should definitely catch indentation errors, but it would be nice if the docstring parsing were more resilient to indentation problems.

Carreau · 2020-07-20T23:51:10Z

Yeah, I think the two pieces of logic are too intertwined; "parsing", "guessing", and "validating" should be separate steps.

There are many docstrings out there with invalid section names, and right now they can't be parsed because Numpydoc errors out immediately, though cleaning that up might be a longer-term project.

rossbar · 2020-07-21T00:08:17Z

> There are many docstrings out there with invalid section names, and right now they can't be parsed because Numpydoc errors out immediately, though cleaning that up might be a longer-term project.

... including in numpy itself! (numpy/numpy#16791)

I agree it's probably a larger project, so let's not worry about it in this PR. I think the changes here look good. The only addition I'd make is to apply a similar `pytest.mark.parametrize` to the `test_returns` and `test_other_parameters` tests, since they are also affected by the changes to `_parse_param_list`. I also slightly prefer the parametrization I used in #287 as it's a little shorter, but YMMV.

jnothman · 2020-07-22T22:37:32Z

> Many projects have parameter description lists with ` :` after the name, even if no type is present. If there is no space after the `:`, the parameter name includes the ` :`, which is most likely wrong.

The problem being that this matched the relatively obscure syntax for ReST definition lists...

Carreau · 2020-07-24T03:57:40Z

OK, I made `doc` a parametrized fixture, so every test using it is now parametrized on `''` (flush) vs `'\n '` (newline indented).
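A sketch of the pattern, not the fixture from `test_docscrape.py`: a fixture with `params` makes every test that requests it run once per layout. The sample docstring, the `make_doc` helper, and the exact four-space indent in the second layout are illustrative assumptions:

```python
import inspect

import pytest

# Illustrative docstring body; every line after the first is indented by
# four spaces, as in a typical function docstring.
DOC_BODY = """Summary line.

    Parameters
    ----------
    x : int
        The x value.
    """


def make_doc(prefix):
    """Return the sample docstring with ``prefix`` placed before the summary."""
    return prefix + DOC_BODY


# "" puts the summary flush against the opening quotes; "\n    " moves it to
# its own indented line.
@pytest.fixture(params=["", "\n    "], ids=["flush", "newline-indented"])
def doc(request):
    return make_doc(request.param)


def test_layouts_clean_to_same_text(doc):
    # Runs twice, once per fixture parameter.
    assert inspect.cleandoc(doc) == inspect.cleandoc(make_doc(""))
```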

Carreau · 2020-08-07T21:39:09Z

Anything I can do to push this forward ?

larsoner

Other than two nitpicks, LGTM

numpydoc/tests/test_docscrape.py

Co-authored-by: Eric Larson <[email protected]>

Carreau · 2020-08-10T16:03:56Z

Thanks !

larsoner · 2020-08-10T17:03:21Z

Thanks @Carreau !

larsoner approved these changes Jul 20, 2020

rossbar reviewed Jul 20, 2020

test fixture

c76c3cf

Carreau force-pushed the fix-param-parsing branch from 9066488 to 7f40a9f July 24, 2020 03:59

make doc a fixture

f8b424a

Carreau force-pushed the fix-param-parsing branch from 7f40a9f to f8b424a July 24, 2020 04:00

rossbar mentioned this pull request Aug 3, 2020

MAINT,TST: use inspect.cleandoc ind docstring prep. #287

Closed

larsoner approved these changes Aug 10, 2020

numpydoc/tests/test_docscrape.py

numpydoc/tests/test_docscrape.py (outdated)

Carreau and others added 2 commits August 10, 2020 09:02

Update numpydoc/tests/test_docscrape.py

083aebf

Co-authored-by: Eric Larson <[email protected]>

Update numpydoc/tests/test_docscrape.py

c390886

Co-authored-by: Eric Larson <[email protected]>

larsoner merged commit 676a8d4 into numpy:master Aug 10, 2020

jarrodmillman added this to the 1.2.0 milestone Jan 9, 2022

jarrodmillman added the type: Bug fix label Jan 22, 2022

MaozGelbart mentioned this pull request Sep 14, 2022

Parsing returns section with several types and no name #428

Closed

rossbar mentioned this pull request Sep 16, 2022

BUG: Fix returns parsing no name #429

Merged
