Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix param parsing. #286

Merged
merged 5 commits into from
Aug 10, 2020
Merged

Fix param parsing. #286

merged 5 commits into from
Aug 10, 2020

Conversation

Carreau
Copy link
Contributor

@Carreau Carreau commented Jul 18, 2020

Closes #285

This fixes two tings:

  • When first sentence of the docstring is onteh first line, Parameters
    is not properly parse, which for example mis parsed numpy.array
    docstring.

  • many project have paremeters description list with : afer the
    name, even if no type is present. If there is no space after the :
    the parameter name includes the : which is most likely wrong.

Closes numpy#285

This fixes two tings:

 - When first sentence of the docstring is onteh first line, Parameters
 is not properly parse, which for example mis parsed numpy.array
 docstring.

 - many project have paremeters description list with ` :` afer the
 name, even if no type is present. If there is no space after the `:`
 the parameter name includes the ` :` which is most likely wrong.
@codecov-commenter
Copy link

codecov-commenter commented Jul 18, 2020

Codecov Report

Merging #286 into master will decrease coverage by 1.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
- Coverage   93.10%   91.99%   -1.12%     
==========================================
  Files           7        7              
  Lines        1261     1261              
==========================================
- Hits         1174     1160      -14     
- Misses         87      101      +14     

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the indenting/parsing gets screwed up for more than just the Parameters section. Using np.array.__doc__ as and example::

>>> print(NumpyDocString(np.array.__doc__))
array(object, dtype=None, \*, copy=True, order='K', subok=False, ndmin=0)

    Create an array.

Parameters
----------
object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array, the
        newly created array will be in C order (row major) unless 'F' is
        specified, in which case it will be in Fortran order (column major).
        If object is an array the following holds.
    
        ===== ========= ===================================================
        order  no copy                     copy=True
        ===== ========= ===================================================
        'K'   unchanged F & C order preserved, otherwise most similar order
        'A'   unchanged F order if input is F and not C, otherwise C order
        'C'   C order   C order
        'F'   F order   F order
        ===== ========= ===================================================
    
        When ``copy=False`` and a copy is made for other reasons, the result is
        the same as if ``copy=True``, with some exceptions for `A`, see the
        Notes section. The default order is 'K'.
    subok : bool, optional
        If True, then sub-classes will be passed-through, otherwise
        the returned array will be forced to be a base-class array (default).
    ndmin : int, optional
        Specifies the minimum number of dimensions that the resulting
        array should have.  Ones will be pre-pended to the shape as
        needed to meet this requirement.

Returns
-------
out : ndarray
    An array object satisfying the specified requirements.

See Also
--------

`empty_like`_
    Return an empty array with shape and type of input.
`ones_like`_
    Return an array of ones with shape and type of input.
`zeros_like`_
    Return an array of zeros with shape and type of input.
`full_like`_
    Return a new array with shape of input filled with value.
`empty`_
    Return a new uninitialized array.
`ones`_
    Return a new array setting values to one.
`zeros`_
    Return a new array setting values to zero.
`full`_
    Return a new array of given shape filled with value.


Notes
-----
    When order is 'A' and `object` is an array in neither 'C' nor 'F' order,
    and a copy is forced by a change in dtype, then the order of the result is
    not necessarily 'C' as expected. This is likely a bug.

Examples
--------
    >>> np.array([1, 2, 3])
    array([1, 2, 3])

    Upcasting:

    >>> np.array([1, 2, 3.0])
    array([ 1.,  2.,  3.])

    More than one dimension:

    >>> np.array([[1, 2], [3, 4]])
    array([[1, 2],
           [3, 4]])

    Minimum dimensions 2:

    >>> np.array([1, 2, 3], ndmin=2)
    array([[1, 2, 3]])

    Type provided:

    >>> np.array([1, 2, 3], dtype=complex)
    array([ 1.+0.j,  2.+0.j,  3.+0.j])

    Data-type consisting of more than one element:

    >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
    >>> x['a']
    array([1, 3])

    Creating an array from sub-classes:

    >>> np.array(np.mat('1 2; 3 4'))
    array([[1, 2],
           [3, 4]])

    >>> np.array(np.mat('1 2; 3 4'), subok=True)
    matrix([[1, 2],
            [3, 4]])

Note that some sections are fully dedented (e.g. See Also) whereas others are partially dedented (Parameters) and others have only the headings dedented (Notes, Examples). Maybe there's a more general fix than the change to _parse_param_list to get the dedenting correct for the entire docstring when the first line doesn't match the indenting scheme of the rest of the docstring?

@Carreau
Copy link
Contributor Author

Carreau commented Jul 20, 2020

to get the dedenting correct for the entire docstring when the first line doesn't match the indenting scheme of the rest of the docstring?

That was the first fix I tried, but this make validate fails as well as other things which are attempting to tell you when the docstring is not valid or have extra spaces in place it does not like.

Personally I end up always calling it as so Numpydoc(dedent_but_first(docstring)), to work around some other inconsistencies.

@rossbar
Copy link
Contributor

rossbar commented Jul 20, 2020

That was the first fix I tried, but this make validate fails as well as other things which are attempting to tell you when the docstring is not valid or have extra spaces in place it does not like.

Yes, I noticed this too when I was messing around with alternatives (see #287, which will fail one validation test). It seems to me much more straightforward to catch issues due to dedenting errors as far upstream as possible, rather than modifying the downstream functions to handle the corner cases of less-robust docstring cleaning.

However, if there are strong backwards compatibility concerns with the validation module, then the approach in #287 is a non-starter.

@Carreau
Copy link
Contributor Author

Carreau commented Jul 20, 2020

Yeah, I went with not validating, and just re-emitting the docstring from the Numpydoc _parsed_data data structure as the __repr__ does also some weird things and ended up with a docstring reformatter.

@rossbar
Copy link
Contributor

rossbar commented Jul 20, 2020

Hmm, on the one hand the validation should definitely catch indentation errors, but it would be nice if the docstring parsing were more resilient to indentation problems.

@Carreau
Copy link
Contributor Author

Carreau commented Jul 20, 2020

Yeah, I think the two logics are intertwined too much, and the "parsing"/"guessing"/"validating" should be different steps.

There are many docstrings around that have invalid section names, and right now can't be parsed as Numpydoc Errors immediately, though that might be for a longer term project to clean that up.

@rossbar
Copy link
Contributor

rossbar commented Jul 21, 2020

There are many docstrings around that have invalid section names, and right now can't be parsed as Numpydoc Errors immediately, though that might be for a longer term project to clean that up.

... including in numpy itself! (numpy/numpy#16791)

I agree it's probably a larger project, so let's not worry about it in this PR. I think the changes here look good. The only additions I'd make is to apply a similar pytest.mark.parametrize to the test_returns and test_other_parameters tests, since they also are affected by the changes to _parse_param_list. I also slightly prefer the parametrization I used in #287 as it's a little shorter, but YMMV.

@jnothman
Copy link
Member

many project have paremeters description list with : afer the name, even if no type is present. If there is no space after the : the parameter name includes the : which is most likely wrong.

The problem being that this matched the relatively obscure syntax for ReST definition lists...

@Carreau
Copy link
Contributor Author

Carreau commented Jul 24, 2020

ok, I made doc a parameterized fixture so every test using it are now parametrized on '' (flush) vs '\n ' (newline indented)

@Carreau
Copy link
Contributor Author

Carreau commented Aug 7, 2020

Anything I can do to push this forward ?

Copy link
Collaborator

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than two nitpicks, LGTM

numpydoc/tests/test_docscrape.py Show resolved Hide resolved
numpydoc/tests/test_docscrape.py Outdated Show resolved Hide resolved
@Carreau
Copy link
Contributor Author

Carreau commented Aug 10, 2020

Thanks !

@larsoner larsoner merged commit 676a8d4 into numpy:master Aug 10, 2020
@larsoner
Copy link
Collaborator

Thanks @Carreau !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong number of Parameter for numpy array.
6 participants