Automatically dedent docstring constants by default #81283
I'm spawning this issue off as a separate feature from https://bugs.python.org/issue36906 (adding a string dedent method and an optimization to do it at compile time on constants). It'd be great if docstrings were given a similar treatment. Right now we carry the whitespace burden within the str constants for __doc__ stored in all code objects in the process. This adds up. This is not _quite_ the same as calling textwrap.dedent() or the upcoming str.dedent method on them at compile time. We need to special-case the first line of docstrings. inspect.getdoc() is our library function that does this for us today. Chance of breaking things with this change? Not impossible, but extremely minor. Something using docstrings as data _and_ depending on leading indentation whitespace preservation would not like this. I am not aware of anything that would ever do that. |
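To illustrate why the first line needs special handling, here is a minimal sketch comparing textwrap.dedent() with inspect.cleandoc() (the function inspect.getdoc() relies on). The docstring text in `raw` is a made-up example, not taken from the issue.

```python
import inspect
import textwrap

# Hypothetical docstring as it sits in source: the summary follows the
# opening quotes directly, later lines carry the function body's indent.
raw = """Summary line.

    Details line one.
    Details line two.
    """

# textwrap.dedent() finds no common margin, because the first line has none.
dedented = textwrap.dedent(raw)

# inspect.cleandoc() ignores the first line when computing the margin.
cleaned = inspect.cleandoc(raw)

print(repr(dedented))
print(repr(cleaned))
```

With textwrap.dedent() the detail lines keep their four leading spaces, while inspect.cleandoc() removes them.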
Hi, I'm working on a PR. It should be ready in a couple of days. It's more involved than I thought: to avoid importing inspect during compilation, I will probably need to port cleandoc() to C. |
How about do
In such a docstring, dedent cannot strip the indent well. There is an existing attempt (in Japanese): |
This is the function I inlined and, as far as I can tell, my approach has been similar to the one you linked. I still need to fix some issues, though, as doctest was expecting to find the string before dedenting. |
cleandoc is not idempotent. If we cleandoc at compile time, pydoc and inspect.getdoc() shouldn't cleandoc(doc) again. And if a user writes to
import inspect
s = """
first line
second line
third line
"""
while True:
    print('---')
    print(s)
    t = inspect.cleandoc(s)
    if t == s:
        break
    s = t
print('---')
output:
|
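A minimal sketch of the non-idempotence described above. The staircase indentation in `s` is an assumption chosen for illustration, and `passes` is a hypothetical counter; the loop repeats inspect.cleandoc() until the text stops changing.

```python
import inspect

# Assumed example: each line is indented two spaces more than the previous.
s = "\n  first line\n    second line\n      third line\n"

once = inspect.cleandoc(s)
twice = inspect.cleandoc(once)

# Count how many passes actually change the text.
passes = 0
current = s
while True:
    nxt = inspect.cleandoc(current)
    if nxt == current:
        break
    current = nxt
    passes += 1

print(repr(once))
print(repr(twice))
print(passes)
```

The second pass changes the text again because the first line is excluded from the margin calculation: once the leading blank line is gone, a different line plays the role of "first line".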
In the PR, compile-time cleandoc doesn't remove leading and trailing newlines. code:
import inspect
def foo():
    """
    first line
    second
    third
    """
print('---')
print(foo.__doc__)
print('---')
print(inspect.getdoc(foo))
print('---')
current behavior:
c-cleandoc (#106066)
This way, I think the PR minimizes incompatibility. |
Co-authored-by: Éric <[email protected]>
Since Python 3.13.0a1, docstrings are automatically dedented. See python/cpython#81283 and https://docs.python.org/3.13/whatsnew/3.13.html#other-language-changes As a result, using a docstring with leading space as a test case breaks the test assumption. The initial commit which introduced this test a decade ago (6c0c791) does not specify why testing the spaces is important.
Tests often do that. So far I saw:
(toolz) And:
(oauthlib) |
(zope-interface) |
Even scipy:
|
In Python 3.13, compiler strips indents from docstrings. See python/cpython#81283 Fixes: zopefoundation#279
In Python 3.13, compiler strips indents from docstrings. See python/cpython#81283 Fixes: #279
Yep, this breaks IPython as well, as we have tests checking that docstrings actually have leading spaces. Edit: I'll see if we can work around this, as I understand why one might want to do that. |
Do you think we should revert this change in 3.13? |
If there is a plan to include it in 3.14 anyway, then probably not. This is a breaking change but I have no idea how to properly have a deprecation period. |
I think this is still early enough, but the more time goes on, the more the Python compiler and ast lose information that is useful for tooling. It would often be nice to have a two-pass tokenizer/compiler, maybe with an option to not run the cleanup and normalisation passes? |
It is ideal if the impacted projects can improve their tests to not rely on leading spaces in the middle of docstrings. IMNSHO, it is still too early in 3.13 to decide if we should roll this back. So far I haven't seen any compelling examples that are not just overly specific tests asserting things to be "as previously implemented" rather than code relying specifically on code indentation style/level based leading spaces being present in docstrings. The win for the world by reducing space consumed by docstrings by default feels rather large. So let's try to see if we can keep this. Regarding future options for the tokenizer & compiler to do less or more: other things have a desire for some of those as well, but so far it has been hard to get people interested in providing that kind of thing within CPython and the standard library, as CPython won't use such things and offering such options is by its nature both slower and painful to maintain. It feels like third-party Python code analysis tooling may be a better way to get that in a less disruptive manner. |
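One way a test could avoid depending on raw leading spaces is to normalise the docstring before comparing. This is only a sketch: the function `documented` and the expected text are hypothetical, and the check goes through inspect.cleandoc() so that it gives the same result whether or not the compiler has already dedented `__doc__`.

```python
import inspect


def documented():
    """
    first line
        indented detail
    """


# Compare the cleaned docstring instead of the raw __doc__ attribute.
expected = "first line\n    indented detail"
normalised = inspect.cleandoc(documented.__doc__)

print(repr(normalised))
```

Relative indentation (the four spaces before "indented detail") is still checked, so the test keeps verifying the structure of the docstring rather than the indentation level of the surrounding code.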
This is also a compiler change which means that the tokenize and AST modules won't be affected by the optimisation. Tools can still analyse source code normally before this optimisation is applied. |
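As a sketch of what source-level analysis can still see, the ast module exposes the docstring as written when `clean=False` is passed to ast.get_docstring(). The `source` string below is a made-up example.

```python
import ast

# Hypothetical module source containing one function with an indented docstring.
source = 'def f():\n    """\n    hello\n    world\n    """\n'

tree = ast.parse(source)
func = tree.body[0]

# clean=False returns the string constant exactly as it appears in the AST.
raw = ast.get_docstring(func, clean=False)

# The default (clean=True) applies inspect.cleandoc() to the result.
cleaned = ast.get_docstring(func)

print(repr(raw))
print(repr(cleaned))
```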
Agreed. It'd be hard to do that in a meaningful manner. There are plausible designs that retain strict indentation-inclusive compatibility while reducing memory consumption & pyc size. They'd add implementation complexity and could cost CPU usage as a tradeoff. Example: Storing docstrings compressed and transparently decompressing them upon |
Does this affect Sphinx's API generation from docstrings? Has anyone checked? It appears to me that it could break syntaxes like doctests or example code blocks that should be indented in RST which is what often ends up in docstrings. |
This PR doesn't remove all indentation. It removes only the common indent. |
Ah, great! Thanks for clarifying. I wasn't sure by looking at that PR. |
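A small sketch of what "only the common indent" means for RST-style docstrings. It uses inspect.cleandoc() as a stand-in for the compile-time behaviour (an assumption; the two differ at least in how leading and trailing newlines are handled, as noted earlier in the thread), and the docstring text is made up.

```python
import inspect

# Hypothetical docstring with an RST literal block that must stay indented.
doc = """
    Add two numbers.

    Example::

        >>> add(1, 2)
        3
    """

cleaned = inspect.cleandoc(doc)
print(cleaned)
```

The four spaces shared by every line are removed, while the extra four spaces in front of the example remain, so the literal block is still indented relative to the surrounding text.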
Was it intended that with this change tabs are now automatically converted into 8 spaces in
For example:
def f():
    """
    hello
    world
    """
print(repr(f.__doc__))
Python 3.12: Python 3.13: Previously I could call |
Yes it is intended. I intended almost ident to |
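For reference, inspect.cleandoc() expands tabs (via str.expandtabs(), tab size 8) before it computes the common margin. A minimal sketch with a made-up string that uses explicit `\t` escapes so the tabs are visible:

```python
import inspect

# Hypothetical docstring text: one line indented with one tab, one with two.
doc = "Title\n\tindented with a tab\n\t\tindented with two tabs"

cleaned = inspect.cleandoc(doc)
print(repr(cleaned))
```

After cleaning, no tab characters remain: the single tab becomes the common margin and is removed, and the extra tab on the last line survives as eight spaces.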
- Dedent docstrings in Python 3.13+ - Fix nipy#1311 - Ref: python/cpython#81283
Python compiler newly removes indent from docstrings python/cpython#81283
I think the test failure may be caused by recent Python dedenting docstrings (see python/cpython#81283).