Support fixme's in docstrings #9744

Merged
merged 15 commits into from
Sep 30, 2024

Conversation

badsketch
Contributor

Type of Changes

Type
🐛 Bug fix
✓ ✨ New feature
🔨 Refactoring
📜 Docs

Description

Closes #9255

Previous PR discussion here: #9281

  • now an enhancement of existing fixme rather than a new message
  • check-fixme-in-docstring is the setting that enables it and defaults to false
  • also suggestions to improve the existing description: issue9255 - Detect FIXME words in docstring #9281 (comment). I kind of like how the message is the TODO itself. As for the "Used when a warning note..." is that for a tooltip? I believe almost all checker messages have something like this, right?

Appreciate any feedback!

elif self.linter.config.check_fixme_in_docstring and self._is_docstring_comment(token_info):
    docstring_lines = token_info.string.split("\n")
    for line_no, line in enumerate(docstring_lines):
        comment_text = line.removeprefix('"""').lstrip().removesuffix('"""')  # trim '"""' and whitespace
Contributor Author

removeprefix() and removesuffix() are new in Python 3.9. Dumb question, but how do I tell what version this project supports?

Contributor Author

After seeing 3.8 test suites fail, I'm guessing I need to make this compatible 😅
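For reference, a minimal sketch of the same trimming without `str.removeprefix()` / `str.removesuffix()`, so it also runs on Python 3.8. The helper name `strip_triple_quotes` is hypothetical and not part of the PR; the slicing mirrors what a later revision of the diff does.

```python
def strip_triple_quotes(line: str) -> str:
    """Drop a leading and/or trailing triple quote using only 3.8-safe calls."""
    quotes = ('"""', "'''")
    if line.startswith(quotes):
        line = line[3:]
    line = line.lstrip()
    if line.endswith(quotes):
        line = line[:-3]
    return line


print(strip_triple_quotes('"""TODO msg"""'))
print(strip_triple_quotes("'''  FIXME later"))
```

`str.startswith()` and `str.endswith()` accept a tuple of candidates, which also covers the `'''` case raised later in the review.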

Collaborator

Do we really need this? Can't we just put the full docstring in the message for now?

Contributor Author

For a case like

"""
TODO msg1
TODO msg2
"""

this PR will create two TODO lint messages. If we put the full docstring for both, it might be confusing or overly wordy, no?
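To make the two-message case concrete, here is a rough sketch of scanning a docstring line by line. The pattern and the helper `docstring_fixmes` are illustrative assumptions, not the code in this PR; only the default note tags come from `pylint -h`.

```python
import re

NOTES = ("FIXME", "XXX", "TODO")  # default value of --notes
# Hypothetical pattern: a note tag at the start of a line, after optional whitespace.
NOTE_AT_LINE_START = re.compile(
    rf"\s*(({'|'.join(NOTES)})(?=(:|\s|\Z)).*)", re.I
)


def docstring_fixmes(docstring: str):
    """Return (line_offset, text) for every docstring line that starts with a note tag."""
    found = []
    for offset, line in enumerate(docstring.split("\n")):
        match = NOTE_AT_LINE_START.match(line)
        if match:
            found.append((offset, match.group(1)))
    return found


print(docstring_fixmes('"""\nTODO msg1\nTODO msg2\n"""'))
```

Each hit carries only its own line, which is the argument for not putting the full docstring into every message.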

@badsketch
Contributor Author

When running pylint -h, I get

Miscellaneous:
  BaseChecker for encoding issues.

  --notes <comma separated values>
                        List of note tags to take in consideration, separated by a comma. (default:
                        ('FIXME', 'XXX', 'TODO'))
  --notes-rgx <regexp>  Regular expression of note tags to take in consideration. (default: )
  --check-fixme-in-docstring <y or n>
                        Whether or not to search for fixme's in docstrings. (default: False)

Thoughts on updating the docstring to be "Checker for encoding issues and fixme notes"? instead of "BaseChecker for encoding issues."

@DanielNoord
Collaborator

When running pylint -h, I get

Miscellaneous:
  BaseChecker for encoding issues.

  --notes <comma separated values>
                        List of note tags to take in consideration, separated by a comma. (default:
                        ('FIXME', 'XXX', 'TODO'))
  --notes-rgx <regexp>  Regular expression of note tags to take in consideration. (default: )
  --check-fixme-in-docstring <y or n>
                        Whether or not to search for fixme's in docstrings. (default: False)

Thoughts on updating the docstring to be "Checker for encoding issues and fixme notes"? instead of "BaseChecker for encoding issues."

Fine with me!

def _is_docstring_comment(self, token_info: tokenize.TokenInfo) -> bool:
    return (
        token_info.type == tokenize.STRING
        and token_info.line.lstrip().startswith('"""')
Collaborator

Note that a docstring can also start with '''. I'm wondering if this should live in this tokeniser checker as I think it is actually quite hard to recognise docstrings on tokens alone.

Have you considered doing it as a checker for nodes.Module, nodes.ClassDef, etc? Then you can just check if the regex is in the .doc attribute.

Contributor Author

Note that a docstring can also start with '''

totally forgot about this, thanks!

Have you considered doing it as a checker for nodes.Module, nodes.ClassDef, etc? Then you can just check if the regex is in the .doc attribute.

I was considering the possibility that docstrings could appear outside of nodes.Module and nodes.ClassDef, so I figured it's best to use the existing token stream to watch for all occurrences. However, you could argue it's not good Python practice (?) in the first place to have docstrings outside modules/classes/methods. Agreed the PR doesn't use the safest heuristic to determine if it's a docstring.

So it looks like we could do

  1. update _is_docstring_comment() to also check startswith("'''")
  2. refactor to use nodes.Module, nodes.ClassDef, nodes.FunctionDef and we tighten the scope of docstring fixmes
  3. update tokenizer with a new docstring token similar to how we have a token.COMMENT type for a comment fixme. (Haven't looked too deep into this, will probably be higher effort)

Totally down to change it to 2, but would users claim false negatives when they try to create docstring fixme's outside of module/classes/function defs? Perhaps it would help if there were a lint message that recommends against docstrings outside of those places. Do we have something like that already?
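As a rough illustration of option 2, the sketch below uses the standard-library `ast` module as a stand-in for astroid's `nodes.Module`, `nodes.ClassDef` and `nodes.FunctionDef`; it is not pylint code, and the `NOTE` pattern and function name are assumptions. It also shows the false negative in question: a triple-quoted string that is not attached to a module, class, or function is never visited.

```python
import ast
import re

# Assumed pattern: a default note tag at the start of any docstring line.
NOTE = re.compile(r"^\s*(TODO|FIXME|XXX)(?=(:|\s|\Z))", re.I | re.M)

SOURCE = '''
"""Module doc, TODO not at line start."""

def foo():
    """
    TODO implement foo
    """

x = 1
"""TODO stray string, not a docstring"""
'''


def docstrings_with_notes(source: str):
    """Return the names of nodes whose docstring has a line starting with a note tag."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(
            node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
        ):
            doc = ast.get_docstring(node)
            if doc and NOTE.search(doc):
                hits.append(getattr(node, "name", "<module>"))
    return hits


print(docstrings_with_notes(SOURCE))
```

Only `foo` is reported here; the stray string after `x = 1` is skipped because no node owns it as a docstring.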

Collaborator

@Pierre-Sassoulas Opinion? I think trying option 1 for now might be fine, we can always refactor to 2 later on. I just thought I would raise the question to see if it was consciously ignored.

Member

I think the decision should be taken consciously. I would have thought that the node visitor implementation would be cleaner/terser, but maybe using the tokenizer is faster? I expected fewer changes to be needed for the docstring fixme check, as we already have something working for comments. Did we use the tokenizer for comments? I did not look very deep into this.

Contributor Author

Yeah we already use tokenizer for comments. If I understand correctly, tokenizer is for instances where there's not a defined node for the check (eg, a comment can appear "anywhere" in the code, so we check for all tokens for that appearance). I tried to follow that logic with docstrings. Hence I piggy back on the tokenizer to examine for occurrences of """ and '''. If we decide we only wish to support docstrings in classes, functions, and methods, then I could use a node visitor on those 3 node types.

It may be easiest to go with Option 1 at this point, since it'd be straightforward to make the change in the PR. And as Daniel mentioned, we could refactor to 2 in the future. Thoughts?
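For comparison, option 1 can be sketched with the standard `tokenize` module alone. The function name is hypothetical; the check itself follows the `_is_docstring_comment()` heuristic from the diff, extended to `'''`. Note that it is only a heuristic: it keys off the physical line starting with a triple quote, not off the string actually being a docstring.

```python
import io
import tokenize


def triple_quoted_string_tokens(source: str):
    """Return STRING tokens whose source line starts with a triple quote."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok
        for tok in tokens
        if tok.type == tokenize.STRING
        and tok.line.lstrip().startswith(('"""', "'''"))
    ]


SOURCE = (
    "def foo():\n"
    "    '''TODO single-quoted docstring'''\n"
    '    x = """not a docstring"""\n'
    "    # TODO a comment\n"
)

for tok in triple_quoted_string_tokens(SOURCE):
    print(tok.start, tok.string)
```

The assignment on the third line is skipped because its line starts with `x =`, and the comment is a `COMMENT` token rather than a `STRING`.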

Member

Docstrings are not supposed to be everywhere in a module (they are useless statements otherwise), so a node approach would work. But if comments require the tokenizer, for consistency of approaches let's go with 1).

codecov bot commented Jun 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.80%. Comparing base (c0ecd70) to head (51a2a14).
Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #9744   +/-   ##
=======================================
  Coverage   95.80%   95.80%           
=======================================
  Files         174      174           
  Lines       18934    18946   +12     
=======================================
+ Hits        18140    18152   +12     
  Misses        794      794           
Files with missing lines Coverage Δ
pylint/checkers/misc.py 90.41% <100.00%> (+1.88%) ⬆️

Collaborator

@DanielNoord DanielNoord left a comment

Awesome you got this to work!

Comment on lines 121 to 122
@set_config(check_fixme_in_docstring=True)
def test_docstring_with_message(self) -> None:
Collaborator

@Pierre-Sassoulas Do you want these unittests? I would be fine with having only the functional tests and remove these. They feel like duplicates

Member

I agree with you, I use functional only almost all the time (terser / clearer once you know about functional).

Contributor Author

Gotcha, that makes sense! Reverted changes to this file.

self._comment_fixme_pattern = re.compile(comment_regex, re.I)
if self.linter.config.check_fixme_in_docstring:
    docstring_regex = rf"((\"\"\")|(\'\'\'))\s*({notes})(?=(:|\s|\Z))"
    self._docstring_fixme_pattern = re.compile(docstring_regex, re.I)
Collaborator

I think we need to always set these two pattern attributes so that we don't get an AttributeError in unexpected ways. In the old version we also did this.

Contributor Author

Gotcha, I've removed the conditionals here and just let process_tokens() handle the logic of what to parse if check docstring for fixme setting is enabled.
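A small sketch of that shape, with both patterns compiled unconditionally and the option consulted only at the point of use. The class `FixmePatterns` and its attribute names are hypothetical and stand in for the checker; the docstring regex is the one quoted in the diff.

```python
import re


class FixmePatterns:
    """Hypothetical stand-in for the checker: both patterns always exist."""

    def __init__(self, notes=("FIXME", "XXX", "TODO"), check_docstrings=False):
        self.check_docstrings = check_docstrings
        alternatives = "|".join(re.escape(note) for note in notes)
        # Compiled regardless of the option, so neither attribute can be missing.
        self.comment_pattern = re.compile(
            rf"#\s*({alternatives})(?=(:|\s|\Z))", re.I
        )
        self.docstring_pattern = re.compile(
            rf"((\"\"\")|(\'\'\'))\s*({alternatives})(?=(:|\s|\Z))", re.I
        )

    def matches_docstring(self, text: str) -> bool:
        # The option gates the check here, not the existence of the pattern.
        return self.check_docstrings and bool(self.docstring_pattern.search(text))


disabled = FixmePatterns()
enabled = FixmePatterns(check_docstrings=True)
print(disabled.matches_docstring('"""TODO x"""'))
print(enabled.matches_docstring("'''  fixme: y'''"))
```

Compiling an extra pattern once at setup is cheap, and it removes the code path where an attribute lookup fails because the option was off.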

Comment on lines 171 to 178
if line.startswith(('"""', "'''")):
    line = line[3:]
line = line.lstrip()
if line.endswith(('"""', "'''")):
    line = line[:-3]
if self._docstring_fixme_pattern.search(
    '"""' + line.lower()
) or self._docstring_fixme_pattern.search("'''" + line.lower()):
Collaborator

Now that you have fixed the pattern, is this really necessary? I would prefer a pattern where we don't have to do a lot of complicated changes to the lines itself.

Contributor Author

True, it was getting pretty messy. I've decided to rework the logic to use regex capture groups for both docstrings and comments (since in that case we were also sort of changing the line then stitching it together to examine with regex).
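One way the capture-group idea could look for a single-line docstring is sketched below. The pattern, the group name `msg`, and `extract_message` are assumptions for illustration and may differ from what the PR ended up with; the point is that the message comes straight out of the match instead of being sliced out of the line and stitched back together.

```python
import re

NOTES = "FIXME|XXX|TODO"  # default note tags
# Opening quotes, optional spaces, then the message (tag plus rest of line),
# with optional closing quotes left outside the captured group.
DOCSTRING_FIXME = re.compile(
    rf"(\"\"\"|''')\s*(?P<msg>({NOTES})(?=(:|\s|\Z)).*?)\s*(\"\"\"|''')?\s*$",
    re.I,
)


def extract_message(line: str):
    """Return the fixme text of a single-line docstring, or None."""
    match = DOCSTRING_FIXME.match(line.lstrip())
    return match.group("msg") if match else None


print(extract_message('"""TODO msg"""'))
print(extract_message("''' fixme: later"))
print(extract_message('"""not a TODO"""'))
```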

@badsketch
Contributor Author

badsketch commented Jul 8, 2024

Apologies for the glacial pace of this PR 😬, really appreciate all the feedback! Some updates:

  • previously we were taking the token line and using string manipulation to get the fixme message, then rebuilding it so the regex could find it. I've since changed the logic to use just regex with capture groups instead. I've made this change for both docstrings and comments
  • To do this, I also had to split the docstring regex pattern between single-line and multi-line. I tried it with just a single regex, but it got really ugly, and I think the logic is much cleaner and more intuitive with two. There are three regex patterns in total now.
  • correct me if I'm wrong, but we were previously using re.search(). It seems if we're only searching the beginning of a string, then we can use re.match() which has better performance.

Let me know your thoughts!
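On the `re.match()` versus `re.search()` point, the behavioural difference is easy to show; the pattern below is an assumed comment pattern in the style of the diff, not the final one. `match()` only attempts the pattern at position 0, whereas `search()` retries at every offset, so `match()` both narrows what counts as a fixme and avoids scanning lines that do not start with one. Whether the speed difference is measurable for pylint is not shown here.

```python
import re

pattern = re.compile(r"#\s*(TODO|FIXME|XXX)(?=(:|\s|\Z))", re.I)

line_start = "# TODO: anchored at the start"
mid_line = "# something # TODO: further along"

print(bool(pattern.match(line_start)))
print(bool(pattern.search(mid_line)))
print(bool(pattern.match(mid_line)))
```

The switch is therefore not purely an optimisation: a tag that appears later in the comment is found by `search()` but not by `match()`.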

@badsketch
Contributor Author

Those primer tests are awesome! They thankfully caught something I missed when refactoring the comment regex pattern. I was no longer supporting cases like:

#     # TODO: something something

which is kind of curious as I wonder if that should've been supported in the first place? Regardless, I've updated it so it's supported and added that as an additional functional test.

@badsketch
Contributor Author

In my last few messages, I mentioned updating how the message was extracted in comment-based fixme's so we wouldn't have to use string manipulation (198fd93, 1ce441c). I've decided to revert that change because it was causing a lot of primer test failures. I think this is because comment-based fixme's are a little inconsistent in their current state:

#   TODO: msg1                                        
#   # TODO: msg2
# something # TODO: msg3
# something TODO: msg4

results in:

test.py:1:1: W0511: TODO: msg1 (fixme)
test.py:2:1: W0511: # TODO: msg2 (fixme)
test.py:3:1: W0511: something # TODO: msg3 (fixme)

I wasn't able to replicate what we're doing today using regex capture groups so in favor of not causing any disruptions, I decided to go back to the current method.
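The three results above are consistent with a "trim, re-prefix, then search" approach. The sketch below is a reconstruction under that assumption (the function name and the exact regex are not taken from pylint's source); it reproduces the same three messages and the same miss for `msg4`.

```python
import re

# Assumed comment pattern, searched anywhere in the re-prefixed comment text.
OLD_STYLE = re.compile(r"#\s*(FIXME|XXX|TODO)(?=(:|\s|\Z))", re.I)


def old_style_fixme(comment_token: str):
    """Return the reported message for a comment token, or None."""
    comment_text = comment_token[1:].lstrip()  # trim '#' and whitespace
    if OLD_STYLE.search("#" + comment_text.lower()):
        return comment_text
    return None


for comment in (
    "#   TODO: msg1",
    "#   # TODO: msg2",
    "# something # TODO: msg3",
    "# something TODO: msg4",
):
    print(repr(old_style_fixme(comment)))
```

Because the search is unanchored, any later `#` followed by a tag matches, yet the whole trimmed comment is reported as the message. That combination is hard to express with a single anchored capture group.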

If we're open to standardizing some of the decided behavior maybe in the future and allowing changes to the primer tests, I'm also down to discuss!

@DanielNoord
Collaborator

@badsketch With the danger of introducing more back and forth: would you be willing to create a proposal for this standardization and apply it to this PR? We can then see the test output and determine whether we are okay with that. That makes it a lot easier to discuss such standardization (with having some examples).

@badsketch
Contributor Author

@DanielNoord Sure thing, is there a formal way of making these proposals for Pylint features and do I submit it somewhere? Or do you mean try to consider all the usage out there and come back with a more detailed comment?

@DanielNoord
Collaborator

No, I meant: just write the code as you would want and make the tests pass. If the test output is acceptable I would be okay with accepting your proposed code.

I don't think we need a full proposal, just the code as you would propose to write it and then being able to see what changes that would give to the test output.

@badsketch
Contributor Author

@DanielNoord
Gotcha, I've updated the functional tests for both comment and docstring fixme's to serve as my final proposal. The latest primer test results also align with what I've proposed. They boil down to:

  • comment fixme's must start with a #, then any number of spaces (and only spaces), then the fixme keyword, followed by a message
        # TODO valid
        #              TODO valid
        # invalid TODO msg
  • single-line docstring fixme's must start with three single or double quotes, then any number of spaces (and only spaces), then the fixme keyword, followed by a message
        '''TODO valid'''
        """ TODO valid """
        ''' invalid TODO msg """
  • multi-line docstring fixme's sit inside a docstring block but must be at the beginning of a line. Any amount of indentation before them is okay.
        '''
        TODO valid
        invalid TODO msg
        '''
        def foo():
              """
              TODO valid
              """

The majority of changed primer tests are for comment fixme's that no longer emit. They appear to be fixme's that are part of a larger section of code that was commented out, suggesting the fixme is no longer valid. I think it makes sense for those to no longer be emitted.
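The three rules can be approximated with three anchored patterns, as in the sketch below. The pattern names and the helper `is_fixme` are hypothetical, and the regexes are a reading of the rules as stated rather than a copy of the merged code.

```python
import re

NOTES = "FIXME|XXX|TODO"  # default note tags
TAIL = r"(?=(:|\s|\Z))"   # the tag must be followed by ':', whitespace, or end

COMMENT = re.compile(rf"#\s*(?P<msg>({NOTES}){TAIL}.*)", re.I)
SINGLE_LINE_DOCSTRING = re.compile(
    rf"(\"\"\"|''')\s*(?P<msg>({NOTES}){TAIL}.*)", re.I
)
MULTILINE_DOCSTRING_LINE = re.compile(rf"\s*(?P<msg>({NOTES}){TAIL}.*)", re.I)


def is_fixme(pattern, text: str) -> bool:
    """Anchor at the start of the text: only leading whitespace may precede the tag."""
    return pattern.match(text) is not None


print(is_fixme(COMMENT, "# TODO valid"))
print(is_fixme(COMMENT, "# invalid TODO msg"))
print(is_fixme(SINGLE_LINE_DOCSTRING, '""" TODO valid """'))
print(is_fixme(MULTILINE_DOCSTRING_LINE, "      TODO valid"))
print(is_fixme(MULTILINE_DOCSTRING_LINE, "invalid TODO msg"))
```

Under these patterns a tag that trails another comment, for example `# type: ignore[assignment] # XXX: ...`, no longer matches, which is in line with the primer changes described above.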

DanielNoord
DanielNoord previously approved these changes Jul 15, 2024
Collaborator

@DanielNoord DanielNoord left a comment

LGTM!

@Pierre-Sassoulas Do you want to give this a review as well?

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Thank you for working on this @badsketch !

I'm wondering why the regex contains rf"((\"\"\")|(\'\'\')), but we also do (token_info.line.lstrip().startswith(('"""', "'''"))) somewhere else ? My intuition would be that we could just search for the user defined "TODO" regex in comments and docstrings ? Is this because we want the option to not raise for docstring (for semver/compat with the old behavior) so we need to distinguish the two internally ? I didn't make any benchmarks for this but I think this kind of regex executed on each token can have huge performance impact.

@jacobtylerwalls
Member

Could be false negatives, will look into this.

The primer tests still suffer from a fluctuating baseline, so you'll see unrelated messages moving in and out from time to time. We haven't yet identified the source of the indeterminacy. Surely it's a lack of a fixed ordering somewhere. Don't worry about it.

@jacobtylerwalls jacobtylerwalls added this to the 3.3.0 milestone Jul 28, 2024
@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 3.3.0, 3.4.0 Sep 20, 2024
@jacobtylerwalls jacobtylerwalls modified the milestones: 3.4.0, 4.0.0 Sep 25, 2024
@jacobtylerwalls
Member

FYI @badsketch, thanks to some ace detective work by @akamat10, the primer is more stable now, so if you merge main you should see a more informative primer diff.

Contributor

🤖 Effect of this PR on checked open source code: 🤖

Effect on astroid:
The following messages are no longer emitted:

  1. fixme:
    # TODO: This should return an Uninferable as this would raise
    https://github.com/pylint-dev/astroid/blob/62c5badc838419090ee319acfcef3b651ffb1e94/astroid/brain/brain_dataclasses.py#L186

Effect on black:
The following messages are no longer emitted:

  1. fixme:
    #assert ilabel not in first # XXX failed on <> ... !=
    https://github.com/psf/black/blob/f1a2f92bba7f1b8e4407e89d71a18fd1d6c61a91/src/blib2to3/pgen2/pgen.py#L69

Effect on music21:
The following messages are no longer emitted:

  1. fixme:
    # TODO: attach \noBeam to note if it is the last note
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/lily/translate.py#L1174
  2. fixme:
    # TODO: this file does not import correctly due to first/second
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/test/test_repeat.py#L475
  3. fixme:
    # TODO: Turn back on when a smaller work is found...
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/musedata/translate.py#L584
  4. fixme:
    # TODO: column 17 self.src[16] defines the graphic note type
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/musedata/__init__.py#L339
  5. fixme:
    # TODO: Something with 4.2 Repetitions; not in hum2xml
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/humdrum/spineParser.py#L2602
  6. fixme:
    # TODO: Find out what timeBase means; not in hum2xml
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/humdrum/spineParser.py#L2606
  7. fixme:
    # TODO: make staff numbers relevant; not in hum2xml
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/humdrum/spineParser.py#L2610
  8. fixme:
    # TODO: write
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/stream/base.py#L13037
  9. fixme:
    # TODO: write
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/stream/base.py#L13041
  10. fixme:
    # TODO: use the linter, reference DOESN'T have to be passed in
    https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/alpha/analysis/hasher.py#L485

Effect on pytest:
The following messages are no longer emitted:

  1. fixme:
    path.strpath # XXX svn?
    https://github.com/pytest-dev/pytest/blob/6486c3f3a858a0c8043f5c3f7c24297b82a0abe4/src/_pytest/_py/path.py#L193
  2. fixme:
    # XXX
    https://github.com/pytest-dev/pytest/blob/6486c3f3a858a0c8043f5c3f7c24297b82a0abe4/src/_pytest/_code/code.py#L913

Effect on pandas:
The following messages are no longer emitted:

  1. fixme:
    e.g. Sparse[bool, False] # TODO: no test cases get here
    https://github.com/pandas-dev/pandas/blob/d538a1cd1ad5d1e506c2dc36144e4cac5534858a/pandas/core/algorithms.py#L158

Effect on sentry:
The following messages are no longer emitted:

  1. fixme:
    type: ignore[assignment] # XXX: intentional resetting pk
    https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/reprocessing2.py#L550
  2. fixme:
    type: ignore[assignment] # XXX: clobbers Serializer.fields
    https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/discover/endpoints/serializers.py#L34
  3. fixme:
    type: ignore[assignment] # XXX: clobbers Serializer.fields
    https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/discover/endpoints/serializers.py#L191
  4. fixme:
    type: ignore[assignment] # TODO: make BitField a mypy plugin
    https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/organizations/services/organization/impl.py#L502
  5. fixme:
    type: ignore[misc] # TODO: make BitField a mypy plugin
    https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/organizations/services/organization/impl.py#L545
  6. fixme:
    type: ignore[assignment] # XXX: clobbering Serializer.fields
    https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/api/serializers/rest_framework/dashboard.py#L146

This comment was generated for commit 51a2a14

@badsketch
Contributor Author

@jacobtylerwalls That's fantastic, appreciate the update! I merged from main and pushed just now.

As a refresher, since it's been a while, the performance profiling results are in this comment: #9744 (comment)

Also, the latest primer diff above makes sense to me. Most of the instances are where a TODO is part of a section that has been commented out. It would make sense for those to no longer alert as a fixme. The other instances are where a TODO is behind another comment like "# some note # TODO: xyz", which I would also like to propose as an acceptable primer test output change for the sake of consistency. Let me know if there are any issues 👍

@DanielNoord DanielNoord merged commit 1a96a5d into pylint-dev:main Sep 30, 2024
44 checks passed
@jacobtylerwalls
Member

@badsketch Sorry I didn't catch this before merge, but would you add a short breaking changes news fragment to indicate that we no longer emit fixme when contained in a commented out block? Thanks!

@badsketch
Contributor Author

@jacobtylerwalls yeah for sure. Since there's already a https://github.com/pylint-dev/pylint/blob/main/doc/whatsnew/fragments/9255.feature, do I rename that to 9255.breaking and modify it, or do I create a new fragment?

@jacobtylerwalls
Member

I think I like having two separate fragments for this.

Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

W0511: Doesn't detect TODO in docstrings
4 participants