Support fixme's in docstrings #9744

badsketch · 2024-06-22T07:49:54Z

Type of Changes

	Type
	🐛 Bug fix
✓	✨ New feature
	🔨 Refactoring
	📜 Docs

Description

Closes #9255

Previous PR discussion here: #9281

Now an enhancement of the existing fixme message rather than a new message.
check-fixme-in-docstring is the setting that enables it; it defaults to false.
There were also suggestions to improve the existing description: issue9255 - Detect FIXME words in docstring #9281 (comment). I kind of like how the message is the TODO itself. As for the "Used when a warning note..." text, is that for a tooltip? I believe almost all checker messages have something like this, right?

Appreciate any feedback!

badsketch · 2024-06-22T07:51:57Z

pylint/checkers/misc.py

+            elif self.linter.config.check_fixme_in_docstring and self._is_docstring_comment(token_info):
+                docstring_lines = token_info.string.split("\n")
+                for line_no, line in enumerate(docstring_lines):
+                    comment_text = line.removeprefix('"""').lstrip().removesuffix('"""')  # trim '""""' and whitespace


removeprefix() and removesuffix() are new in Python 3.9. Dumb question, but how do I tell which Python versions this project supports?

After seeing 3.8 test suites fail, I'm guessing I need to make this compatible 😅

Do we really need this? Can't we just put the full docstring in the message for now?

For a case like

""" TODO msg1 TODO msg2 """

this PR will create two TODO lint messages. If we put the full docstring for both, it might be confusing or overly wordy, no?
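For reference, a minimal sketch of a Python 3.8-compatible replacement for the removeprefix()/removesuffix() calls discussed above, using only startswith/endswith and slicing. The helper name strip_triple_quotes is hypothetical and is not the code that ended up in the PR.

```python
def strip_triple_quotes(line: str) -> str:
    """Trim a leading/trailing triple quote and leading whitespace.

    Hypothetical helper: works on Python 3.8, where str.removeprefix()
    and str.removesuffix() are not available.
    """
    quotes = ('"""', "'''")
    for quote in quotes:
        if line.startswith(quote):
            line = line[len(quote):]
            break
    line = line.lstrip()
    for quote in quotes:
        if line.endswith(quote):
            line = line[: -len(quote)]
            break
    return line


print(strip_triple_quotes('"""TODO fix this"""'))
print(strip_triple_quotes("'''  TODO other"))
```

Slicing by len(quote) keeps the behaviour identical for both quote styles without needing any 3.9-only string methods.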

badsketch · 2024-06-22T07:55:30Z

When running pylint -h, I get

Miscellaneous:
  BaseChecker for encoding issues.

  --notes <comma separated values>
                        List of note tags to take in consideration, separated by a comma. (default:
                        ('FIXME', 'XXX', 'TODO'))
  --notes-rgx <regexp>  Regular expression of note tags to take in consideration. (default: )
  --check-fixme-in-docstring <y or n>
                        Whether or not to search for fixme's in docstrings. (default: False)

Thoughts on updating the docstring to "Checker for encoding issues and fixme notes" instead of "BaseChecker for encoding issues."?
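As a rough illustration, the help output above corresponds to an options declaration along these lines. This is a sketch reconstructed from the help text only: the exact keys and type strings used in pylint/checkers/misc.py are an assumption here, and OPTIONS is a plain tuple rather than a real checker attribute.

```python
# Hypothetical reconstruction of the option declarations, based only on
# the `pylint -h` output quoted above. Plain data, no pylint import.
OPTIONS = (
    (
        "notes",
        {
            "type": "csv",  # assumed type name
            "metavar": "<comma separated values>",
            "default": ("FIXME", "XXX", "TODO"),
            "help": "List of note tags to take in consideration, "
            "separated by a comma.",
        },
    ),
    (
        "notes-rgx",
        {
            "type": "string",  # assumed type name
            "metavar": "<regexp>",
            "default": "",
            "help": "Regular expression of note tags to take in consideration.",
        },
    ),
    (
        "check-fixme-in-docstring",
        {
            "type": "yn",  # assumed type name
            "metavar": "<y or n>",
            "default": False,
            "help": "Whether or not to search for fixme's in docstrings.",
        },
    ),
)

for name, spec in OPTIONS:
    print(f"--{name} {spec['metavar']} (default: {spec['default']})")
```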

DanielNoord · 2024-06-23T10:43:38Z

Fine with me!

DanielNoord · 2024-06-23T10:45:09Z

pylint/checkers/misc.py

+    def _is_docstring_comment(self, token_info: tokenize.TokenInfo) -> bool:
+        return (
+            token_info.type == tokenize.STRING
+            and token_info.line.lstrip().startswith('"""')


Note that a docstring can also start with '''. I'm wondering if this should live in this tokeniser checker as I think it is actually quite hard to recognise docstrings on tokens alone.

Have you considered doing it as a checker for nodes.Module, nodes.ClassDef, etc? Then you can just check if the regex is in the .doc attribute.

Note that a docstring can also start with '''

totally forgot about this, thanks!

Have you considered doing it as a checker for nodes.Module, nodes.ClassDef, etc? Then you can just check if the regex is in the .doc attribute.

I was considering the possibility that docstrings could appear outside of nodes.Module and nodes.ClassDef, so I figured it's best to use the existing token stream to watch for all occurrences. However, you could argue it's not good Python practice (?) in the first place to have docstrings outside modules/classes/methods. Agreed that the PR doesn't use the safest heuristic to determine if it's a docstring.

So it looks like we could do:

1. update _is_docstring_comment() to also check startswith("'''")

2. refactor to use nodes.Module, nodes.ClassDef, nodes.FunctionDef and we tighten the scope of docstring fixmes

3. update tokenizer with a new docstring token similar to how we have a token.COMMENT type for a comment fixme. (Haven't looked too deep into this, will probably be higher effort)

Totally down to change it to 2, but would users claim false negatives when they try to create docstring fixme's outside of module/classes/function defs? Perhaps it would help if there were a lint message that recommends against docstrings outside of those places. Do we have something like that already?

@Pierre-Sassoulas Opinion? I think trying option 1 for now might be fine, we can always refactor to 2 later on. I just thought I would raise the question to see if it was consciously ignored.

I think the decision should be taken consciously. I would have thought that the node visitor implementation would be cleaner/terser, but maybe using the tokenizer is faster? I expected fewer changes to be needed for the docstring fixme check, as we already have something working for comments. Did we use the tokenizer for comments? I did not look very deep into this.

Yeah we already use tokenizer for comments. If I understand correctly, tokenizer is for instances where there's not a defined node for the check (eg, a comment can appear "anywhere" in the code, so we check for all tokens for that appearance). I tried to follow that logic with docstrings. Hence I piggy back on the tokenizer to examine for occurrences of """ and '''. If we decide we only wish to support docstrings in classes, functions, and methods, then I could use a node visitor on those 3 node types.

It may be easiest at this point to go with Option 1, since it'd be straightforward to make the change in the PR. And as Daniel mentioned, we could refactor to 2 in the future. Thoughts?

Docstrings are not supposed to be everywhere in a module (they are useless statements otherwise), so a node approach would work. But if comments require the tokenizer, then for consistency of approach let's go with 1).
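To make the difference between options 1 and 2 concrete, here is a small sketch comparing the two. It uses the standard library ast module as a stand-in for astroid's nodes.Module/ClassDef/FunctionDef, so it is an illustration of the trade-off and not pylint code; the function names are hypothetical.

```python
import ast
import io
import tokenize

SOURCE = "\n".join(
    [
        '"""TODO module docstring"""',
        "",
        "def foo():",
        "    '''FIXME function docstring'''",
        "    x = 1",
        '    """TODO stray string, not a docstring"""',
        "    return x",
        "",
    ]
)


def docstring_like_tokens(source: str) -> list:
    """Option 1: STRING tokens whose line starts with a triple quote."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string
        for tok in tokens
        if tok.type == tokenize.STRING
        and tok.line.lstrip().startswith(('"""', "'''"))
    ]


def node_docstrings(source: str) -> list:
    """Option 2: real docstrings of modules, classes and functions."""
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    docs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, scopes):
            doc = ast.get_docstring(node)
            if doc is not None:
                docs.append(doc)
    return docs


print(docstring_like_tokens(SOURCE))
print(node_docstrings(SOURCE))
```

The token heuristic also picks up the stray triple-quoted string inside foo(), while the node-based version only sees the two real docstrings. That is the scope difference being weighed in this thread.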

codecov · 2024-06-30T20:13:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.80%. Comparing base (c0ecd70) to head (51a2a14).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #9744   +/-   ##
=======================================
  Coverage   95.80%   95.80%           
=======================================
  Files         174      174           
  Lines       18934    18946   +12     
=======================================
+ Hits        18140    18152   +12     
  Misses        794      794

Files with missing lines	Coverage Δ
pylint/checkers/misc.py	`90.41% <100.00%> (+1.88%)`	⬆️

DanielNoord

Awesome you got this to work!

DanielNoord · 2024-07-01T06:55:19Z

tests/checkers/unittest_misc.py

+    @set_config(check_fixme_in_docstring=True)
+    def test_docstring_with_message(self) -> None:


@Pierre-Sassoulas Do you want these unittests? I would be fine with having only the functional tests and remove these. They feel like duplicates

I agree with you, I use functional only almost all the time (terser / clearer once you know about functional).

Gotcha, that makes sense! Reverted changes to this file.

DanielNoord · 2024-07-01T06:56:59Z

pylint/checkers/misc.py

+            self._comment_fixme_pattern = re.compile(comment_regex, re.I)
+            if self.linter.config.check_fixme_in_docstring:
+                docstring_regex = rf"((\"\"\")|(\'\'\'))\s*({notes})(?=(:|\s|\Z))"
+                self._docstring_fixme_pattern = re.compile(docstring_regex, re.I)


I think we need to always set these two pattern attributes so that we don't get an AttributeError in unexpected ways. In the old version we also did this.

Gotcha, I've removed the conditionals here and just let process_tokens() decide what to parse based on whether the check-fixme-in-docstring setting is enabled.
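A minimal sketch of the "always set both patterns" idea: both attributes are compiled unconditionally, and the setting is only consulted when a token is processed. The class and method names are hypothetical, and the comment regex is an assumption (only the docstring regex appears in the diff above).

```python
import re


class FixmePatterns:
    """Hypothetical holder for the two compiled patterns."""

    def __init__(self, notes=("FIXME", "XXX", "TODO"), check_docstrings=False):
        self.check_docstrings = check_docstrings
        alternatives = "|".join(re.escape(note) for note in notes)
        # Assumed shape of the comment regex; not quoted in the thread.
        self.comment_pattern = re.compile(
            rf"#\s*({alternatives})(?=(:|\s|\Z))", re.I
        )
        # Docstring regex as quoted in the diff, compiled unconditionally
        # so the attribute always exists.
        self.docstring_pattern = re.compile(
            rf"((\"\"\")|(\'\'\'))\s*({alternatives})(?=(:|\s|\Z))", re.I
        )

    def has_fixme(self, text: str) -> bool:
        if self.comment_pattern.search(text):
            return True
        # The setting is checked here, not when compiling.
        return bool(
            self.check_docstrings and self.docstring_pattern.search(text)
        )


disabled = FixmePatterns()
enabled = FixmePatterns(check_docstrings=True)
print(disabled.has_fixme('"""TODO: later"""'), enabled.has_fixme('"""TODO: later"""'))
```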

DanielNoord · 2024-07-01T06:58:01Z

pylint/checkers/misc.py

+                    if line.startswith(('"""', "'''")):
+                        line = line[3:]
+                    line = line.lstrip()
+                    if line.endswith(('"""', "'''")):
+                        line = line[:-3]
+                    if self._docstring_fixme_pattern.search(
+                        '"""' + line.lower()
+                    ) or self._docstring_fixme_pattern.search("'''" + line.lower()):


Now that you have fixed the pattern, is this really necessary? I would prefer a pattern where we don't have to make a lot of complicated changes to the lines themselves.

True, it was getting pretty messy. I've decided to rework the logic to use regex capture groups for both docstrings and comments (since in that case we were also sort of changing the line then stitching it together to examine with regex).

badsketch · 2024-07-08T02:53:19Z

Apologies for the glacial pace of this PR 😬, really appreciate all the feedback! Some updates:

Previously we were taking the token line and using string manipulation to get the fixme message, then rebuilding it so the regex could find it. I've since changed the logic to use just regex with capture groups instead. I've made this change for both docstrings and comments.
To do this, I also had to split the docstring regex pattern into single-line and multi-line variants. I tried it with just a single regex, but it got really ugly, and I think the logic is much cleaner and more intuitive with two. There are 3 regex patterns in total now.
Correct me if I'm wrong, but we were previously using re.search(). If we're only looking at the beginning of a string, then we can use re.match(), which only tries the pattern at the start of the string and so should do less work.

Let me know your thoughts!
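A quick illustration of the re.match() versus re.search() point, using an assumed comment pattern (the actual pattern in the PR may differ). match() only tries the pattern at position 0, while search() scans the whole string.

```python
import re

# Assumed pattern for illustration only.
NOTE = re.compile(r"#\s*(TODO|FIXME|XXX)(?=:|\s|\Z)", re.I)

comments = [
    "#   TODO: msg1",
    "# something # TODO: msg3",
    "# something TODO: msg4",
]

for comment in comments:
    matched = NOTE.match(comment) is not None
    searched = NOTE.search(comment) is not None
    print(f"{comment!r}: match={matched} search={searched}")
```

Switching from search() to match() is therefore not only a performance change: a fixme that sits behind other comment text is found by search() but not by match().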

badsketch · 2024-07-08T03:42:11Z

Those primer tests are awesome! They thankfully caught something I missed when refactoring the comment regex pattern. I was no longer supporting cases like:

#     # TODO: something something

which is kind of curious as I wonder if that should've been supported in the first place? Regardless, I've updated it so it's supported and added that as an additional functional test.

badsketch · 2024-07-08T05:00:09Z

In my last few messages, I mentioned updating how the message was extracted in comment-based fixme's so we wouldn't have to use string manipulation (198fd93, 1ce441c). I've decided to revert that change because it was causing a lot of primer test failures. I think this is because comment-based fixme's are a little inconsistent in their current state:

#   TODO: msg1                                        
#   # TODO: msg2
# something # TODO: msg3
# something TODO: msg4

results in:

test.py:1:1: W0511: TODO: msg1 (fixme)
test.py:2:1: W0511: # TODO: msg2 (fixme)
test.py:3:1: W0511: something # TODO: msg3 (fixme)

I wasn't able to replicate what we're doing today using regex capture groups so in favor of not causing any disruptions, I decided to go back to the current method.

If we're open to standardizing some of the decided behavior maybe in the future and allowing changes to the primer tests, I'm also down to discuss!

DanielNoord · 2024-07-08T07:32:35Z

@badsketch With the danger of introducing more back and forth: would you be willing to create a proposal for this standardization and apply it to this PR? We can then see the test output and determine whether we are okay with that. That makes it a lot easier to discuss such standardization (with having some examples).

badsketch · 2024-07-08T12:52:34Z

@DanielNoord Sure thing, is there a formal way of making these proposals for Pylint features and do I submit it somewhere? Or do you mean try to consider all the usage out there and come back with a more detailed comment?

DanielNoord · 2024-07-08T13:51:26Z

No, I meant: just write the code as you would want and make the tests pass. If the test output is acceptable I would be okay with accepting your proposed code.

I don't think we need a full proposal, just the code as you would propose to write it and then being able to see what changes that would give to the test output.

badsketch · 2024-07-14T04:50:28Z

@DanielNoord
Gotcha, I've updated the functional tests for both comment and docstring fixme's to serve as my final proposal. The latest primer test results also align with what I've proposed. They boil down to:

comment fixme's must start with a # then any number of spaces (and only spaces), the fixme keyword, followed by a message

# TODO valid
#              TODO valid
# invalid TODO msg

single line docstring fixme's must start with 3 single/double quotes, then any number of spaces (and only spaces), the fixme keyword, followed by a message

'''TODO valid'''
""" TODO valid """
''' invalid TODO msg '''

multi-line docstring fixme's are within a docstring block, but must be at the beginning of the line. Any number of indentations before it are okay.

'''
TODO valid
invalid TODO msg
'''
def foo():
      """
      TODO valid
      """

The majority of changed primer tests are for comment fixme's that no longer emit. They appear to be fixme's that are part of a larger section of code that was commented out, suggesting the fixme is no longer valid. I think it makes sense for those to no longer be emitted.
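The three rules above can be sketched as three regular expressions used with match(). These patterns are written from the prose description only and are an assumption, not the patterns merged in the PR; the names are hypothetical.

```python
import re

NOTES = "FIXME|XXX|TODO"
QUOTES = "(?:\"\"\"|''')"
KEYWORD = rf"(?:{NOTES})(?=:|\s|\Z)"

# "#", then only whitespace, then the keyword and its message.
COMMENT = re.compile(rf"#\s*(?P<msg>{KEYWORD}.*)", re.I)
# Triple quote, only whitespace, keyword and message, closing triple quote.
SINGLE_LINE = re.compile(
    rf"{QUOTES}\s*(?P<msg>{KEYWORD}.*?)\s*{QUOTES}\s*\Z", re.I
)
# A line inside a multi-line docstring: indentation, then the keyword.
MULTI_LINE = re.compile(rf"\s*(?P<msg>{KEYWORD}.*)", re.I)


def fixme_message(pattern, text):
    """Return the captured fixme message, or None if the rule rejects it."""
    found = pattern.match(text)
    return found.group("msg") if found else None


print(fixme_message(COMMENT, "#              TODO valid"))
print(fixme_message(COMMENT, "# invalid TODO msg"))
print(fixme_message(SINGLE_LINE, '""" TODO valid """'))
print(fixme_message(MULTI_LINE, "      TODO valid"))
```

Because every rule is anchored at the start via match(), a keyword that appears after other text (as in commented-out code) is rejected, which matches the primer changes described above.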

DanielNoord

LGTM!

@Pierre-Sassoulas Do you want to give this a review as well?

Pierre-Sassoulas

Thank you for working on this @badsketch !

I'm wondering why the regex contains rf"((\"\"\")|(\'\'\')), but we also do (token_info.line.lstrip().startswith(('"""', "'''"))) somewhere else? My intuition would be that we could just search for the user-defined "TODO" regex in comments and docstrings. Is this because we want the option to not raise for docstrings (for semver/compat with the old behavior), so we need to distinguish the two internally? I didn't run any benchmarks for this, but I think this kind of regex executed on each token can have a huge performance impact.
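One way to check the performance concern would be a micro-benchmark along these lines. It is only a sketch with a made-up token list and an assumed pattern; it says nothing about real pylint runs, and no timings are claimed here.

```python
import re
import timeit

# Assumed pattern for illustration only.
DOCSTRING_FIXME = re.compile(
    r"(?:\"\"\"|''')\s*(?:FIXME|XXX|TODO)(?=:|\s|\Z)", re.I
)

# Made-up stand-in for a stream of token strings.
TOKENS = [
    '"""TODO: tidy this up"""',
    "x = compute(y)",
    "'''regular docstring'''",
    "# TODO: a comment",
] * 1000


def regex_only(tokens):
    """Run the regex on every token."""
    return sum(1 for tok in tokens if DOCSTRING_FIXME.match(tok))


def prefix_then_regex(tokens):
    """Cheap startswith() check first, regex only on candidates."""
    return sum(
        1
        for tok in tokens
        if tok.startswith(('"""', "'''")) and DOCSTRING_FIXME.match(tok)
    )


def benchmark(number=20):
    """Return wall-clock timings in seconds for both strategies."""
    return {
        "regex_only": timeit.timeit(lambda: regex_only(TOKENS), number=number),
        "prefix_then_regex": timeit.timeit(
            lambda: prefix_then_regex(TOKENS), number=number
        ),
    }
```

Calling benchmark() on a representative token stream would show whether the extra prefix check actually pays for itself.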

doc/whatsnew/fragments/9255.feature


jacobtylerwalls · 2024-07-28T15:17:38Z

Could be false negatives, will look into this.

The primer tests still suffer from a fluctuating baseline, so you'll see unrelated messages moving in and out from time to time. We haven't yet identified the source of the indeterminacy. Surely it's a lack of a fixed ordering somewhere. Don't worry about it.

jacobtylerwalls · 2024-09-29T14:12:22Z

FYI @badsketch, thanks to some ace detective work by @akamat10, the primer is more stable now, so if you merge main you should see a more informative primer diff.

github-actions · 2024-09-30T04:10:29Z

🤖 Effect of this PR on checked open source code: 🤖

Effect on astroid:
The following messages are no longer emitted:

fixme:
# TODO: This should return an Uninferable as this would raise
https://github.com/pylint-dev/astroid/blob/62c5badc838419090ee319acfcef3b651ffb1e94/astroid/brain/brain_dataclasses.py#L186

Effect on black:
The following messages are no longer emitted:

fixme:
#assert ilabel not in first # XXX failed on <> ... !=
https://github.com/psf/black/blob/f1a2f92bba7f1b8e4407e89d71a18fd1d6c61a91/src/blib2to3/pgen2/pgen.py#L69

Effect on music21:
The following messages are no longer emitted:

fixme:
# TODO: attach \noBeam to note if it is the last note
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/lily/translate.py#L1174
fixme:
# TODO: this file does not import correctly due to first/second
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/test/test_repeat.py#L475
fixme:
# TODO: Turn back on when a smaller work is found...
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/musedata/translate.py#L584
fixme:
# TODO: column 17 self.src[16] defines the graphic note type
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/musedata/__init__.py#L339
fixme:
# TODO: Something with 4.2 Repetitions; not in hum2xml
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/humdrum/spineParser.py#L2602
fixme:
# TODO: Find out what timeBase means; not in hum2xml
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/humdrum/spineParser.py#L2606
fixme:
# TODO: make staff numbers relevant; not in hum2xml
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/humdrum/spineParser.py#L2610
fixme:
# TODO: write
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/stream/base.py#L13037
fixme:
# TODO: write
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/stream/base.py#L13041
fixme:
# TODO: use the linter, reference DOESN'T have to be passed in
https://github.com/cuthbertLab/music21/blob/e05fc53dfef7b2c9ac974c0cacb8b85e9c4d4605/music21/alpha/analysis/hasher.py#L485

Effect on pytest:
The following messages are no longer emitted:

fixme:
path.strpath # XXX svn?
https://github.com/pytest-dev/pytest/blob/6486c3f3a858a0c8043f5c3f7c24297b82a0abe4/src/_pytest/_py/path.py#L193
fixme:
# XXX
https://github.com/pytest-dev/pytest/blob/6486c3f3a858a0c8043f5c3f7c24297b82a0abe4/src/_pytest/_code/code.py#L913

Effect on pandas:
The following messages are no longer emitted:

fixme:
e.g. Sparse[bool, False] # TODO: no test cases get here
https://github.com/pandas-dev/pandas/blob/d538a1cd1ad5d1e506c2dc36144e4cac5534858a/pandas/core/algorithms.py#L158

Effect on sentry:
The following messages are no longer emitted:

fixme:
type: ignore[assignment] # XXX: intentional resetting pk
https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/reprocessing2.py#L550
fixme:
type: ignore[assignment] # XXX: clobbers Serializer.fields
https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/discover/endpoints/serializers.py#L34
fixme:
type: ignore[assignment] # XXX: clobbers Serializer.fields
https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/discover/endpoints/serializers.py#L191
fixme:
type: ignore[assignment] # TODO: make BitField a mypy plugin
https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/organizations/services/organization/impl.py#L502
fixme:
type: ignore[misc] # TODO: make BitField a mypy plugin
https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/organizations/services/organization/impl.py#L545
fixme:
type: ignore[assignment] # XXX: clobbering Serializer.fields
https://github.com/getsentry/sentry/blob/6564a46b4e5cf2e447cd2bfb725fbc29c4c583c2/src/sentry/api/serializers/rest_framework/dashboard.py#L146

This comment was generated for commit 51a2a14

badsketch · 2024-09-30T04:24:12Z

@jacobtylerwalls That's fantastic, appreciate the update! I merged from main and pushed just now.

As a refresher, since it's been a while, the performance profiling results are in this comment: #9744 (comment)

Also, the latest primer diff above makes sense to me. Most of the instances are where a TODO is part of a section that has been commented out. It would make sense for those to no longer alert as a fixme. The other instances are where a TODO is behind another comment like "# some note # TODO: xyz", which I would also like to propose as an acceptable primer test output change for the sake of consistency. Let me know if there are any issues 👍

jacobtylerwalls · 2024-09-30T14:37:57Z

@badsketch Sorry I didn't catch this before merge, but would you add a short breaking changes news fragment to indicate that we no longer emit fixme when contained in a commented out block? Thanks!

badsketch · 2024-10-01T02:30:01Z

@jacobtylerwalls yeah for sure. Since there's already a https://github.com/pylint-dev/pylint/blob/main/doc/whatsnew/fragments/9255.feature, do I rename that to 9255.breaking and modify it, or do I create a new fragment?

jacobtylerwalls · 2024-10-01T02:31:17Z

I think I like having two separate fragments for this.

badsketch force-pushed the feat/9255-fixme branch from d9b5b78 to 90df10b Compare June 22, 2024 08:01

DanielNoord requested changes Jun 23, 2024

Pierre-Sassoulas added the Enhancement ✨ Improvement to a component label Jun 23, 2024

Pierre-Sassoulas mentioned this pull request Jun 23, 2024

issue9255 - Detect FIXME words in docstring #9281

Closed

badsketch force-pushed the feat/9255-fixme branch from 2cd6492 to 4b4ea5c Compare June 30, 2024 20:04

badsketch force-pushed the feat/9255-fixme branch from c07134e to ec19e0a Compare June 30, 2024 21:03

badsketch requested review from DanielNoord and Pierre-Sassoulas June 30, 2024 22:17

DanielNoord reviewed Jul 1, 2024

DanielNoord previously approved these changes Jul 15, 2024

Pierre-Sassoulas reviewed Jul 15, 2024

badsketch added 12 commits July 25, 2024 02:05

Make python 3.8 compatible

4cb8be3

Support single quote docstrings

acf3b00

Fix formatting and linting

329fd77

Update docs

4e08680

Fix pypy38 failure

476931e

Refactor to use regex capture groups instead of string processing to get the fixme message

cca53c6

Revert unit tests since functional tests cover all cases

26953c7

Tweak regex to account for comments starting with multiple pound signs

8fa2d2b

Fix spelling

9d5ed4d

Revert how comment fixmes extract a message

16bf9e3

Change fixme logic to allow only spaces between hash and fixme keyword

c7b81ad

Improve wording

1ffc033

jacobtylerwalls added this to the 3.3.0 milestone Jul 28, 2024

Pierre-Sassoulas modified the milestones: 3.3.0, 3.4.0 Sep 20, 2024

jacobtylerwalls modified the milestones: 3.4.0, 4.0.0 Sep 25, 2024

Merge remote-tracking branch 'upstream/main' into feat/9255-fixme

51a2a14

badsketch force-pushed the feat/9255-fixme branch from d06c562 to 51a2a14 Compare September 30, 2024 03:47

jacobtylerwalls approved these changes Sep 30, 2024

DanielNoord approved these changes Sep 30, 2024

DanielNoord merged commit 1a96a5d into pylint-dev:main Sep 30, 2024
44 checks passed

badsketch mentioned this pull request Oct 1, 2024

Add breaking fragment for fixme docstring support #9992

Merged
