-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support fixme's in docstrings #9744
Conversation
pylint/checkers/misc.py
Outdated
elif self.linter.config.check_fixme_in_docstring and self._is_docstring_comment(token_info): | ||
docstring_lines = token_info.string.split("\n") | ||
for line_no, line in enumerate(docstring_lines): | ||
comment_text = line.removeprefix('"""').lstrip().removesuffix('"""') # trim '""""' and whitespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removeprefix()
and removesuffix()
are new in python 3.9. Dumb question, but how do I tell what version this project supports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After seeing 3.8 test suites fail, I'm guessing I need to make this compatible π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this? Can't we just put the full docstring in the message for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a case like
"""
TODO msg1
TODO msg2
"""
this PR will create two TODO lint messages. If we put the full docstring for both, it might be confusing or overly wordy, no?
When running
Thoughts on updating the docstring to be "Checker for encoding issues and fixme notes"? instead of "BaseChecker for encoding issues." |
d9b5b78
to
90df10b
Compare
This comment has been minimized.
This comment has been minimized.
Fine with me! |
pylint/checkers/misc.py
Outdated
def _is_docstring_comment(self, token_info: tokenize.TokenInfo) -> bool: | ||
return ( | ||
token_info.type == tokenize.STRING | ||
and token_info.line.lstrip().startswith('"""') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a docstring can also start with '''
. I'm wondering if this should live in this tokeniser checker as I think it is actually quite hard to recognise docstrings on tokens alone.
Have you considered doing it as a checker for nodes.Module
, nodes.ClassDef
, etc? Then you can just check if the regex is in the .doc
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a docstring can also start with '''
totally forgot about this, thanks!
Have you considered doing it as a checker for nodes.Module, nodes.ClassDef, etc? Then you can just check if the regex is in the .doc attribute.
I was considering the possibility docstrings could appear outside of nodes.Module
and nodes.classDef
, so I figured it's best to use the existing token stream to watch for all occurrences. However, you could argue it's not good python practice (?) in the first place to have docstrings outside modules/classes/methods. Agreed the PR doesn't use the safest heuristic to determine if it's a docstring.
So it looks like we could do
- update
_is_docstring_comment()
to also checkstartswith("'''")
- refactor to use
nodes.Module
,nodes.ClassDef
,nodes.FunctionDef
and we tighten the scope of docstring fixmes - update tokenizer with a new docstring token similar to how we have a token.COMMENT type for a comment fixme. (Haven't looked too deep into this, will probably be higher effort)
Totally down to change it to 2, but would users claim false negatives when they try to create docstring fixme's outside of module/classes/function defs? Perhaps it would help if there were a lint message that recommends against docstrings outside of those places. Do we have something like that already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pierre-Sassoulas Opinion? I think trying option 1 for now might be fine, we can always refactor to 2 later on. I just thought I would raise the question to see if it was consciously ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the decision should be taken consciousely. I would have thougt that the node visitor implementation would be cleaner/terser, but maybe using the tokenizer is faster ? I expected less changes to be able to do docstrings' fixme check as we already have something working for comments ? Did we use the tokenizer for comments? I did not look very deep into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we already use tokenizer for comments. If I understand correctly, tokenizer is for instances where there's not a defined node for the check (eg, a comment can appear "anywhere" in the code, so we check for all tokens for that appearance). I tried to follow that logic with docstrings. Hence I piggy back on the tokenizer to examine for occurrences of """
and '''
. If we decide we only wish to support docstrings in classes, functions, and methods, then I could use a node visitor on those 3 node types.
It may be easiest at this point to go with Option 1 at this point since it'd be straightforward to make the change in the PR. And as Daniel mentioned, we could refactor to 2 in the future. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstrings are not supposed to be everywhere in a module, that's useless statements otherwise, so a node approach would work. But if comments requires tokenizer, for consistency of aproachs let's go with 1)
2cd6492
to
4b4ea5c
Compare
Codecov ReportAll modified and coverable lines are covered by tests β
Additional details and impacted files@@ Coverage Diff @@
## main #9744 +/- ##
=======================================
Coverage 95.80% 95.80%
=======================================
Files 174 174
Lines 18934 18946 +12
=======================================
+ Hits 18140 18152 +12
Misses 794 794
|
This comment has been minimized.
This comment has been minimized.
c07134e
to
ec19e0a
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome you got this to work!
tests/checkers/unittest_misc.py
Outdated
@set_config(check_fixme_in_docstring=True) | ||
def test_docstring_with_message(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pierre-Sassoulas Do you want these unittests? I would be fine with having only the functional tests and remove these. They feel like duplicates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, I use functional only almost all the time (terser / clearer once you know about functional).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, that makes sense! Reverted changes to this file.
pylint/checkers/misc.py
Outdated
self._comment_fixme_pattern = re.compile(comment_regex, re.I) | ||
if self.linter.config.check_fixme_in_docstring: | ||
docstring_regex = rf"((\"\"\")|(\'\'\'))\s*({notes})(?=(:|\s|\Z))" | ||
self._docstring_fixme_pattern = re.compile(docstring_regex, re.I) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to always set these two pattern attributes so that we don't get an AttributeError
in unexpected ways. In the old version we also did this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I've removed the conditionals here and just let process_tokens()
handle the logic of what to parse if check docstring for fixme setting is enabled.
pylint/checkers/misc.py
Outdated
if line.startswith(('"""', "'''")): | ||
line = line[3:] | ||
line = line.lstrip() | ||
if line.endswith(('"""', "'''")): | ||
line = line[:-3] | ||
if self._docstring_fixme_pattern.search( | ||
'"""' + line.lower() | ||
) or self._docstring_fixme_pattern.search("'''" + line.lower()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you have fixed the pattern, is this really necessary? I would prefer a pattern where we don't have to do a lot of complicated changes to the lines itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it was getting pretty messy. I've decided to rework the logic to use regex capture groups for both docstrings and comments (since in that case we were also sort of changing the line then stitching it together to examine with regex).
Apologies for the glacial pace of this PR π¬, really appreciate all the feedback! Some updates:
Let me know your thoughts! |
This comment has been minimized.
This comment has been minimized.
Those primer tests are awesome! They thankfully caught something I missed when refactoring the comment regex pattern. I was no longer supporting cases like:
which is kind of curious as I wonder if that should've been supported in the first place? Regardless, I've updated it so it's supported and added that as an additional functional test. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
In my last few messages, I mentioned updating how the message was extracted in comment-based fixme's so we wouldn't have to use string manipulation(198fd93, 1ce441c). I've decided to revert that change because it was causing a lot of primer test failures. I think this is due to how comment-based fixme's might be a little inconsistent in its current state:
results in:
I wasn't able to replicate what we're doing today using regex capture groups so in favor of not causing any disruptions, I decided to go back to the current method. If we're open to standardizing some of the decided behavior maybe in the future and allowing changes to the primer tests, I'm also down to discuss! |
@badsketch With the danger of introducing more back and forth: would you be willing to create a proposal for this standardization and apply it to this PR? We can then see the test output and determine whether we are okay with that. That makes it a lot easier to discuss such standardization (with having some examples). |
@DanielNoord Sure thing, is there a formal way of making these proposals for Pylint features and do I submit it somewhere? Or do you mean try to consider all the usage out there and come back with a more detailed comment? |
No, I meant: just write the code as you would want and make the tests pass. If the test output is acceptable I would be okay with accepting your proposed code. I don't think we need a full proposal, just the code as you would propose to write it and then being able to see what changes that would give to the test output. |
This comment has been minimized.
This comment has been minimized.
@DanielNoord
The majority of changed primer tests are for comment fixme's that no longer emit. They appear to be fixme's that are part of a larger section of code that was commented out, suggesting the fixme is no longer valid. I think it makes sense for those to no longer be emitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@Pierre-Sassoulas Do you want to give this a review as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @badsketch !
I'm wondering why the regex contains rf"((\"\"\")|(\'\'\'))
, but we also do (token_info.line.lstrip().startswith(('"""', "'''")))
somewhere else ? My intuition would be that we could just search for the user defined "TODO" regex in comments and docstrings ? Is this because we want the option to not raise for docstring (for semver/compat with the old behavior) so we need to distinguish the two internally ? I didn't make any benchmarks for this but I think this kind of regex executed on each token can have huge performance impact.
β¦get the fixme message
The primer tests still suffer from a fluctuating baseline, so you'll see unrelated messages moving in and out from time to time. We haven't yet identified the source of the indeterminacy. Surely it's a lack of a fixed ordering somewhere. Don't worry about it. |
FYI @badsketch, thanks to some ace detective work by @akamat10, the primer is more stable now, so if you merge main you should see a more informative primer diff. |
d06c562
to
51a2a14
Compare
@jacobtylerwalls That's fantastic, appreciate the update! I merged from main and pushed just now. As a refresher, since it's been a while, the performance profiling results are in this comment: #9744 (comment) Also, the latest primer diff above makes sense to me. Most of the instances are where a TODO is part of a section that has been commented out. It would make sense for those to no longer alert as a fixme. The other instances are where a TODO is behind another comment like "# some note # TODO: xyz", which I would also like to propose as an acceptable primer test output change for the sake of consistency. Let me know if there are any issues π |
@badsketch Sorry I didn't catch this before merge, but would you add a short breaking changes news fragment to indicate that we no longer emit fixme when contained in a commented out block? Thanks! |
@jacobtylerwalls yeah for sure. Since there's already a https://github.com/pylint-dev/pylint/blob/main/doc/whatsnew/fragments/9255.feature, do I rename that to 9255.breaking and modify it, or do I create a new fragment? |
I think I like having two separate fragments for this. |
Type of Changes
Description
Closes #9255
Previous PR discussion here: #9281
fixme
rather than a new messagecheck-fixme-in-docstring
is the setting that enables it and defaults to falseAppreciate any feedback!