Reset cursor at dedent after whitespace #11630
Conversation
CodSpeed Performance Report
Merging #11630 will create unknown performance changes.
Benchmarks breakdown
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| E117 | 152372 | 0 | 152372 | 0 | 0 |
| E113 | 73488 | 0 | 73488 | 0 | 0 |
| E116 | 6080 | 0 | 6080 | 0 | 0 |
| RUF100 | 1 | 1 | 0 | 0 | 0 |
Formatter (stable)
✅ ecosystem check detected no format changes.
Formatter (preview)
✅ ecosystem check detected no format changes.
Force-pushed from 2d34e13 to e8ab801 (Compare)
Force-pushed from 4faed6a to b600b42 (Compare)
There are in total 5 usages of `TextRange::empty(offset)` in the existing `Lexer` code. Would you mind reviewing whether we need to make the same change in the other places?
```rust
//
// Here, the cursor is at `^` and the `indentation` contains the whitespaces before
// the `pass` token.
self.cursor.start_token();
```
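For context, `start_token` here records the current offset as the start of the next token. A minimal sketch of that mechanism, with assumed field names rather than ruff's actual `Cursor` implementation:

```rust
// Simplified, hypothetical cursor; the real `Cursor` used by
// `ruff_python_parser` differs, this only illustrates the idea.
struct Cursor<'a> {
    source: &'a str,
    offset: usize,      // current byte position in `source`
    token_start: usize, // byte offset where the in-progress token began
}

impl Cursor<'_> {
    /// Mark the current offset as the start of the next token, so any
    /// whitespace consumed before this call is excluded from the range
    /// that `next_token` later computes for it.
    fn start_token(&mut self) {
        self.token_start = self.offset;
    }
}
```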
I must admit that reading `self.cursor.start_token()` is hard to understand. Why are we starting a token when we're just about to return it? Your comment certainly helps. That makes me wonder if we can instead add a method on `Lexer` with a more meaningful name:

- `self.update_token_start_to_current_offset()` (a bit long)
- `self.reset_token_start()`

but I must admit, I don't find them much better than what we have. But maybe you have an idea for a better name?
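For what it's worth, a sketch of how such a helper could read, using the second suggested name and the simplified `Cursor` sketched above (hypothetical, not ruff's actual API):

```rust
// Hypothetical wrapper using one of the names suggested above.
struct Lexer<'a> {
    cursor: Cursor<'a>, // the simplified `Cursor` from the earlier sketch
}

impl Lexer<'_> {
    /// Discard any already-consumed whitespace from the next token's
    /// range by restarting the token at the current offset.
    fn reset_token_start(&mut self) {
        self.cursor.start_token();
    }
}
```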
Yeah, I agree. Let me think about this.
I did review them, sorry for not mentioning it in the PR description. This isn't needed in all places; it's only required if the lexer skips whitespace. For posterity, the usages are:

- `ruff/crates/ruff_python_parser/src/lexer.rs` lines 853 to 863 (at 685d11a) — this emits any pending dedent tokens. No whitespace is consumed by the lexer before this, so there's no need to reset the cursor.
- `ruff/crates/ruff_python_parser/src/lexer.rs` lines 1031 to 1041 (at 685d11a) — we already reset the cursor before this function is called, in `ruff/crates/ruff_python_parser/src/lexer.rs` lines 877 to 898 (at b600b42).
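To make the distinction concrete, here's a hedged sketch of the dedent path; `handle_indentation` and `current_indent` are assumed names with simplified control flow, not the actual lexer code:

```rust
// Assumed token kind; simplified from the discussion above.
enum TokenKind {
    Dedent,
    // ...
}

impl Lexer<'_> {
    /// Sketch: called after the line's leading whitespace has already
    /// been consumed and measured into `indentation`.
    fn handle_indentation(&mut self, indentation: u32) -> Option<TokenKind> {
        if indentation < self.current_indent() {
            // The cursor already skipped this line's leading whitespace,
            // so restart the token: `next_token` then computes an empty
            // range at the current offset for the `Dedent`, instead of a
            // range spanning the whitespace.
            self.reset_token_start();
            return Some(TokenKind::Dedent);
        }
        // At the other call sites no whitespace was skipped (or the
        // cursor was already reset), so no reset is needed there.
        None
    }

    /// Placeholder for reading the top of the indent stack.
    fn current_indent(&self) -> u32 {
        0
    }
}
```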
Thanks for explaining!
Summary

This PR fixes a bug to reset the cursor before emitting the `Dedent` token which had preceding whitespace.

Previously, we would just use `TextRange::empty(self.offset())` because the range for each token would be computed at the place where it was emitted. Now, the lexer handles this centrally in `next_token`, which means we require special handling at certain places. This is one such case.

Computing the range centrally in `next_token` has the benefit that the return type of the lexer methods is a simple `TokenKind`, but it does add a small complexity to handle such cases. I think it's worth doing this, but if anyone thinks otherwise, feel free to comment.

Test Plan

Add a test case and update the snapshot.
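To illustrate the fix, here's an assumed example input (not the test case added in this PR) and the ranges involved:

```rust
fn main() {
    // Assumed illustrative input: `b` dedents from 8 columns back to 4,
    // so a `Dedent` token is emitted after 4 spaces of whitespace.
    let source = "if a:\n    if b:\n        pass\n    b\n";

    // The last line starts at byte offset 29 with four spaces of
    // indentation; `b` itself sits at offset 33.
    assert_eq!(&source[29..33], "    ");
    assert_eq!(&source[33..34], "b");

    // With the range computed centrally in `next_token` from the token
    // start, the `Dedent` before `b` would incorrectly span the skipped
    // whitespace (29..33). Resetting the cursor first preserves the old
    // behaviour of `TextRange::empty(self.offset())`: an empty range at
    // offset 33.
}
```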