Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset cursor at dedent after whitespace #11630

Merged
merged 1 commit into from
May 31, 2024

Conversation

dhruvmanila
Copy link
Member

Summary

This PR fixes a bug to reset the cursor before emitting the Dedent token which had preceding whitespaces.

Previously, we would just use TextRange::empty(self.offset()) because the range for each token would be computed at the place where it was emitted. Now, the lexer handles this centrally in next_token which means we require special handling at certain places. This is one such case.

Computing the range centrally in next_token has a benefit that the return type of lexer methods is a simple TokenKind but it does add a small complexity to handle such cases. I think it's worth doing this but if anyone thinks otherwise, feel free to comment.

Test Plan

Add a test case and update the snapshot.

@dhruvmanila dhruvmanila added bug Something isn't working parser Related to the parser labels May 31, 2024
Copy link

codspeed-hq bot commented May 31, 2024

CodSpeed Performance Report

Merging #11630 will create unknown performance changes

Comparing dhruv/dedent-after-whitespace (b600b42) with dhruv/parser-phase-2 (e8ab801)

Summary

🆕 30 new benchmarks

Benchmarks breakdown

Benchmark dhruv/parser-phase-2 dhruv/dedent-after-whitespace Change
🆕 formatter[large/dataset.py] N/A 9.3 ms N/A
🆕 formatter[numpy/ctypeslib.py] N/A 1.8 ms N/A
🆕 formatter[numpy/globals.py] N/A 240.5 µs N/A
🆕 formatter[pydantic/types.py] N/A 3.5 ms N/A
🆕 formatter[unicode/pypinyin.py] N/A 649.7 µs N/A
🆕 lexer[large/dataset.py] N/A 1.1 ms N/A
🆕 lexer[numpy/ctypeslib.py] N/A 226 µs N/A
🆕 lexer[numpy/globals.py] N/A 29.6 µs N/A
🆕 lexer[pydantic/types.py] N/A 503.2 µs N/A
🆕 lexer[unicode/pypinyin.py] N/A 76.7 µs N/A
🆕 linter/all-rules[large/dataset.py] N/A 16.1 ms N/A
🆕 linter/all-rules[numpy/ctypeslib.py] N/A 4 ms N/A
🆕 linter/all-rules[numpy/globals.py] N/A 692.6 µs N/A
🆕 linter/all-rules[pydantic/types.py] N/A 7.7 ms N/A
🆕 linter/all-rules[unicode/pypinyin.py] N/A 2 ms N/A
🆕 linter/default-rules[large/dataset.py] N/A 3.7 ms N/A
🆕 linter/default-rules[numpy/ctypeslib.py] N/A 941.4 µs N/A
🆕 linter/default-rules[numpy/globals.py] N/A 184.5 µs N/A
🆕 linter/default-rules[pydantic/types.py] N/A 1.9 ms N/A
🆕 linter/default-rules[unicode/pypinyin.py] N/A 358.3 µs N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Copy link
Contributor

github-actions bot commented May 31, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+1 -231940 violations, +0 -0 fixes in 31 projects; 19 projects unchanged)

DisnakeDev/disnake (+0 -6809 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- disnake/__main__.py:240:1: E117 Over-indented
- disnake/__main__.py:268:1: E113 Unexpected indentation
- disnake/__main__.py:281:1: E117 Over-indented
- disnake/__main__.py:285:1: E113 Unexpected indentation
- disnake/__main__.py:290:1: E117 Over-indented
- disnake/__main__.py:293:1: E117 Over-indented
- disnake/__main__.py:296:1: E117 Over-indented
- disnake/__main__.py:299:1: E117 Over-indented
... 4710 additional changes omitted for rule E117
- disnake/__main__.py:328:1: E113 Unexpected indentation
- disnake/__main__.py:34:1: E113 Unexpected indentation
... 6799 additional changes omitted for project

PostHog/HouseWatch (+0 -115 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- housewatch/api/analyze.py:102:1: E113 Unexpected indentation
- housewatch/api/analyze.py:113:1: E113 Unexpected indentation
- housewatch/api/analyze.py:124:1: E113 Unexpected indentation
- housewatch/api/analyze.py:130:1: E113 Unexpected indentation
- housewatch/api/analyze.py:133:1: E113 Unexpected indentation
- housewatch/api/analyze.py:140:1: E113 Unexpected indentation
... 45 additional changes omitted for rule E113
- housewatch/api/analyze.py:148:1: E117 Over-indented
- housewatch/api/analyze.py:250:1: E117 Over-indented
- housewatch/api/analyze.py:299:1: E117 Over-indented
- housewatch/api/analyze.py:313:1: E117 Over-indented
... 105 additional changes omitted for project

RasaHQ/rasa (+1 -6707 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- .github/scripts/download_pretrained.py:48:1: E113 Unexpected indentation
- .github/scripts/download_pretrained.py:50:1: E117 Over-indented
- .github/scripts/download_pretrained.py:60:1: E113 Unexpected indentation
- .github/scripts/download_pretrained.py:67:1: E113 Unexpected indentation
- .github/scripts/download_pretrained.py:88:1: E117 Over-indented
- .github/scripts/mr_generate_summary.py:54:1: E113 Unexpected indentation
- .github/scripts/mr_publish_results.py:107:1: E117 Over-indented
- .github/scripts/mr_publish_results.py:110:1: E113 Unexpected indentation
- .github/scripts/mr_publish_results.py:114:1: E117 Over-indented
- .github/scripts/mr_publish_results.py:117:1: E117 Over-indented
... 6698 additional changes omitted for project

alteryx/featuretools (+0 -2208 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- docs/notebook_version_standardizer.py:126:1: E117 Over-indented
- docs/notebook_version_standardizer.py:131:1: E117 Over-indented
- docs/notebook_version_standardizer.py:153:1: E117 Over-indented
- docs/notebook_version_standardizer.py:160:1: E117 Over-indented
- docs/notebook_version_standardizer.py:16:1: E117 Over-indented
- docs/notebook_version_standardizer.py:25:1: E113 Unexpected indentation
- docs/notebook_version_standardizer.py:30:1: E117 Over-indented
... 1456 additional changes omitted for rule E117
- docs/notebook_version_standardizer.py:61:1: E113 Unexpected indentation
- docs/notebook_version_standardizer.py:73:1: E113 Unexpected indentation
- featuretools/computational_backends/calculate_feature_matrix.py:172:1: E113 Unexpected indentation
... 2198 additional changes omitted for project

apache/airflow (+0 -42650 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- airflow/__init__.py:109:1: E117 Over-indented
- airflow/__init__.py:118:1: E113 Unexpected indentation
- airflow/__init__.py:122:1: E117 Over-indented
- airflow/__init__.py:126:1: E113 Unexpected indentation
- airflow/__main__.py:47:1: E113 Unexpected indentation
- airflow/api/__init__.py:37:1: E117 Over-indented
- airflow/api/__init__.py:40:1: E113 Unexpected indentation
- airflow/api/__init__.py:46:1: E117 Over-indented
- airflow/api/auth/backend/kerberos_auth.py:101:1: E117 Over-indented
- airflow/api/auth/backend/kerberos_auth.py:103:1: E117 Over-indented
... 23997 additional changes omitted for rule E117
... 42640 additional changes omitted for project

aws/aws-sam-cli (+0 -10085 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- samcli/cli/cli_config_file.py:110:1: E117 Over-indented
- samcli/cli/cli_config_file.py:121:1: E117 Over-indented
- samcli/cli/cli_config_file.py:155:1: E117 Over-indented
- samcli/cli/cli_config_file.py:172:1: E117 Over-indented
- samcli/cli/cli_config_file.py:244:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:310:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:315:1: E117 Over-indented
- samcli/cli/cli_config_file.py:319:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:323:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:327:1: E113 Unexpected indentation
... 10075 additional changes omitted for project

bloomberg/pytest-memray (+0 -66 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- src/pytest_memray/marks.py:122:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:144:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:164:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:171:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:183:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:214:1: E113 Unexpected indentation
... 33 additional changes omitted for rule E113
- src/pytest_memray/marks.py:217:1: E117 Over-indented
- src/pytest_memray/plugin.py:130:1: E117 Over-indented
- src/pytest_memray/plugin.py:142:1: E117 Over-indented
- src/pytest_memray/plugin.py:159:1: E117 Over-indented
... 56 additional changes omitted for project

bokeh/bokeh (+0 -6215 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- docs/bokeh/docserver.py:100:1: E113 Unexpected indentation
- docs/bokeh/docserver.py:97:1: E117 Over-indented
- examples/advanced/extensions/parallel_plot/parallel_plot.py:23:1: E117 Over-indented
- examples/advanced/extensions/parallel_plot/parallel_plot.py:29:1: E113 Unexpected indentation
- examples/advanced/extensions/parallel_plot/parallel_plot.py:74:1: E113 Unexpected indentation
- examples/basic/data/server_sent_events_source.py:63:1: E113 Unexpected indentation
- examples/integration/layout/multi_plot_layout.py:13:1: E117 Over-indented
- examples/models/basic_plot.py:45:1: E113 Unexpected indentation
- examples/models/buttons.py:78:1: E113 Unexpected indentation
... 1984 additional changes omitted for rule E113
- examples/models/calendars.py:40:1: E117 Over-indented
... 6205 additional changes omitted for project

freedomofpress/securedrop (+0 -2565 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- admin/bootstrap.py:121:1: E117 Over-indented
- admin/bootstrap.py:126:1: E117 Over-indented
- admin/bootstrap.py:135:1: E117 Over-indented
- admin/bootstrap.py:158:1: E117 Over-indented (comment)
- admin/bootstrap.py:159:1: E117 Over-indented (comment)
- admin/bootstrap.py:160:1: E117 Over-indented (comment)
... 1743 additional changes omitted for rule E117
- admin/bootstrap.py:290:1: E113 Unexpected indentation
- admin/bootstrap.py:301:1: E113 Unexpected indentation
- admin/bootstrap.py:61:1: E113 Unexpected indentation
- admin/securedrop_admin/__init__.py:1033:1: E116 Unexpected indentation (comment)
... 2555 additional changes omitted for project

fronzbot/blinkpy (+0 -503 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- blinkpy/api.py:423:1: E117 Over-indented
- blinkpy/api.py:425:1: E117 Over-indented
- blinkpy/api.py:452:1: E117 Over-indented
- blinkpy/api.py:454:1: E117 Over-indented
- blinkpy/api.py:512:1: E117 Over-indented
- blinkpy/api.py:514:1: E117 Over-indented
... 332 additional changes omitted for rule E117
- blinkpy/auth.py:148:1: E113 Unexpected indentation
- blinkpy/auth.py:161:1: E113 Unexpected indentation
- blinkpy/auth.py:47:1: E113 Unexpected indentation
- blinkpy/auth.py:61:1: E113 Unexpected indentation
... 493 additional changes omitted for project

... Truncated remaining completed project reports due to GitHub comment length restrictions

Changes by rule (4 rules affected)

code total + violation - violation + fix - fix
E117 152372 0 152372 0 0
E113 73488 0 73488 0 0
E116 6080 0 6080 0 0
RUF100 1 1 0 0 0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are in total 5 usages of TextRange::empty(offset)) in the existing Lexer code. Would you mind reviewing if we need to make the same change in other places?

//
// Here, the cursor is at `^` and the `indentation` contains the whitespaces before
// the `pass` token.
self.cursor.start_token();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit that reading self.cursor.start_token() is hard to understand. Why are we starting a token when we're just about to return it. Your comment certainly helps. That makes me wonder if we can instead add a method on Lexer with a more meaningful name.

self.update_token_start_to_current_offset() // A bi tlong
self.reset_token_start()

but I must admit, I don't find them much better than what we have. But maybe you have an idea for a better name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. Let me think about this.

@dhruvmanila
Copy link
Member Author

dhruvmanila commented May 31, 2024

There are in total 5 usages of TextRange::empty(offset)) in the existing Lexer code. Would you mind reviewing if we need to make the same change in other places?

I did review them, sorry for not mentioning in the PR description. This isn't needed in all places, it's only require if the lexer skips whitespaces. For posterity, the usages are:

// Return dedent tokens until the current indentation level matches the indentation of the next token.
else if let Some(indentation) = self.pending_indentation.take() {
match self.indentations.current().try_compare(indentation) {
Ok(Ordering::Greater) => {
self.pending_indentation = Some(indentation);
let offset = self.offset();
self.indentations.dedent_one(indentation).map_err(|_| {
LexicalError::new(LexicalErrorType::IndentationError, self.token_range())
})?;
return Ok((Tok::Dedent, TextRange::empty(offset)));
}

⬆️ This emits any pending dedent tokens. There are no whitespaces consumed by the lexer before this, so no need to reset the cursor.

// Next, insert a trailing newline, if required.
if !self.state.is_new_logical_line() {
self.state = State::AfterNewline;
Ok((Tok::Newline, TextRange::empty(self.offset())))
}
// Next, flush the indentation stack to zero.
else if self.indentations.dedent().is_some() {
Ok((Tok::Dedent, TextRange::empty(self.offset())))
} else {
Ok((Tok::EndOfFile, TextRange::empty(self.offset())))
}

⬆️ We already reset the cursor before this function is called ⬇️

// The lexer might've skipped whitespaces, so update the start offset
self.cursor.start_token();
if let Some(c) = self.cursor.bump() {
if c.is_ascii() {
self.consume_ascii_character(c)
} else if is_unicode_identifier_start(c) {
let identifier = self.lex_identifier(c);
self.state = State::Other;
identifier
} else {
self.push_error(LexicalError::new(
LexicalErrorType::UnrecognizedToken { tok: c },
self.token_range(),
))
}
} else {
// Reached the end of the file. Emit a trailing newline token if not at the beginning of a logical line,
// empty the dedent stack, and finally, return the EndOfFile token.
self.consume_end()
}

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining!

@dhruvmanila dhruvmanila merged commit 8196720 into dhruv/parser-phase-2 May 31, 2024
19 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/dedent-after-whitespace branch May 31, 2024 11:04
dhruvmanila added a commit that referenced this pull request Jun 3, 2024
## Summary

This PR fixes a bug to reset the cursor before emitting the `Dedent`
token which had preceding whitespaces.

Previously, we would just use `TextRange::empty(self.offset())` because
the range for each token would be computed at the place where it was
emitted. Now, the lexer handles this centrally in `next_token` which
means we require special handling at certain places. This is one such
case.

Computing the range centrally in `next_token` has a benefit that the
return type of lexer methods is a simple `TokenKind` but it does add a
small complexity to handle such cases. I think it's worth doing this but
if anyone thinks otherwise, feel free to comment.

## Test Plan

Add a test case and update the snapshot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants