Reset cursor at dedent after whitespace #11630

dhruvmanila · 2024-05-31T06:04:56Z

Summary

This PR fixes a bug to reset the cursor before emitting the Dedent token which had preceding whitespaces.

Previously, we would just use TextRange::empty(self.offset()) because the range for each token would be computed at the place where it was emitted. Now, the lexer handles this centrally in next_token which means we require special handling at certain places. This is one such case.

Computing the range centrally in next_token has a benefit that the return type of lexer methods is a simple TokenKind but it does add a small complexity to handle such cases. I think it's worth doing this but if anyone thinks otherwise, feel free to comment.

Test Plan

Add a test case and update the snapshot.

codspeed-hq · 2024-05-31T06:09:27Z

CodSpeed Performance Report

Merging #11630 will create unknown performance changes

_{Comparing dhruv/dedent-after-whitespace (b600b42) with dhruv/parser-phase-2 (e8ab801)}

Summary

🆕 30 new benchmarks

Benchmarks breakdown

	Benchmark	`dhruv/parser-phase-2`	`dhruv/dedent-after-whitespace`	Change
🆕	`formatter[large/dataset.py]`	N/A	9.3 ms	N/A
🆕	`formatter[numpy/ctypeslib.py]`	N/A	1.8 ms	N/A
🆕	`formatter[numpy/globals.py]`	N/A	240.5 µs	N/A
🆕	`formatter[pydantic/types.py]`	N/A	3.5 ms	N/A
🆕	`formatter[unicode/pypinyin.py]`	N/A	649.7 µs	N/A
🆕	`lexer[large/dataset.py]`	N/A	1.1 ms	N/A
🆕	`lexer[numpy/ctypeslib.py]`	N/A	226 µs	N/A
🆕	`lexer[numpy/globals.py]`	N/A	29.6 µs	N/A
🆕	`lexer[pydantic/types.py]`	N/A	503.2 µs	N/A
🆕	`lexer[unicode/pypinyin.py]`	N/A	76.7 µs	N/A
🆕	`linter/all-rules[large/dataset.py]`	N/A	16.1 ms	N/A
🆕	`linter/all-rules[numpy/ctypeslib.py]`	N/A	4 ms	N/A
🆕	`linter/all-rules[numpy/globals.py]`	N/A	692.6 µs	N/A
🆕	`linter/all-rules[pydantic/types.py]`	N/A	7.7 ms	N/A
🆕	`linter/all-rules[unicode/pypinyin.py]`	N/A	2 ms	N/A
🆕	`linter/default-rules[large/dataset.py]`	N/A	3.7 ms	N/A
🆕	`linter/default-rules[numpy/ctypeslib.py]`	N/A	941.4 µs	N/A
🆕	`linter/default-rules[numpy/globals.py]`	N/A	184.5 µs	N/A
🆕	`linter/default-rules[pydantic/types.py]`	N/A	1.9 ms	N/A
🆕	`linter/default-rules[unicode/pypinyin.py]`	N/A	358.3 µs	N/A
...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

github-actions · 2024-05-31T06:24:54Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+1 -231940 violations, +0 -0 fixes in 31 projects; 19 projects unchanged)

DisnakeDev/disnake (+0 -6809 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- disnake/__main__.py:240:1: E117 Over-indented
- disnake/__main__.py:268:1: E113 Unexpected indentation
- disnake/__main__.py:281:1: E117 Over-indented
- disnake/__main__.py:285:1: E113 Unexpected indentation
- disnake/__main__.py:290:1: E117 Over-indented
- disnake/__main__.py:293:1: E117 Over-indented
- disnake/__main__.py:296:1: E117 Over-indented
- disnake/__main__.py:299:1: E117 Over-indented
... 4710 additional changes omitted for rule E117
- disnake/__main__.py:328:1: E113 Unexpected indentation
- disnake/__main__.py:34:1: E113 Unexpected indentation
... 6799 additional changes omitted for project

PostHog/HouseWatch (+0 -115 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- housewatch/api/analyze.py:102:1: E113 Unexpected indentation
- housewatch/api/analyze.py:113:1: E113 Unexpected indentation
- housewatch/api/analyze.py:124:1: E113 Unexpected indentation
- housewatch/api/analyze.py:130:1: E113 Unexpected indentation
- housewatch/api/analyze.py:133:1: E113 Unexpected indentation
- housewatch/api/analyze.py:140:1: E113 Unexpected indentation
... 45 additional changes omitted for rule E113
- housewatch/api/analyze.py:148:1: E117 Over-indented
- housewatch/api/analyze.py:250:1: E117 Over-indented
- housewatch/api/analyze.py:299:1: E117 Over-indented
- housewatch/api/analyze.py:313:1: E117 Over-indented
... 105 additional changes omitted for project

RasaHQ/rasa (+1 -6707 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- .github/scripts/download_pretrained.py:48:1: E113 Unexpected indentation
- .github/scripts/download_pretrained.py:50:1: E117 Over-indented
- .github/scripts/download_pretrained.py:60:1: E113 Unexpected indentation
- .github/scripts/download_pretrained.py:67:1: E113 Unexpected indentation
- .github/scripts/download_pretrained.py:88:1: E117 Over-indented
- .github/scripts/mr_generate_summary.py:54:1: E113 Unexpected indentation
- .github/scripts/mr_publish_results.py:107:1: E117 Over-indented
- .github/scripts/mr_publish_results.py:110:1: E113 Unexpected indentation
- .github/scripts/mr_publish_results.py:114:1: E117 Over-indented
- .github/scripts/mr_publish_results.py:117:1: E117 Over-indented
... 6698 additional changes omitted for project

alteryx/featuretools (+0 -2208 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- docs/notebook_version_standardizer.py:126:1: E117 Over-indented
- docs/notebook_version_standardizer.py:131:1: E117 Over-indented
- docs/notebook_version_standardizer.py:153:1: E117 Over-indented
- docs/notebook_version_standardizer.py:160:1: E117 Over-indented
- docs/notebook_version_standardizer.py:16:1: E117 Over-indented
- docs/notebook_version_standardizer.py:25:1: E113 Unexpected indentation
- docs/notebook_version_standardizer.py:30:1: E117 Over-indented
... 1456 additional changes omitted for rule E117
- docs/notebook_version_standardizer.py:61:1: E113 Unexpected indentation
- docs/notebook_version_standardizer.py:73:1: E113 Unexpected indentation
- featuretools/computational_backends/calculate_feature_matrix.py:172:1: E113 Unexpected indentation
... 2198 additional changes omitted for project

apache/airflow (+0 -42650 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- airflow/__init__.py:109:1: E117 Over-indented
- airflow/__init__.py:118:1: E113 Unexpected indentation
- airflow/__init__.py:122:1: E117 Over-indented
- airflow/__init__.py:126:1: E113 Unexpected indentation
- airflow/__main__.py:47:1: E113 Unexpected indentation
- airflow/api/__init__.py:37:1: E117 Over-indented
- airflow/api/__init__.py:40:1: E113 Unexpected indentation
- airflow/api/__init__.py:46:1: E117 Over-indented
- airflow/api/auth/backend/kerberos_auth.py:101:1: E117 Over-indented
- airflow/api/auth/backend/kerberos_auth.py:103:1: E117 Over-indented
... 23997 additional changes omitted for rule E117
... 42640 additional changes omitted for project

aws/aws-sam-cli (+0 -10085 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- samcli/cli/cli_config_file.py:110:1: E117 Over-indented
- samcli/cli/cli_config_file.py:121:1: E117 Over-indented
- samcli/cli/cli_config_file.py:155:1: E117 Over-indented
- samcli/cli/cli_config_file.py:172:1: E117 Over-indented
- samcli/cli/cli_config_file.py:244:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:310:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:315:1: E117 Over-indented
- samcli/cli/cli_config_file.py:319:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:323:1: E113 Unexpected indentation
- samcli/cli/cli_config_file.py:327:1: E113 Unexpected indentation
... 10075 additional changes omitted for project

bloomberg/pytest-memray (+0 -66 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- src/pytest_memray/marks.py:122:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:144:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:164:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:171:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:183:1: E113 Unexpected indentation
- src/pytest_memray/marks.py:214:1: E113 Unexpected indentation
... 33 additional changes omitted for rule E113
- src/pytest_memray/marks.py:217:1: E117 Over-indented
- src/pytest_memray/plugin.py:130:1: E117 Over-indented
- src/pytest_memray/plugin.py:142:1: E117 Over-indented
- src/pytest_memray/plugin.py:159:1: E117 Over-indented
... 56 additional changes omitted for project

bokeh/bokeh (+0 -6215 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- docs/bokeh/docserver.py:100:1: E113 Unexpected indentation
- docs/bokeh/docserver.py:97:1: E117 Over-indented
- examples/advanced/extensions/parallel_plot/parallel_plot.py:23:1: E117 Over-indented
- examples/advanced/extensions/parallel_plot/parallel_plot.py:29:1: E113 Unexpected indentation
- examples/advanced/extensions/parallel_plot/parallel_plot.py:74:1: E113 Unexpected indentation
- examples/basic/data/server_sent_events_source.py:63:1: E113 Unexpected indentation
- examples/integration/layout/multi_plot_layout.py:13:1: E117 Over-indented
- examples/models/basic_plot.py:45:1: E113 Unexpected indentation
- examples/models/buttons.py:78:1: E113 Unexpected indentation
... 1984 additional changes omitted for rule E113
- examples/models/calendars.py:40:1: E117 Over-indented
... 6205 additional changes omitted for project

freedomofpress/securedrop (+0 -2565 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- admin/bootstrap.py:121:1: E117 Over-indented
- admin/bootstrap.py:126:1: E117 Over-indented
- admin/bootstrap.py:135:1: E117 Over-indented
- admin/bootstrap.py:158:1: E117 Over-indented (comment)
- admin/bootstrap.py:159:1: E117 Over-indented (comment)
- admin/bootstrap.py:160:1: E117 Over-indented (comment)
... 1743 additional changes omitted for rule E117
- admin/bootstrap.py:290:1: E113 Unexpected indentation
- admin/bootstrap.py:301:1: E113 Unexpected indentation
- admin/bootstrap.py:61:1: E113 Unexpected indentation
- admin/securedrop_admin/__init__.py:1033:1: E116 Unexpected indentation (comment)
... 2555 additional changes omitted for project

fronzbot/blinkpy (+0 -503 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- blinkpy/api.py:423:1: E117 Over-indented
- blinkpy/api.py:425:1: E117 Over-indented
- blinkpy/api.py:452:1: E117 Over-indented
- blinkpy/api.py:454:1: E117 Over-indented
- blinkpy/api.py:512:1: E117 Over-indented
- blinkpy/api.py:514:1: E117 Over-indented
... 332 additional changes omitted for rule E117
- blinkpy/auth.py:148:1: E113 Unexpected indentation
- blinkpy/auth.py:161:1: E113 Unexpected indentation
- blinkpy/auth.py:47:1: E113 Unexpected indentation
- blinkpy/auth.py:61:1: E113 Unexpected indentation
... 493 additional changes omitted for project

... Truncated remaining completed project reports due to GitHub comment length restrictions

Changes by rule (4 rules affected)

code	total	+ violation	- violation
E117	152372	0	152372
E113	73488	0	73488
E116	6080	0	6080
RUF100	1	1	0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser

There are in total 5 usages of TextRange::empty(offset)) in the existing Lexer code. Would you mind reviewing if we need to make the same change in other places?

MichaReiser · 2024-05-31T06:41:26Z

crates/ruff_python_parser/src/lexer.rs

+                //
+                // Here, the cursor is at `^` and the `indentation` contains the whitespaces before
+                // the `pass` token.
+                self.cursor.start_token();


I must admit that reading self.cursor.start_token() is hard to understand. Why are we starting a token when we're just about to return it. Your comment certainly helps. That makes me wonder if we can instead add a method on Lexer with a more meaningful name.

self.update_token_start_to_current_offset() // A bi tlong self.reset_token_start()

but I must admit, I don't find them much better than what we have. But maybe you have an idea for a better name?

Yeah, I agree. Let me think about this.

dhruvmanila · 2024-05-31T07:48:46Z

There are in total 5 usages of TextRange::empty(offset)) in the existing Lexer code. Would you mind reviewing if we need to make the same change in other places?

I did review them, sorry for not mentioning in the PR description. This isn't needed in all places, it's only require if the lexer skips whitespaces. For posterity, the usages are:

ruff/crates/ruff_python_parser/src/lexer.rs

Lines 853 to 863 in 685d11a

    
           // Return dedent tokens until the current indentation level matches the indentation of the next token. 
        
           else if let Some(indentation) = self.pending_indentation.take() { 
        
               match self.indentations.current().try_compare(indentation) { 
        
                   Ok(Ordering::Greater) => { 
        
                       self.pending_indentation = Some(indentation); 
        
                       let offset = self.offset(); 
        
                       self.indentations.dedent_one(indentation).map_err(|_| { 
        
                           LexicalError::new(LexicalErrorType::IndentationError, self.token_range()) 
        
                       })?; 
        
                       return Ok((Tok::Dedent, TextRange::empty(offset))); 
        
                   }

⬆️ This emits any pending dedent tokens. There are no whitespaces consumed by the lexer before this, so no need to reset the cursor.

ruff/crates/ruff_python_parser/src/lexer.rs

Lines 1031 to 1041 in 685d11a

    
           // Next, insert a trailing newline, if required. 
        
           if !self.state.is_new_logical_line() { 
        
               self.state = State::AfterNewline; 
        
               Ok((Tok::Newline, TextRange::empty(self.offset()))) 
        
           } 
        
           // Next, flush the indentation stack to zero. 
        
           else if self.indentations.dedent().is_some() { 
        
               Ok((Tok::Dedent, TextRange::empty(self.offset()))) 
        
           } else { 
        
               Ok((Tok::EndOfFile, TextRange::empty(self.offset()))) 
        
           }

⬆️ We already reset the cursor before this function is called ⬇️

ruff/crates/ruff_python_parser/src/lexer.rs

Lines 877 to 898 in b600b42

    
           // The lexer might've skipped whitespaces, so update the start offset 
        
           self.cursor.start_token(); 
        
           if let Some(c) = self.cursor.bump() { 
        
               if c.is_ascii() { 
        
                   self.consume_ascii_character(c) 
        
               } else if is_unicode_identifier_start(c) { 
        
                   let identifier = self.lex_identifier(c); 
        
                   self.state = State::Other; 
        
                   identifier 
        
               } else { 
        
                   self.push_error(LexicalError::new( 
        
                       LexicalErrorType::UnrecognizedToken { tok: c }, 
        
                       self.token_range(), 
        
                   )) 
        
               } 
        
           } else { 
        
               // Reached the end of the file. Emit a trailing newline token if not at the beginning of a logical line, 
        
               // empty the dedent stack, and finally, return the EndOfFile token. 
        
               self.consume_end() 
        
           }

MichaReiser

Thanks for explaining!

## Summary This PR fixes a bug to reset the cursor before emitting the `Dedent` token which had preceding whitespaces. Previously, we would just use `TextRange::empty(self.offset())` because the range for each token would be computed at the place where it was emitted. Now, the lexer handles this centrally in `next_token` which means we require special handling at certain places. This is one such case. Computing the range centrally in `next_token` has a benefit that the return type of lexer methods is a simple `TokenKind` but it does add a small complexity to handle such cases. I think it's worth doing this but if anyone thinks otherwise, feel free to comment. ## Test Plan Add a test case and update the snapshot.

dhruvmanila added bug Something isn't working parser Related to the parser labels May 31, 2024

dhruvmanila requested a review from MichaReiser as a code owner May 31, 2024 06:04

Reset cursor at dedent after whitespace

b600b42

dhruvmanila force-pushed the dhruv/parser-phase-2 branch from 2d34e13 to e8ab801 Compare May 31, 2024 06:27

dhruvmanila requested a review from AlexWaygood as a code owner May 31, 2024 06:27

dhruvmanila force-pushed the dhruv/dedent-after-whitespace branch from 4faed6a to b600b42 Compare May 31, 2024 06:27

dhruvmanila removed the request for review from AlexWaygood May 31, 2024 06:28

MichaReiser reviewed May 31, 2024

View reviewed changes

MichaReiser approved these changes May 31, 2024

View reviewed changes

dhruvmanila merged commit 8196720 into dhruv/parser-phase-2 May 31, 2024
19 checks passed

dhruvmanila deleted the dhruv/dedent-after-whitespace branch May 31, 2024 11:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reset cursor at dedent after whitespace #11630

Reset cursor at dedent after whitespace #11630

dhruvmanila commented May 31, 2024

codspeed-hq bot commented May 31, 2024 •

edited

Loading

github-actions bot commented May 31, 2024 •

edited

Loading

MichaReiser left a comment

MichaReiser May 31, 2024

dhruvmanila May 31, 2024

dhruvmanila commented May 31, 2024 •

edited

Loading

MichaReiser left a comment

Reset cursor at dedent after whitespace #11630

Reset cursor at dedent after whitespace #11630

Conversation

dhruvmanila commented May 31, 2024

Summary

Test Plan

codspeed-hq bot commented May 31, 2024 • edited Loading

CodSpeed Performance Report

Merging #11630 will create unknown performance changes

Summary

Benchmarks breakdown

github-actions bot commented May 31, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

MichaReiser left a comment

Choose a reason for hiding this comment

MichaReiser May 31, 2024

Choose a reason for hiding this comment

dhruvmanila May 31, 2024

Choose a reason for hiding this comment

dhruvmanila commented May 31, 2024 • edited Loading

MichaReiser left a comment

Choose a reason for hiding this comment

codspeed-hq bot commented May 31, 2024 •

edited

Loading

github-actions bot commented May 31, 2024 •

edited

Loading

`ruff-ecosystem` results

dhruvmanila commented May 31, 2024 •

edited

Loading