Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on f-string with \{ #4421

Closed
JelleZijlstra opened this issue Jul 31, 2024 · 8 comments · Fixed by #4422
Closed

Crash on f-string with \{ #4421

JelleZijlstra opened this issue Jul 31, 2024 · 8 comments · Fixed by #4422
Labels
T: bug Something isn't working

Comments

@JelleZijlstra
Copy link
Collaborator

Black fails to format a valid Python f-string containing \{. I minimized this from a file in Pygments, pygments/lexers/int_fiction.py.

% cat rf.py 
rf'{a}\{{[\}}'
% black --check rf.py
error: cannot format rf.py: Cannot parse: 1:10: rf'{a}\{{[\}}'

Oh no! 💥 💔 💥
1 file would fail to reformat.
% python rf.py 
Traceback (most recent call last):
  File "/Users/jelle/py/black/rf.py", line 1, in <module>
    rf'{a}\{{[\}}'
        ^
NameError: name 'a' is not defined
% python -V
Python 3.12.4
% black --version
black, 24.4.3.dev27+g7fa1faf (compiled: no)
Python (CPython) 3.12.4

cc @tusharsadhwani for f-strings.

@JelleZijlstra JelleZijlstra added the T: bug Something isn't working label Jul 31, 2024
@tusharsadhwani
Copy link
Contributor

tusharsadhwani commented Jul 31, 2024

>>> print(rf'{1}\{{[\}}')
1\{[\}

small tweak to the example as it shows expected formatting output

@tusharsadhwani
Copy link
Contributor

Minimal repro: f'{1}\{{'

@JelleZijlstra
Copy link
Collaborator Author

Here's another example that doesn't yield a Black crash but that we parse incorrectly:

>>> black.parsing.lib2to3_parse(r"rf'\{{ {a}'")
Node(file_input, [Node(simple_stmt, [Node(fstring, [Leaf(FSTRING_START, "rf'"), Leaf(FSTRING_MIDDLE, '\\{{ {a}'), Leaf(FSTRING_END, "'")]), Leaf(NEWLINE, '\n')]), Leaf(ENDMARKER, '')])

The {a} part gets put inside an FSTRING_MIDDLE leaf, when it should be an fstring_replacement_field node. Compare the tree if you remove the backslash:

>>> black.parsing.lib2to3_parse(r"rf'{{ {a}'")
Node(file_input, [Node(simple_stmt, [Node(fstring, [Leaf(FSTRING_START, "rf'"), Leaf(FSTRING_MIDDLE, '{{ '), Node(fstring_replacement_field, [Leaf(LBRACE, '{'), Leaf(NAME, 'a'), Leaf(RBRACE, '}')]), Leaf(FSTRING_MIDDLE, ''), Leaf(FSTRING_END, "'")]), Leaf(NEWLINE, '\n')]), Leaf(ENDMARKER, '')])

@JelleZijlstra
Copy link
Collaborator Author

And another one:

>>> black.parsing.lib2to3_parse(r"rf'\{1}'")
Node(file_input, [Node(simple_stmt, [Node(fstring, [Leaf(FSTRING_START, "rf'"), Leaf(FSTRING_MIDDLE, '\\{1}'), Leaf(FSTRING_END, "'")]), Leaf(NEWLINE, '\n')]), Leaf(ENDMARKER, '')])
>>> rf'\{1}'
'\\1'

@tusharsadhwani
Copy link
Contributor

tusharsadhwani commented Jul 31, 2024

Normal case:

$ cat foo.py
f'foo \{{'

$ python -m blib2to3.pgen2.tokenize foo.py
1,0-1,2:        FSTRING_START   "f'"
1,2-1,9:        FSTRING_MIDDLE  'foo \\{{'
1,9-1,10:       FSTRING_END     "'"
1,10-1,11:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

Abnormal case:

$ cat foo.py
f'{1} \{{'

$ python -m blib2to3.pgen2.tokenize foo.py
1,0-1,2:        FSTRING_START   "f'"
1,2-1,2:        FSTRING_MIDDLE  ''
1,2-1,3:        LBRACE  '{'
1,3-1,4:        NUMBER  '1'
1,4-1,5:        RBRACE  '}'
1,5-1,8:        FSTRING_MIDDLE  ' \\{'
1,8-1,9:        LBRACE  '{'
1,9-1,10:       ERRORTOKEN      "'"
1,10-1,11:      NL      '\n'
Traceback (most recent call last):
[...]

It deviates at parsing \{{ successfully, but only after having parsed at least one braced expression. This also explains the deviation in parsing.

@tusharsadhwani
Copy link
Contributor

Yeah this is interesting too.

$ python -m tokenize <<<"rf'\{2+3}'"
1,0-1,3:            FSTRING_START  "rf'"          
1,3-1,4:            FSTRING_MIDDLE '\\'           
1,4-1,5:            OP             '{'            
1,5-1,6:            NUMBER         '2'            
1,6-1,7:            OP             '+'            
1,7-1,8:            NUMBER         '3'            
1,8-1,9:            OP             '}'            
1,9-1,10:           FSTRING_END    "'"            
1,10-1,11:          NEWLINE        '\n'           
2,0-2,0:            ENDMARKER      ''             

$ python -m blib2to3.pgen2.tokenize <<<"rf'\{2+3}'"
1,0-1,3:        FSTRING_START   "rf'"
1,3-1,9:        FSTRING_MIDDLE  '\\{2+3}'
1,9-1,10:       FSTRING_END     "'"
1,10-1,11:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

@tusharsadhwani
Copy link
Contributor

Since we don't format expressions inside f-strings right now, this wouldn't have been caught for a long time, but thanks to this bug we know of this regression :D

@JelleZijlstra
Copy link
Collaborator Author

You can also construct crashes from these bad parses, if the replacement field contains nested strings relying on PEP 701:

% black -c 'rf"\{"a"}"'
rf"\{"a"}"
error: cannot format <string>: Cannot parse: 1:6: rf"\{"a"}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants