-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-121130: Fix f-string format specifiers with debug expressions #121150
Conversation
pablogsal
commented
Jun 29, 2024
•
edited by bedevere-app
bot
Loading
edited by bedevere-app
bot
- Issue: Self-documenting f-string in conversion specifier throws ValueError #121130
@lysnikolaou do you mind taking a look at this approach? I think is not too bad and it solidifies a bit more the concept of being in a format spec |
Parser/action_helpers.c
Outdated
@@ -1444,8 +1444,16 @@ expr_ty _PyPegen_formatted_value(Parser *p, expr_ty expression, Token *debug, Re | |||
conversion_val = (int)'r'; | |||
} | |||
|
|||
expr_ty format_expr = format ? (expr_ty) format->result : NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This here is because when there are debug expressions we are wrapping the values in a node that then is wrapped in turn, but that is not what 3.11 and before does so we need to detect that and unwrap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach, although it seems like the unpacking does not work correctly.
@@ -1387,6 +1395,7 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct | |||
return MAKE_TOKEN(_PyTokenizer_syntaxerror(tok, "f-string: expressions nested too deeply")); | |||
} | |||
TOK_GET_MODE(tok)->kind = TOK_REGULAR_MODE; | |||
current_tok->in_format_spec = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this always be 0 here, if we've correctly reset it when the format spec is ending?
@lysnikolaou Ugh... this is much more complicated indeed. We need to restructure how we propagate the format specifier and how we capture the f-string expressions on format specifiers :( |
Also the AST is so weird because it removes
notice that the node is |
I think this should do the trick. Boy this was hard to fix :S |
Hummm I will investigate the failure soon |
res = _PyAST_JoinedStr(spec, lineno, col_offset, end_lineno, | ||
end_col_offset, p->arena); | ||
} else { | ||
res = _PyPegen_concatenate_strings(p, spec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is so we merge and concatenate the Constant and JoinedStr nodes. See how the tree originally looks like here: #121150 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the tree in #121150 is false, but also running that same example with the latest status of your branch gives me the same exact one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. That tree is the correct one, but without this call it basically has a bunch of Constants all together and the Joined strings are nested (comment the code and check it out to see what I mean).
} | ||
n_flattened_elements++; | ||
break; | ||
case JoinedStr_kind: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is to accommodate the case when we call this function with other things than JoinedStr
and constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this happen? It probably shouldn't, but I may be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I am flattening collections that contain new nodes in https://github.com/python/cpython/pull/121150/files#r1662410566 (it's a new call over a new array that was not happening before)
@@ -1317,15 +1329,18 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct | |||
tok->multi_line_start = tok->line_start; | |||
while (end_quote_size != current_tok->f_string_quote_size) { | |||
int c = tok_nextc(tok); | |||
if (tok->done == E_ERROR) { | |||
if (tok->done == E_ERROR || tok->done == E_DECODE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug, as if we enter E_DECODE
state we were not returning soon enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently came across this in another branch I'm working on. If we check for this here, we can probably remove the if (tok->decoding_erred)
about 10 lines below, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments/questions.
res = _PyAST_JoinedStr(spec, lineno, col_offset, end_lineno, | ||
end_col_offset, p->arena); | ||
} else { | ||
res = _PyPegen_concatenate_strings(p, spec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the tree in #121150 is false, but also running that same example with the latest status of your branch gives me the same exact one.
} | ||
n_flattened_elements++; | ||
break; | ||
case JoinedStr_kind: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this happen? It probably shouldn't, but I may be missing something.
@@ -1317,15 +1329,18 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct | |||
tok->multi_line_start = tok->line_start; | |||
while (end_quote_size != current_tok->f_string_quote_size) { | |||
int c = tok_nextc(tok); | |||
if (tok->done == E_ERROR) { | |||
if (tok->done == E_ERROR || tok->done == E_DECODE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently came across this in another branch I'm working on. If we check for this here, we can probably remove the if (tok->decoding_erred)
about 10 lines below, right?
@@ -1406,13 +1418,15 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct | |||
// scanning (indicated by the end of the expression being set) and we are not at the top level | |||
// of the bracket stack (-1 is the top level). Since format specifiers can't legally use double | |||
// brackets, we can bypass it here. | |||
if (peek == '}' && !in_format_spec) { | |||
int cursor = current_tok->curly_bracket_depth; | |||
if (peek == '}' && !in_format_spec && cursor == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the cursor check here?
Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: Lysandros Nikolaou <[email protected]>
@lysnikolaou can you take another look so we can get this into rc1? |
I'm landing so this gets into the last 3.13 beta to get some coverage. We can fix anything @lysnikolaou doesn't like later |
Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
Sorry, @pablogsal, I could not cleanly backport this to
|
Sorry, @pablogsal, I could not cleanly backport this to
|
GH-121868 is a backport of this pull request to the 3.13 branch. |
…ressions (pythonGH-121150) (cherry picked from commit c46d64e) Co-authored-by: Pablo Galindo Salgado <[email protected]>
…ressions (pythonGH-121150) (cherry picked from commit c46d64e) Co-authored-by: Pablo Galindo Salgado <[email protected]>
GH-122063 is a backport of this pull request to the 3.12 branch. |