Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-121130: Fix f-string format specifiers with debug expressions #121150

Merged
merged 10 commits into from
Jul 16, 2024

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Jun 29, 2024

@pablogsal
Copy link
Member Author

@lysnikolaou do you mind taking a look at this approach? I think is not too bad and it solidifies a bit more the concept of being in a format spec

@@ -1444,8 +1444,16 @@ expr_ty _PyPegen_formatted_value(Parser *p, expr_ty expression, Token *debug, Re
conversion_val = (int)'r';
}

expr_ty format_expr = format ? (expr_ty) format->result : NULL;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This here is because when there are debug expressions we are wrapping the values in a node that then is wrapped in turn, but that is not what 3.11 and before does so we need to detect that and unwrap.

Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach, although it seems like the unpacking does not work correctly.

@@ -1387,6 +1395,7 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
return MAKE_TOKEN(_PyTokenizer_syntaxerror(tok, "f-string: expressions nested too deeply"));
}
TOK_GET_MODE(tok)->kind = TOK_REGULAR_MODE;
current_tok->in_format_spec = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this always be 0 here, if we've correctly reset it when the format spec is ending?

Parser/action_helpers.c Outdated Show resolved Hide resolved
@pablogsal
Copy link
Member Author

@lysnikolaou Ugh... this is much more complicated indeed. We need to restructure how we propagate the format specifier and how we capture the f-string expressions on format specifiers :(

@pablogsal pablogsal marked this pull request as draft July 1, 2024 15:16
@pablogsal
Copy link
Member Author

Also the AST is so weird because it removes {. Check this out:

python3.11 -m ast <<< 'f"{datetime.datetime.now():h1{y2=}}"'
Module(
   body=[
      Expr(
         value=JoinedStr(
            values=[
               FormattedValue(
                  value=Call(
                     func=Attribute(
                        value=Attribute(
                           value=Name(id='datetime', ctx=Load()),
                           attr='datetime',
                           ctx=Load()),
                        attr='now',
                        ctx=Load()),
                     args=[],
                     keywords=[]),
                  conversion=-1,
                  format_spec=JoinedStr(
                     values=[
                        Constant(value='h1y2='),
                        FormattedValue(
                           value=Name(id='y2', ctx=Load()),
                           conversion=114)]))]))],
   type_ignores=[])

notice that the node is h1y2 without the { in the middle :S

@pablogsal pablogsal marked this pull request as ready for review July 1, 2024 20:45
@pablogsal
Copy link
Member Author

I think this should do the trick. Boy this was hard to fix :S

@pablogsal
Copy link
Member Author

Hummm I will investigate the failure soon

res = _PyAST_JoinedStr(spec, lineno, col_offset, end_lineno,
end_col_offset, p->arena);
} else {
res = _PyPegen_concatenate_strings(p, spec,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is so we merge and concatenate the Constant and JoinedStr nodes. See how the tree originally looks like here: #121150 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the tree in #121150 is false, but also running that same example with the latest status of your branch gives me the same exact one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. That tree is the correct one, but without this call it basically has a bunch of Constants all together and the Joined strings are nested (comment the code and check it out to see what I mean).

}
n_flattened_elements++;
break;
case JoinedStr_kind:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is to accommodate the case when we call this function with other things than JoinedStr and constants

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this happen? It probably shouldn't, but I may be missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I am flattening collections that contain new nodes in https://github.com/python/cpython/pull/121150/files#r1662410566 (it's a new call over a new array that was not happening before)

@@ -1317,15 +1329,18 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
tok->multi_line_start = tok->line_start;
while (end_quote_size != current_tok->f_string_quote_size) {
int c = tok_nextc(tok);
if (tok->done == E_ERROR) {
if (tok->done == E_ERROR || tok->done == E_DECODE) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug, as if we enter E_DECODE state we were not returning soon enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently came across this in another branch I'm working on. If we check for this here, we can probably remove the if (tok->decoding_erred) about 10 lines below, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will do

Parser/lexer/lexer.c Outdated Show resolved Hide resolved
Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions.

Parser/action_helpers.c Outdated Show resolved Hide resolved
res = _PyAST_JoinedStr(spec, lineno, col_offset, end_lineno,
end_col_offset, p->arena);
} else {
res = _PyPegen_concatenate_strings(p, spec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the tree in #121150 is false, but also running that same example with the latest status of your branch gives me the same exact one.

}
n_flattened_elements++;
break;
case JoinedStr_kind:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this happen? It probably shouldn't, but I may be missing something.

Parser/lexer/lexer.c Outdated Show resolved Hide resolved
@@ -1317,15 +1329,18 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
tok->multi_line_start = tok->line_start;
while (end_quote_size != current_tok->f_string_quote_size) {
int c = tok_nextc(tok);
if (tok->done == E_ERROR) {
if (tok->done == E_ERROR || tok->done == E_DECODE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently came across this in another branch I'm working on. If we check for this here, we can probably remove the if (tok->decoding_erred) about 10 lines below, right?

@@ -1406,13 +1418,15 @@ tok_get_fstring_mode(struct tok_state *tok, tokenizer_mode* current_tok, struct
// scanning (indicated by the end of the expression being set) and we are not at the top level
// of the bracket stack (-1 is the top level). Since format specifiers can't legally use double
// brackets, we can bypass it here.
if (peek == '}' && !in_format_spec) {
int cursor = current_tok->curly_bracket_depth;
if (peek == '}' && !in_format_spec && cursor == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the cursor check here?

pablogsal and others added 2 commits July 4, 2024 15:32
Co-authored-by: Lysandros Nikolaou <[email protected]>
Co-authored-by: Lysandros Nikolaou <[email protected]>
@pablogsal
Copy link
Member Author

@lysnikolaou can you take another look so we can get this into rc1?

@pablogsal pablogsal added needs backport to 3.13 bugs and security fixes needs backport to 3.12 bug and security fixes labels Jul 16, 2024
@pablogsal
Copy link
Member Author

I'm landing so this gets into the last 3.13 beta to get some coverage. We can fix anything @lysnikolaou doesn't like later

@pablogsal pablogsal merged commit c46d64e into python:main Jul 16, 2024
38 checks passed
@miss-islington-app
Copy link

Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @pablogsal, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker c46d64e0ef8e92a6b4ab4805d813d7e4d6663380 3.13

@miss-islington-app
Copy link

Sorry, @pablogsal, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker c46d64e0ef8e92a6b4ab4805d813d7e4d6663380 3.12

@bedevere-app
Copy link

bedevere-app bot commented Jul 16, 2024

GH-121868 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jul 16, 2024
pablogsal added a commit to pablogsal/cpython that referenced this pull request Jul 16, 2024
…ressions (pythonGH-121150)

(cherry picked from commit c46d64e)

Co-authored-by: Pablo Galindo Salgado <[email protected]>
pablogsal added a commit that referenced this pull request Jul 16, 2024
pablogsal added a commit to pablogsal/cpython that referenced this pull request Jul 20, 2024
…ressions (pythonGH-121150)

(cherry picked from commit c46d64e)

Co-authored-by: Pablo Galindo Salgado <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Jul 20, 2024

GH-122063 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants