The method for storing docstrings in code objects is awkward and prevents optimizations. #126072

Open
markshannon opened this issue Oct 28, 2024 · 11 comments
Labels
3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@markshannon
Member

markshannon commented Oct 28, 2024

Currently, the zeroth constant in a code object's co_consts tuple is the docstring, iff it is a string.

This means that any code object without a docstring must not have a string as its first constant. To guarantee this we generally insert None as the first constant.
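The convention described above can be observed from Python. This is only a sketch: the names `source`, `namespace`, `documented` and `undocumented` are made up for illustration, and the exact contents of co_consts for a function without a docstring differ between CPython versions, so the example prints them rather than assuming a value.

```python
# Illustrative only: look at where CPython keeps a function's docstring.
source = '''
def documented():
    "I am a docstring"
    return 1

def undocumented():
    return "I am just a string constant"
'''

namespace = {}
# optimize=0 keeps docstrings regardless of how the interpreter was started.
exec(compile(source, "<example>", "exec", optimize=0), namespace)

documented = namespace["documented"]
undocumented = namespace["undocumented"]

# With a docstring, the zeroth constant is the docstring itself.
first_const = documented.__code__.co_consts[0]
print(repr(first_const))

# Without a docstring, the function reports None; what the compiler puts
# in co_consts to guarantee that is version-dependent.
print(undocumented.__doc__, undocumented.__code__.co_consts)
```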

This prevents a few improvements we would like to make, such as moving None from LOAD_CONST to LOAD_COMMON_CONST,
and complicates handling of code objects in the compiler.

I propose adding a flag to co_flags, CO_HAS_DOCSTRING. If this flag is set then the docstring is the zeroth constant; otherwise there is no docstring.
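A rough sketch of how a consumer of function code objects might read the docstring under this proposal. It assumes that interpreters implementing the flag expose it as `inspect.CO_HAS_DOCSTRING`; since that is not guaranteed on every version, the code falls back to the older "first constant is a string" rule when the attribute is missing. The helper name `docstring_of` is hypothetical.

```python
import inspect

# Assumption: the flag is exposed as inspect.CO_HAS_DOCSTRING where it exists.
# On interpreters without it this is None and the older rule is used instead.
CO_HAS_DOCSTRING = getattr(inspect, "CO_HAS_DOCSTRING", None)


def docstring_of(code):
    """Return the docstring stored in a function's code object, or None."""
    consts = code.co_consts
    if CO_HAS_DOCSTRING is not None:
        # Proposed rule: the flag alone decides whether consts[0] is a docstring.
        if code.co_flags & CO_HAS_DOCSTRING:
            return consts[0]
        return None
    # Older rule: the zeroth constant is the docstring iff it is a string.
    if consts and isinstance(consts[0], str):
        return consts[0]
    return None


example_source = '''
def with_doc():
    "hello"

def without_doc():
    return "hello"
'''
example_ns = {}
exec(compile(example_source, "<example>", "exec", optimize=0), example_ns)

print(docstring_of(example_ns["with_doc"].__code__))
print(docstring_of(example_ns["without_doc"].__code__))
```

With the flag, the second function no longer needs a placeholder constant to keep its returned string from being mistaken for a docstring.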


@markshannon markshannon added performance Performance or resource usage 3.14 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 28, 2024
@xuantengh
Contributor

xuantengh commented Oct 28, 2024

Hi, I'm not sure whether the PR is simple. If so, I would like to take this.

@markshannon
Member Author

I don't know how simple it will be. It will involve the internals of the bytecode compiler.
But feel free to give it a try.

Feel free to ask questions if you get stuck.
If you decide not to do it, please let us know so someone else can take over.

@xuantengh
Contributor

This prevents a few improvements we would like to make, such as moving None from LOAD_CONST to LOAD_COMMON_CONST,
and complicates handling of code objects in the compiler.

IIUC, this issue does not include these topics; they are future work, right?

But feel free to give it a try.

Thanks, I'll give it a try and open a PR.

@markshannon
Member Author

IIUC, this issue does not include these topics; they are future work, right?

Yes

@iritkatriel
Member

Note that once this is done, astfold_body in Python/ast_opt.c can be simplified - it currently converts strings in optimised mode into an f-string (via _PyAST_JoinedStr) to avoid them being interpreted as docstrings.

@xuantengh
Contributor

Note that once this is done, astfold_body in Python/ast_opt.c can be simplified - it currently converts strings in optimised mode into an f-string (via _PyAST_JoinedStr) to avoid them being interpreted as docstrings.

Hi Irit, I'm also interested in this. But currently I'm not sure how astfold_body can be simplified.

cpython/Python/ast_opt.c

Lines 703 to 717 in dcad8fe

    if (!docstring && _PyAST_GetDocString(stmts) != NULL) {
        stmt_ty st = (stmt_ty)asdl_seq_GET(stmts, 0);
        asdl_expr_seq *values = _Py_asdl_expr_seq_new(1, ctx_);
        if (!values) {
            return 0;
        }
        asdl_seq_SET(values, 0, st->v.Expr.value);
        expr_ty expr = _PyAST_JoinedStr(values, st->lineno, st->col_offset,
                                        st->end_lineno, st->end_col_offset,
                                        ctx_);
        if (!expr) {
            return 0;
        }
        st->v.Expr.value = expr;
    }

Currently, I can enter this if block by compiling a function with several consecutive "docstrings" under PYTHONOPTIMIZE=2:

def has_docstring(x, y):
    """This is a first-line doc string"""
    """This is a second-line doc string"""
    a = x + y
    b = x - y
    return a, b

IIUC, it converts the expression in the first statement from Constant_kind to JoinedStr_kind. But this happens before the symbol table is built, while the current CO_HAS_DOCSTRING calculation relies on the symbol table entry.

@iritkatriel
Member

This function removes the docstrings in optimised mode (running python.exe -OO). If it removes a docstring, and the next expression is also a string, then we don't want the compiler to get confused and think that the second expression is the docstring. So here it turns it into an f-string so that it has the same behaviour in the program, but the compiler leaves it alone.
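The behaviour described here can be checked without restarting the interpreter, since `compile()` accepts an `optimize` argument (2 corresponds to -OO). A minimal sketch; the helper `build` and the shortened strings are illustrative, and it only observes the resulting `__doc__`, not how a given CPython version achieves it internally.

```python
# Two consecutive string statements: the first is the docstring, the second
# is an ordinary (useless) expression statement.
source = '''
def has_docstring(x, y):
    """first string"""
    """second string"""
    return x + y, x - y
'''


def build(optimize):
    """Compile the source at the given optimisation level and return the function."""
    namespace = {}
    exec(compile(source, "<example>", "exec", optimize=optimize), namespace)
    return namespace["has_docstring"]


normal = build(0)    # like running plain `python`
stripped = build(2)  # like running `python -OO`

print(normal.__doc__)
# The docstring is removed, and the second string must not be promoted
# to take its place.
print(stripped.__doc__)
```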

It's a hack, will be good to get rid of it now if we can. But I see your point about this happening before symtable. Maybe the docstring removal optimization can move to the symtable where you currently mark the docstring as existing/not existing?

This would need to be documented as a visible change to the "optimised ast", which will no longer have docstrings removed. But I don't think that's a problem, because all these APIs are considered unstable, and the "optimised ast" was only exposed to users in 3.13 in #108154, so there shouldn't be much disruption in changing it now.

@xuantengh
Contributor

Maybe the docstring removal optimization can move to the symtable where you currently mark the docstring as existing/not existing?

It's feasible. But I'm not sure whether deferring the optimization from the AST to code object construction will affect the semantics of -OO. Meanwhile, for docstrings in modules and classes, I'm concerned about whether this change applies to them as well.

@xuantengh
Contributor

xuantengh commented Oct 30, 2024

Meanwhile, for docstrings in modules and classes, I'm concerned about whether this change applies to them as well.

After a preliminary investigation, I think they follow the same logic as functions, so this is not a concern. Maybe we can open another issue to move the -OO docstring removal optimization to the code object construction stage?

@xuantengh
Contributor

Just found some cases we missed; they need to be fixed either together with the docstring removal PR or in a separate PR:

cpython/Python/codegen.c

Lines 667 to 669 in d467d92

// Insert None into consts to prevent an annotation
// appearing to be a docstring
_PyCompile_AddConst(c, Py_None);

cpython/Python/codegen.c

Lines 1595 to 1597 in d467d92

/* Make None the first constant, so the evaluate function can't have a
   docstring. */
RETURN_IF_ERROR(_PyCompile_AddConst(c, Py_None));

cpython/Python/codegen.c

Lines 1893 to 1895 in d467d92

/* Make None the first constant, so the lambda can't have a
   docstring. */
RETURN_IF_ERROR(_PyCompile_AddConst(c, Py_None));

I think we can use ste->ste_has_docstring to ensure there is no docstring in these cases.

@iritkatriel
Member

Would be good to cover those cases with some tests.
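A possible starting point for such tests, sketched at the Python level. It covers only the lambda case and a plain function whose sole constant is a string; the annotation and evaluate-function cases depend on newer syntax and are left out. The function names are hypothetical and not taken from CPython's test suite.

```python
def check_lambda_has_no_docstring():
    # A lambda whose only constant is a string must not report it as a docstring.
    f = lambda: "looks like a docstring"
    assert f.__doc__ is None
    assert f() == "looks like a docstring"
    return True


def check_returned_string_is_not_docstring():
    # Same idea for a def: a returned string is not a docstring.
    def g():
        return "also not a docstring"

    assert g.__doc__ is None
    assert g() == "also not a docstring"
    return True


lambda_ok = check_lambda_has_no_docstring()
function_ok = check_returned_string_is_not_docstring()
print(lambda_ok, function_ok)
```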
