Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyEval_GetLocals() leaks locals #118934

Closed
colesbury opened this issue May 10, 2024 · 12 comments
Closed

PyEval_GetLocals() leaks locals #118934

colesbury opened this issue May 10, 2024 · 12 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented May 10, 2024

Bug report

PyEval_GetLocals() is documented as returning a borrowed reference. It now returns a new reference, which causes callers to leak the local variables:

cpython/Python/ceval.c

Lines 2478 to 2479 in 35c4361

PyObject *locals = _PyEval_GetFrameLocals();
return locals;

cc @gaogaotiantian @markshannon

Linked PRs

@colesbury colesbury added type-bug An unexpected behavior, bug, or error topic-C-API 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels May 10, 2024
@gaogaotiantian
Copy link
Member

This is tricky, because after PEP 669, the reference PyEval_GetLocals() gets, is the only reference, for the function-level cases. We used to have a f_locals dict in the frame to host the dictionary, but we do not anymore. Each frame.f_locals call generates a new proxy and locals() (or PyEval_GetLocals()) turns that to a dict. That's the only reference, we can't return a borrowed reference.

So we either

  • Change PyEval_GetLocals() to return a new reference, or
  • Have a list in the frame object to host all the results from PyEval_GetLocals() call (because each call could result in a different dict, it's a snapshot), and release those when the frame is released.

Which is the lesser of two evils?

@encukou
Copy link
Member

encukou commented May 13, 2024

Per PEP 667:

PyEval_GetLocals() will be implemented roughly as follows:

PyObject *PyEval_GetLocals(void) {
    PyFrameObject * = ...; // Get the current frame.
    if (frame->_locals_cache == NULL) {
        frame->_locals_cache = PyEval_GetFrameLocals();
    }
    return frame->_locals_cache;
}

As with all functions that return a borrowed reference, care must be taken to ensure that the reference is not used beyond the lifetime of the object.

i.e. a proxy should be created on first acccess, and then returned.

Is that not an option any more?

@gaogaotiantian
Copy link
Member

I had a fix in #119769. It's not the prettiest solution, but it should work. We don't have a plan to deprecate this API yet, but it should be discouraged to use PyEval_GetLocals() because it returns borrowed reference, so I guess we don't need the best structure for it.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 1, 2024

It turns out there's an internal inconsistency in PEP 667 here. When writing the docs updates in #119893 I was going off https://peps.python.org/pep-0667/#pyeval-getlocals stating that the "proxy mapping" would be cached on the frame, and https://peps.python.org/pep-0667/#changes-to-existing-apis saying "The semantics of PyEval_GetLocals() is changed as it now returns a view of the frame locals, not a dictionary." (as accepted in https://github.com/python/peps/blob/134897bc1f610a1ca9e24dd8d7ab3053fa3520aa/peps/pep-0667.rst?plain=1#L172 )

I missed the subsequent definition in terms of PyEval_GetFrameLocals() in https://peps.python.org/pep-0667/#id3 or I would have postponed those docs changes until the intention was clarified.

Caching a snapshot (as suggested in the sample code) rather than a proxy instance (as suggested in the prose) would mean that updates made via the return value of PyEval_GetLocals() wouldn't be visible through the frame's f_locals attribute or when calling PyFrame_GetLocals(f) on the result of PyEval_GetFrame().

That interoperability issue is avoided if PyEval_GetLocals() caches and returns the result of calling PyFrame_GetLocals(f) on the result of PyEval_GetFrame() (i.e. keeping its equivalence to Python-level f_locals access) rather than having it start making independent snapshots the way PyEval_GetFrameLocals() does or keeping the old "make and update a shared snapshot" behaviour. We do get the reference cycle problem mentioned in the PEP (and the updated docs), but that seems better than having the old and new APIs not playing nice with each other.

@gaogaotiantian
Copy link
Member

I think it's better to keep the behavior of PyEval_GetLocals() (return a dict, not a proxy). PyEval_GetLocals() should be equivalent to locals() except for the borrowed reference. PyEval_GetFrameLocals() is strictly equivalent to locals().

This is consistent because in that way:

PyEval_GetLocals()
PyEval_GetGlobals()
PyEval_GetBuiltins()

are borrowed version of

PyEval_GetFrameLocals()
PyEval_GetFrameGlobals()
PyEval_GetFrameBuiltins()

to their python equivalent

locals()
globals()
# a way to get builtins dict

Changing the semantics of PyEval_GetLocals() is equivalent to changing the semantics of locals() which would cause many backwards compatibility issues. It's our intention to keep locals() as it is instead of returning a proxy like f_locals, and we should do the same with PyEval_GetLocals().

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 1, 2024

We'll need to get confirmation from the SC then, since that's definitely not what the prose in the accepted PEP 667 says.

PyEval_GetLocals() never historically distinguished between whether it was emulating locals() or frame.f_locals at the Python level, since they both returned references to the same shared cache of the local variable bindings. (In gh-119893 I amended the PyEval_GetLocals() deprecation notice to point to both PyEval_GetFrameLocals and calling PyFrame_GetLocals() on the result of PyEval_GetFrame() depending on which behaviour the caller actually wants)

As written, aside from the reference to PyEval_GetFrameLocals in the sample code, PEP 667 brings PyEval_GetLocals() down on the frame.f_locals side of things, with PyEval_GetFrameLocals being introduced if you actually want a locals() style snapshot.

This approach makes sense, since using shared mutable storage is only beneficial if other consumers of that storage can see the changes that you make. If PyEval_GetLocals() keeps its Python 3.12 behaviour, then only other callers of PyEval_GetLocals() will be able to see any changes. Users of PyFrame_GetLocals() and frame.f_locals aren't accessing the internal cache anymore, so they won't see any edits. By contrast, if PyEval_GetLocals() returns a cached proxy instance, then changes will be visible to other consumers as expected (including the frame itself for callers that were also using PyFrame_LocalsToFast() in previous versions). (The original PEP 558 implementation did a bunch of tapdancing to let proxy instances still see values that only existed in the internal cache, and to replicate writes so they affected the cache as well, but actually doing that is sufficiently messy that I think tolerating the reference cycle is a better option)

Ensuring that PyEval_GetLocals() returns a dict isn't a significant concern, since that already isn't guaranteed in class scopes.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 2, 2024

After further reflection, I've come around to the point of view that reverting PyEval_GetLocals() to its Python 3.12 behaviour isn't likely to be more disruptive than the alternative. That means as long as the SC are OK with it, the simplest resolution is to update the docs and the PEP text to align with the PyEval_GetFrameLocals() based implementation sketch rather than the other way around.

The cases that were concerning me were mainly tracing functions implemented in C, but those are going to be calling PyFrame_GetLocals on the passed in frame reference, they're not going to be calling PyEval_GetLocals.

I'll post new PRs bringing the docs and PEP into line with PyEval_GetLocals() continuing to return a cached snapshot dict.

ncoghlan added a commit to ncoghlan/cpython that referenced this issue Jun 2, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
pythongh-74929 didn't match either the current behaviour or the intended
behaviour once pythongh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once pythongh-118934 is fixed).
ncoghlan added a commit to ncoghlan/cpython that referenced this issue Jun 2, 2024
ncoghlan added a commit that referenced this issue Jun 2, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
gh-74929 didn't match either the current behaviour or the intended
behaviour once gh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once gh-118934 is fixed).

It also tidies up lingering references to `f_locals` always being a
dictionary (this hasn't been true since at least when custom
namespace support for class statement execution was added)
ncoghlan added a commit to ncoghlan/peps that referenced this issue Jun 2, 2024
See Implementation Notes in the updated PEP for details
of the correction/clarification and the rationale for it.

The discrepancy was detected when writing the documentation for
python/cpython#74929 and when reviewing
python/cpython#118934
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 2, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
pythongh-74929 didn't match either the current behaviour or the intended
behaviour once pythongh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once pythongh-118934 is fixed).

It also tidies up lingering references to `f_locals` always being a
dictionary (this hasn't been true since at least when custom
namespace support for class statement execution was added)
(cherry picked from commit fd6cd62)

Co-authored-by: Alyssa Coghlan <[email protected]>
@ncoghlan
Copy link
Contributor

ncoghlan commented Jun 2, 2024

PEP text clarification PR posted here: python/peps#3809

SC ratification requested here: python/steering-council#245

(I'll be genuinely surprised if they object, but I figure it's better to be fully transparent about it)

I've already updated the docs to reflect the way PyEval_GetLocals will work once this issue is resolved.

ncoghlan added a commit that referenced this issue Jun 2, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
gh-74929 didn't match either the current behaviour or the intended
behaviour once gh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once gh-118934 is fixed).

It also tidies up lingering references to `f_locals` always being a
dictionary (this hasn't been true since at least when custom
namespace support for class statement execution was added)
(cherry picked from commit fd6cd62)

Co-authored-by: Alyssa Coghlan <[email protected]>
@markshannon
Copy link
Member

I really think we should implement PyEval_GetLocals as described in the PEP.
https://peps.python.org/pep-0667/#id3 seems quite clear and unambiguous.

It seems that all the complication stems from wanting to return a dict instead of a FrameLocalsProxy.
What is the advantage of returning a dict? There are semantic differences from 3.12 in both cases.

barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
pythongh-74929 didn't match either the current behaviour or the intended
behaviour once pythongh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once pythongh-118934 is fixed).

It also tidies up lingering references to `f_locals` always being a
dictionary (this hasn't been true since at least when custom
namespace support for class statement execution was added)
@ncoghlan
Copy link
Contributor

ncoghlan commented Jul 10, 2024

This proved complex enough that it became a question for the Steering Council in python/steering-council#245 (the PEP text was internally inconsistent, with different sections suggesting different behaviour for PyEval_GetLocals() in 3.13+)

@Yhg1s I added the deferred blocker label (if we don't get this resolved for the final beta, it should definitely be resolved before the first release candidate)

noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
pythongh-74929 didn't match either the current behaviour or the intended
behaviour once pythongh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once pythongh-118934 is fixed).

It also tidies up lingering references to `f_locals` always being a
dictionary (this hasn't been true since at least when custom
namespace support for class statement execution was added)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 16, 2024
…honGH-119769)

(cherry picked from commit e65cb4c)

Co-authored-by: Tian Gao <[email protected]>
Co-authored-by: Alyssa Coghlan <[email protected]>
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
PEP 667's description of the planned changes to PyEval_GetLocals
was internally inconsistent when accepted, so the docs added for
pythongh-74929 didn't match either the current behaviour or the intended
behaviour once pythongh-118934 is fixed.

This PR updates the documentation and 3.13 What's New to match the
intended behaviour (once pythongh-118934 is fixed).

It also tidies up lingering references to `f_locals` always being a
dictionary (this hasn't been true since at least when custom
namespace support for class statement execution was added)
ncoghlan added a commit that referenced this issue Jul 18, 2024
…-119769) (#121869)

gh-118934: Make PyEval_GetLocals return borrowed reference (GH-119769)
(cherry picked from commit e65cb4c)

Co-authored-by: Tian Gao <[email protected]>
Co-authored-by: Alyssa Coghlan <[email protected]>
@ncoghlan
Copy link
Contributor

PyEval_GetLocals has been reverted back to its Python 3.12 behaviour for 3.13rc1 (this was supposed to be in the b4 release, but it had only been merged to main when the release was cut)

@mgorny
Copy link
Contributor

mgorny commented Aug 6, 2024

I think something went wrong with the behavior revert since the linked commit is now causing assertion errors in gpgme test suite, and I don't think upstream has done any changes to support 3.13 betas, so I doubt they've actually caused the regression. I've filed #122728.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-C-API type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

6 participants