Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] keep_alive_impl should admit reference cycles, and ideally be released by GC? #2761

Open
EricCousineau-TRI opened this issue Dec 30, 2020 · 6 comments
Assignees

Comments

@EricCousineau-TRI
Copy link
Collaborator

EricCousineau-TRI commented Dec 30, 2020

Upstream motivation - memory leaks that don't get released:
RobotLocomotion/drake#14387

Basically, because of our usage of unique_ptr and how we've design our C++ public API and bindings (RobotLocomotion/drake#13058), there are some awkward edge cases when we have "opaque" memory transfers.

This arose because we wanted to fix a segfault: RobotLocomotion/drake#14356
However, this reference cycle appears to grow without bounds.

TODO(me): Explicitly try gc.collect() with aggressive collection arguments.
See below: #2761 (comment)

I'm filing this now because @YannickJadoul mentioned that keep_alive may be useful for #1941 (py::dynamic_attr coordinating with "mismatched" lifetime between py::detail::instance and the actual shared_ptr<T>). A keep_alive may fix that issue in a more general way, but will admit reference cycles

Note: I am speculating that the current approach (based on Boost.Python, per @bmerry's comment in #880) doesn't admit GC collection when cycles are introduced. I should confirm this.

\cc @rwgk @rhaschke about this sharp edge case with unique_ptr<> (#2646)

@EricCousineau-TRI
Copy link
Collaborator Author

Related references about CPython GC can be seen here:
#2757

Note: I will only be considering CPython as we do not use PyPy for our usage at present (Drake: https://github.com/RobotLocomotion/drake)

@EricCousineau-TRI
Copy link
Collaborator Author

EricCousineau-TRI commented Dec 30, 2020

Some thoughts after reading the CPython dev guide (https://devguide.python.org/garbage_collector/)

  • Could be possible if keep-alive tracking is exposed more directly to garbage collector; e.g. make them be proper references, but somehow add a gc hook to rescan the keep alive list.
  • Could also make an explicit admission of reference-compatible keep-alive; i.e. py::keep_alive_cycle<A, B>(). However, making this work could be a bit weird.

Additional steps:

On point (3) - @wlav, can I ask if y'all ever had to deal with this kind of thing in cppyy?

@EricCousineau-TRI
Copy link
Collaborator Author

Filed #2762 to show random stuff

@EricCousineau-TRI EricCousineau-TRI self-assigned this Dec 30, 2020
@bmerry
Copy link
Contributor

bmerry commented Dec 31, 2020

Keep in mind that it probably won't be possible to implement tp_clear without the user providing the C++ side of it (to safely clear the C++ references that keep-alive is tracking).

@wlav
Copy link

wlav commented Jan 2, 2021

@EricCousineau-TRI: the equivalent of keep_alive in cppyy are "life lines" that are placed in the relevant object's dictionary with a name based on the expected replacement policy/life time length. The whole machinery is thus fully exposed to the garbage collector. (There are internal flags for performance reasons to prevent the need of filling/checking the dictionary unnecessarily.) The big difference is that cppyy, rather than the python developer, determines the need for life lines (there is a custom flag that developers can set, but that just tells cppyy to add life lines as necessary, it does not set the life line directly, and I'm not aware of anyone actually using it).

Also, cppyy uses life lines when python objects are used as buffers (I don't think it's possible to automatically support unicode properly without it), which pybind11 (AFAIK) does not do; and selectively for iterators, which pybind11 does, too (but for slightly different selections). It's for the former case that life lines are labelled to deal with replacement (usually with the pointer address on the C++ side for which the python object buffers).

@EricCousineau-TRI
Copy link
Collaborator Author

@rwgk and @dabrahams had a brief convo regarding this. Regarding (2):

Ralf's question:

... about the mechanism behind custodian & ward in BP terms (patient & nurse in pybind11 terms):

  1. Review Boost.Python's setup for similar mechanisms (see if they have notes there)

Focused [sic] question: are you doing something special to avoid introducing reference cycles that GC cannot discover?

Dave's response:

I don't think I do anything to try to avoid reference cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants