-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python (shutdown) leaks #412
Milestone
Comments
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 6, 2022
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 6, 2022
Another resource is NumPy, https://numpy.org/devdocs/dev/development_advanced_debugging.html |
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 9, 2022
This was
linked to
pull requests
May 9, 2022
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 9, 2022
jiridanek
added a commit
that referenced
this issue
May 9, 2022
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 9, 2022
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 10, 2022
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 13, 2022
jiridanek
added a commit
to jiridanek/skupper-router
that referenced
this issue
May 25, 2022
jiridanek
added a commit
that referenced
this issue
May 26, 2022
* This is revival of #259 * It attempts to fix issue described in #412 * and by the way it seems to also fix #253 ## About capsules and men The Capsule is a container for a pointer, opaque from the Python side, and translucent from the C side. The C code can put a pointer in and Python can treat it as py_object and pass it to some other C code, but cannot unwrap the pointer (unless it really tries, see example code on Stack Overflow for how to break the encapsulation). The important thing for our purposes is that Capsules can own the pointer and can have a destructor associated. More at https://docs.python.org/3/c-api/capsule.html The idea of the #253 fix is to use Capsule to associate destructor to the pointer, and then rely on Python to run GC eventually and free the memory. Thereby leak is prevented. ## Getting the GC to actually run Capsule is owned by `AppStats`, which is owned by `PolicyLocal`, which is contained in `PolicyManager` which is contained in `Agent`, which is very bad, because `Agent` cannot be GC'd because there is a refcounting cycle with `IoAdapter`, which does not participate in the cyclic GC. The Dispatch PR which tried to fix the GC for IoAdapter hit into weird shutdown crashes because when the Python was actually running GC fully (which it was not before) it hit... issues. ## In conclusion I considered writing a TODO list here, but I guess it would be best to just discuss this first... Honestly, I am not sure why I am not seeing the horrible problems that plagued the Dispatch PR. And I still feel uneasy about it. One thing I had to do in Dispatch that does not seem necessary here is this (in python_embedded.c::qd_io_rx_handler) ```diff diff --git a/src/python_embedded.c b/src/python_embedded.c --- a/src/python_embedded.c (revision 922ca0ad7d43f2d838c4bbb282d69e14b8b457c3) +++ b/src/python_embedded.c (revision 714e506d601054eb839e5968caf01c81afa1c3f0) @@ -639,6 +639,11 @@ IoAdapter *self = (IoAdapter*) context; *error = 0; + if (self->handler == NULL) { + *error = qdr_error(QD_AMQP_COND_INTERNAL_ERROR, "Router is shutting down"); + return PN_REJECTED; + } + ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Brought forward from DISPATCH-1962 Make LSan suppressions more targeted and specific (within reason)
= Python leaks
Leaks in CPython itself are known and there is ongoing effort to fight them: https://bugs.python.org/issue1635741 (https://bugs.python.org/issue25302) and https://www.python.org/dev/peps/pep-3121
Here's valgrind suppression file from somebody who actually investigated the Python leaks and identified the harmless ones: https://github.com/libgit2/pygit2/blob/master/misc/valgrind-python.supp
= Router leaks
One example of a hidden leak in dispatch, which is revealed by making Python suppressions more targetted:
The problem is in
IoAdapter
refers toAgent
(through the bound method referenceself.receive
) andAgent
refers toIoAdapter
(through propertyself.io
). SinceIoAdapter
is implemented in C and does not implement support for Python's cyclic GC, there is no way to break the cycle, collect the garbage, and run destructors.Heap dump in attachment of the original Jira. The bound method is at the top of the picture. (Ignore the
Mock
objects, I was trying to simplify the picture while not getting crashes due to too much meddling).== Random observations
It is possible to build special Debug build of Python, which has tools to detect leaks, asserts to prevent negative refcounts, etc. https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_python.html#debug-version-of-python-memory-alloc-label
Use the following to detect python leaks (instead of valgrind)
https://docs.python.org/3/library/tracemalloc.html
Use https://pypi.org/project/objgraph (with graphviz) to view heap object trees. The following renders the picture as a png under /tmp and prints the path to stdout.
== Tools
=== CPyChecker
=== Pungi
The text was updated successfully, but these errors were encountered: