Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make upstream window and app references weak to avoid reference cycles #2061

Closed
wants to merge 28 commits into from

Conversation

freakboy3742
Copy link
Member

@freakboy3742 freakboy3742 commented Aug 4, 2023

Fixes #1215.

The manual test case for this is in #1215. Windows won't be deleted immediately, but once the garbage collector kicks in, both the window interface and impl will be freed. I'm not sure how we'd manufacture an automated test, since it's entirely dependent on garbage collection behavior.

The underlying problem appears to be that the relations between widgets, App and Window are complex.

  • Apps own Windows, and Windows keep a reference to app
  • Widgets keep a reference to both app and window
  • Interface objects point at implementation objects, and vice versa
  • Apps and Windows both maintain a list of widgets
  • Bound methods used as signal handlers retain references to the objects they were created on. Most user handlers are bound to App; the updated macOS/iOS container handling introduced by [widget Audit] toga.ScrollContainer #1969 has a handler bound to the Window impl retained by the container, which is owned by the Window.

Python's garbage collection should pick up closed cycles between objects, but there's clearly something in this nest that is preventing collection.

This PR corrects the problem by making all "upstream" references (i.e., references from impl to interface, references from window to the app that owns the window, and references to app or window from a widget) weak. This is done by:

  • Modifying the getter and setter for app and window on the base widget into a wrapper around a weak property, rather than a full object reference,
  • Adding a getter and setter for interface on the Cocoa implementation of App and Window so that these properties are a wrapper around a weak property, rather than a full object reference
  • Modifying Cocoa's Container to retain a weak reference to the "parent" container, rather than a bound method for callbacks

This PR builds on #2058 so that the full GUI test suite for Window, plus all the changes to Container are incorporated. For convenience this is a link to the diff that is specific to this bug fix

There is almost certainly an analogous set of changes needed on the iOS backend (although it's less critical there because you can't create secondary windows).

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@samschott samschott mentioned this pull request Aug 4, 2023
11 tasks
@samschott
Copy link
Member

I can't argue with the results, it does fix the test case from #1215 (which I've updated to account for recent API changes).

Python's garbage collection should pick up closed cycles between objects, but there's clearly something in this nest that is preventing collection.

It is indeed strange that GC does not pick up the reference cycles here. My first instinct would have been a reference cycle involving ObjC instances, but those should be prevented by the explicit weak ObjC properties that we have introduced some time ago.

Another possibility could be that we are indeed holding on to an additional reference somewhere that is not part of a closed cycle. Replacing it by a weakref enables collecting the cycle, but might return an unexpected None somewhere at some point.

I'm a bit cautious about this fix because it may be only addressing the symptom and not the cause. The cause being (1) potentially a leftover non-cyclic reference somewhere and (2) a complex reference graph that makes it difficult to reason about.

But clearly you have gone through most of the reference graph, it does not seem too complex after all, and one or multiple of references that you replaced with a weekref fixed the issue. I'd be curious to see a minimal fix that addresses it though, just to maybe narrow down which references were problematic.

I'm not sure how we'd manufacture an automated test, since it's entirely dependent on garbage collection behavior.

Maybe holding on to a weak reference to a Window in a test, forcing a GC run, and then asserting that the reference was collected, would work?

@freakboy3742
Copy link
Member Author

It is indeed strange that GC does not pick up the reference cycles here. My first instinct would have been a reference cycle involving ObjC instances, but those should be prevented by the explicit weak ObjC properties that we have introduced some time ago.

That was my thought too - and my "but it can't be that" for the same reason.

Another possibility could be that we are indeed holding on to an additional reference somewhere that is not part of a closed cycle. Replacing it by a weakref enables collecting the cycle, but might return an unexpected None somewhere at some point.

That also occurred to me; but if it's true, it's a subtle one that I haven't been able to discover. I tried mapping out all the relationships in a big spiderweb of a diagram, but that has rapidly converged on being one of those "red string between photos on the wall" conspiracy maps, rather than actual enlightenment :-)

I'm a bit cautious about this fix because it may be only addressing the symptom and not the cause. The cause being (1) potentially a leftover non-cyclic reference somewhere and (2) a complex reference graph that makes it difficult to reason about.

100% agreed on this. As a fix, there's at least a consistent rationale (i.e., "upstreamness") behind why the specific references are weakrefs, but it's not a very satisfying answer specifically because circular references should be resolved by the GC.

But clearly you have gone through most of the reference graph, it does not seem too complex after all, and one or multiple of references that you replaced with a weekref fixed the issue. I'd be curious to see a minimal fix that addresses it though, just to maybe narrow down which references were problematic.

Honestly - AFAICT, this is the minimal fix (or, at least, a minimal fix). The only part that is maybe optional is the references to app - app references don't appear to cause problems deleting windows, but the implementation is so symmetrical with window that I included them in this PR. But if I only apply 1 or 2 of the three other fixes (interface references, window references, and container callback references), the GC doesn't kick in - you need all three for it to work.

@samschott
Copy link
Member

samschott commented Aug 6, 2023

Ok, you have convinced me that weakrefs are indeed a pragmatic solution :)

Do you know if the window deletion issue is limited to macOS or also occurs on Windows and Gtk backends? The window and app reference fixes will apply to all backends but your interface reference fix is macOS only. Since the general structure of backrefs to the interface layer is more general, it may be worth extending the weakrefs to other backends as well. I don't currently have access to a machine with VMs to test this myself.

Curiously, widget implementations have similar backrefs to the interface layer. You haven't introduced weakrefs here, and it seems that those are indeed not needed. A similar example as in #1215 but for Widgets works just fine with the current public release 0.3.1. Admittedly, the reference graph is a lot simpler here.

Widgets memleak test
import gc
import toga
from toga.constants import COLUMN
from toga.style import Pack


class Window(toga.Window):

    instances = 0

    def __init__(self):
        super().__init__(title=f'Window: {Window.instances}', position=(200, 200))
        Window.instances += 1

    def __del__(self):
        Window.instances -= 1


class WidgetRefcountExample(toga.App):

    def startup(self):

        self.window = None

        self.button_create = toga.Button(
            'Create window',
            style=Pack(padding=10, width=120),
            on_press=self.on_create,
        )

        self.button_delete = toga.Button(
            'Close and delete window',
            style=Pack(padding=10, width=120),
            on_press=self.on_delete,
        )

        self.label = toga.Label(
            f'Instances: {Window.instances}',
            style=Pack(padding=10, width=120),
        )

        self.button_create.enabled = True
        self.button_delete.enabled = False

        self.box = toga.Box(
            children=[
                self.button_create,
                self.button_delete,
                self.label,
            ],
            style=Pack(direction=COLUMN),
        )

        self.main_window = toga.MainWindow(title="Toga")
        self.main_window.content = self.box
        self.main_window.show()

    def on_create(self, sender):
        if not self.window:
            self.window = Window()
            self.app.windows += self.window
            self.window.content = toga.Box()
            self.window.show()
            self.label.text = f'Instances: {Window.instances}'

        self.button_create.enabled = False
        self.button_delete.enabled = True

    def on_delete(self, sender):
        if self.window:
            self.window.close()
            self.window = None
            gc.collect()  # clean up ref cycles before displaying instance count
            self.label.text = f'Instances: {Window.instances}'

        self.button_create.enabled = True
        self.button_delete.enabled = False


def main():
    return WidgetRefcountExample('Example', 'org.beeware.refcount')


if __name__ == '__main__':
    app = main()
    app.main_loop()

self._interface = weakref.ref(value)

def __del__(self):
print("DEL WINDOW IMPL")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover debug statement?

Copy link
Member

@samschott samschott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went spelunking around a bit and removed as many of the weak refs as I could without re-introducing the memory leak.

I could remove all but one: The weakref that a toga.Widget holds to the toga.Window it belongs to. That lead me to conclude: The window is kept alive by widgets that still reference it, therefore the widgets themselves must be kept alive by something even though they are no longer required when the window is closed.

The obvious reference cycles are all closed loops, but one point stood out: The registries toga.App.windows and toga.App.widgets. When a window is closed, it is removed from toga.App.windows by toga.Window.close():

def close(self) -> None:
self.app.windows -= self
self._impl.close()

However, the window's widgets still remain in toga.App.widgets add each one of them still holds a reference to the window. There are our leftover non-cycling references!

I've drafted a minimal fix as a PR at #2066.

My original diagnosis still stands, all the cross references and the different parts of code that manage them are too complicated to track and reason about easily. Even though my suggestion fixes the memory leak, your weak ref solution would make regressions more difficult (though note that it only cleans up the references to windows, references to their widgets would have still lived on forever). If both of us are struggling, imagine someone new to the codebase.

I don't have immediate suggestions to improve things though :(

Maybe remove all cross-references that are not strictly required? The registries are very convenient though. Or centralise and simplify the code when:

  • Windows are added to or removed from an app.
  • Widgets are added to or removed from a window, a parent widget (such as a box) or a container.

There is a lot of very similar code for the above cases and a regression in any of those places can be problematic.

@freakboy3742
Copy link
Member Author

Do you know if the window deletion issue is limited to macOS or also occurs on Windows and Gtk backends?

It's not limited to macOS; the same test case causes problems on GTK and Windows, and the same fix resolves the issue. I've added that to the PR.

I guess conceptually the same issue exists on iOS and Android, but you can't create and destroy additional windows, so it's a moot point. I've added the weakrefs for consistency.

Curiously, widget implementations have similar backrefs to the interface layer.

Yeah - I did some poking around this and came to the same conclusion. I do wonder if a more complicated widget graph, with widget references held by the app and closures on widget event handlers would manufacture a problem, but at least simple tests of this didn't reveal a problem for me.

However, the window's widgets still remain in toga.App.widgets add each one of them still holds a reference to the window. There are our leftover non-cycling references!

This is indeed an obvious source of leakage; however, as noted on that PR, it seems to be an incomplete fix.

Maybe remove all cross-references that are not strictly required?

The problem is that it's difficult to judge what is strictly required. I guess now that we've got a (near 100%) functional test suite, we'd be in a position to prune the references that aren't actually used in practice;

There is a lot of very similar code for the above cases and a regression in any of those places can be problematic.

Agreed there's some similarities; @mhsmith has also remarked that there are some commonalities that could be cleaned up. Conceptually, I'm in favor of any cleanup that reduces duplication and turns the backends into an even more "dumb" wrapper around native widgets; however, I'm not sure I have a clear vision on exactly what that simplification would look like in practice. There's enough differences between widget lifecycle on different platforms that I'm not sure there is a completely clean abstraction of widget management. I'd definitely be happy to be proven wrong on this, though.

@freakboy3742
Copy link
Member Author

I've added a test case on the testbed to verify that both the window interface and impl is deleted on each platform. I've done this on the testbed rather than core/dummy because:

  1. The dummy backend contains a bunch of additional references in the event log that are very difficult to remove
  2. Backends can have additional complications (such as references to native objects) that the dummy backend won't exhibit.

@freakboy3742 freakboy3742 marked this pull request as ready for review August 7, 2023 04:03
@samschott
Copy link
Member

Hm, from the last CI run, I still see the Windows test test_secondary_window_cleanup fail due to a remaining reference. Could this possibly be a similar issue as previously on macOS with the native layer holding references to the interface and implementation that introduce non-Python reference cycles?

@freakboy3742
Copy link
Member Author

Hm, from the last CI run, I still see the Windows test test_secondary_window_cleanup fail due to a remaining reference. Could this possibly be a similar issue as previously on macOS with the native layer holding references to the interface and implementation that introduce non-Python reference cycles?

I think that failure is a red herring (at least for now). There's no Window test backend implemented for Winforms yet, so all the windows tests are failing because they can't find the probe for a Window. The secondary window reference test fails in a slightly different way specifically because it doesn't have a fixture, so it's not failing during fixture creation like all the other tests.

@mhsmith and I have been splitting the work of the audit process; I do a first pass handling Cocoa, iOS, and GTK; he follows behind handling Windows and Android. As a side effect of this distribution of labor, the tests for Android and Winforms tend to be explosive until he's started work on whatever feature is being audited.

@samschott
Copy link
Member

samschott commented Aug 12, 2023

I think that failure is a red herring (at least for now). There's no Window test backend implemented for Winforms yet, so all the windows tests are failing because they can't find the probe for a Window. The secondary window reference test fails in a slightly different way specifically because it doesn't have a fixture, so it's not failing during fixture creation like all the other tests.

I'm not so sure about that. From the test logs for test_secondary_window_cleanup:

Traceback (most recent call last):
  File "D:\a\toga\toga\testbed\build\testbed\windows\app\src\app\tests\test_window.py", line 211, in test_secondary_window_cleanup
    assert window_ref() is None
AssertionError: assert <toga.window.Window object at 0x000001CAE3CB4FA0> is None
 +  where <toga.window.Window object at 0x000001CAE3CB4FA0> = <weakref at 0x000001CAE4F77B50; to 'Window' at 0x000001CAE3CB4FA0>()

This looks like a test case failure to me. I also finally spun up a windows VM (pythonnet works on Apple Silicon!), and my manual example from #1215 still fails with this branch. It does pass however after removing circular references from Pythonnet to the toga_winforms.Window instance (mostly from instance methods that are set as callbacks).

@freakboy3742
Copy link
Member Author

I'm going to close this in favor of #2066; the approach there is a lot simpler, and isn't platform dependent.

@freakboy3742 freakboy3742 deleted the ownership-cleanup branch August 16, 2023 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows are not released when main reference is deleted
2 participants