
Avoid creating reference cycles #1805

Merged (3 commits, merged Jan 8, 2021)
Conversation

njsmith (Member) commented Nov 16, 2020

When put together, this change + python-trio/outcome#29 should fix #1770.

Note that the tests on this PR are expected to fail until that fix to
outcome lands and has been released, since this PR includes a full
end-to-end test.

codecov bot commented Nov 16, 2020

Codecov Report

Merging #1805 (de07659) into master (5110153) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1805      +/-   ##
==========================================
+ Coverage   99.59%   99.62%   +0.03%     
==========================================
  Files         114      114              
  Lines       14512    14550      +38     
  Branches     1108     1110       +2     
==========================================
+ Hits        14453    14496      +43     
+ Misses         42       38       -4     
+ Partials       17       16       -1     
Impacted Files Coverage Δ
trio/_core/_multierror.py 100.00% <100.00%> (ø)
trio/_core/_run.py 100.00% <100.00%> (ø)
trio/_core/tests/test_run.py 100.00% <100.00%> (ø)
trio/tests/test_socket.py 100.00% <0.00%> (+0.81%) ⬆️

@@ -114,6 +114,9 @@ def push_tb_down(tb, exc, preserved):
preserved = set()
new_root_exc = filter_tree(root_exc, preserved)
push_tb_down(None, root_exc, preserved)
# Delete the local functions avoid a reference cycle (see
Review comment (Member): "to avoid a..."

trio/_core/_multierror.py — resolved
@pytest.mark.skipif(
sys.implementation.name != "cpython", reason="Only makes sense with refcounting GC"
)
async def test_simple_cancel_scope_usage_doesnt_create_cyclic_garbage():
belm0 (Member) commented Nov 16, 2020:

👍

Would it be worth adding a test to catch garbage regressions at a higher level, say sleep()? Even if that case isn't at 0 now, we could assert on the count being below what's observed currently.
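A regression guard along those lines could be sketched with a small stdlib-only helper (the names `count_cyclic_garbage` and `make_cycle` are hypothetical, not Trio's API): run a callable, then use the return value of `gc.collect()` — the number of unreachable objects found — and assert it stays at or below the currently observed baseline rather than requiring zero.

```python
import gc


def count_cyclic_garbage(fn):
    """Run fn() and return how many unreachable objects the next gc pass finds."""
    gc.collect()  # start from a clean slate
    fn()
    return gc.collect()  # gc.collect() returns the number of unreachable objects

def make_cycle():
    a = []
    a.append(a)  # self-referencing list: cyclic garbage once the frame exits


# A test could then assert, e.g., count_cyclic_garbage(one_sleep) <= BASELINE,
# where BASELINE is the garbage count observed today.
cycle_count = count_cyclic_garbage(make_cycle)
```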

njsmith (Member, Author) commented Nov 16, 2020

Note: in Trio there are several places that have the same pattern as Error.unwrap, where they stash an exception in a local variable and then raise it, which should create a reference cycle. Specifically, the __exit__ handlers for MultiError.catch, CancelScope, and open_nursery. I think this means that they should probably get a similar fix.

But when I tried extending the test in this PR to confirm this, I couldn't get it to fail, so something funky is going on that I don't totally understand yet. (This PR should be enough to fix the original issue in #1770, though. @belm0, if you have a chance to try the modified outcome + this PR and can confirm that it fixes the actual production issue you were seeing, that would be great.)
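The stash-and-raise pattern described above can be reproduced in isolation (an illustrative sketch, not Trio's actual code): copying the caught exception into another local defeats the implicit `del exc` Python performs at the end of an `except ... as exc:` block, so the re-raised exception's traceback frame and that frame's locals end up referencing each other.

```python
import gc


def stash_and_raise():
    try:
        raise ValueError("boom")
    except ValueError as exc:
        stashed = exc  # survives the implicit `del exc` at the end of the block
    raise stashed  # stashed.__traceback__ -> this frame -> stashed: a cycle


gc.collect()  # clear any pre-existing garbage
try:
    stash_and_raise()
except ValueError:
    pass
unreachable = gc.collect()  # nonzero: the frame and exception form cyclic garbage
```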

belm0 (Member) commented Nov 16, 2020

> Note: in Trio there are several places that have the same pattern as Error.unwrap, where they stash an exception in a local variable and then raise it [...] @belm0 if you have a chance to try the modified outcome + this PR and can confirm that it fixes the actual production issue you were seeing that would be great.

In #1770 (comment) there is a self-contained example that counts garbage across sleep(). I'd expect it to work within a test 🤔

I'll try to test staged trio + outcome changes against that test and our app in a day or so.

njsmith (Member, Author) commented Nov 16, 2020

> In #1770 (comment) there is a self-contained example that counts garbage across sleep(). I'd expect it to work within a test 🤔

Yeah, the test included in this PR was derived from that example, and the test does indeed fail without the fixes in these two PRs.

The weird thing is that I tried locally adding some extra versions of the test to catch cases that I thought should also fail, like a nursery where one task is cancelled and another raises an exception, and in those cases I couldn't get it to fail or figure out why it wasn't failing.

I guess if anyone is desperately curious, you can check out the video at https://www.twitch.tv/videos/804632095, since I did all this on stream (NB: that link will probably stop working in a few weeks, following Twitch's garbage collection algorithm). But I'll probably get back to it later; I just ran out of steam for today.

pquentin (Member) commented

outcome was released, and this pull request now tests successfully against it.

@njsmith Should we request outcome>=1.1.0 in setup.py? It's not mandatory, but without that version the newsfragment would no longer be true, I guess.

belm0 (Member) commented Nov 17, 2020

> @njsmith Should we request outcome>=1.1.0 in setup.py? It's not mandatory, but without that version the newsfragment would no longer be true, I guess.

I'd opt for just noting in the newsfragment that Trio is doing its part, and that together with the recent outcome release it addresses the worst of the GC issue.

belm0 (Member) commented Nov 19, 2020

From the standalone test at #1770 (comment), here are the garbage items per sleep count for various cases:

  • no fixes - 36
  • both fixes - 0
  • without outcome fix - 28
  • without trio fix - 8

However, even with both fixes I can create significant garbage (about 40 items) by way of a cancelled background task:

async with trio.open_nursery() as nursery:
    nursery.start_soon(trio.sleep_forever)
    await trio.sleep(T/N)   # a little less garbage if this is removed
    nursery.cancel_scope.cancel()

Cancelled nursery without the background task is 25 items:

async with trio.open_nursery() as nursery:
    nursery.cancel_scope.cancel()

richardsheridan (Contributor) commented Nov 19, 2020

@belm0 does this test code reproduce what you are talking about here?

import gc
import trio

import pprint
import objgraph


async def amain():
    for _ in range(3):
        async with trio.open_nursery() as n:
            gc.collect()
            gc.set_debug(gc.DEBUG_LEAK)
            n.cancel_scope.cancel()
        gc.collect()
        gc.set_debug(0)
        print(len(gc.garbage))
        if _ < 2:
            gc.garbage.clear()


if __name__ == "__main__":
    trio.run(amain)
    gc.collect()
    pprint.pprint(gc.garbage)
    stuff = [x for x in gc.garbage if "Cancelled" in repr(x)]
    objgraph.show_backrefs(
        stuff,
        extra_ignore=(id(gc.garbage),),
        max_depth=10,
        filename="nursery_cycle.png",
    )

This is the reference cycle I observe with that code:
[image: nursery_cycle.png — objgraph backreference graph of the cycle]
(how do you scale images on github? this is way too big)
The problem seems to originate in this bit of code.

trio/trio/_core/_run.py

Lines 932 to 940 in e113e56

        try:
            await checkpoint()
        except BaseException as exc:
            self._add_exc(exc)
        popped = self._parent_task._child_nurseries.pop()
        assert popped is self
        if self._pending_excs:
            return MultiError(self._pending_excs)

I've been able to break the cycle by doing:

        if self._pending_excs:
            try:
                return MultiError(self._pending_excs)
            finally:
                del self._pending_excs

or

        if self._pending_excs:
            try:
                return MultiError(self._pending_excs)
            finally:
                del popped, self

belm0 (Member) commented Nov 19, 2020

@richardsheridan thank you for your contributions on the GC issue

I think you'll find some other cycle cases by adding these inside the cancelled nursery:

nursery.start_soon(trio.sleep_forever)
await trio.sleep(.01)

richardsheridan (Contributor) commented Nov 20, 2020

Adding those doesn't seem to make any more garbage for me with the _pending_excs fix I outlined above. I guess the "more garbage" you saw is more objects kept alive by their connection to the same cycle (they all merge up in __aexit__).
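The point about extra objects being kept alive by one shared cycle, rather than forming new cycles themselves, can be seen in a stdlib-only sketch (illustrative names, nothing Trio-specific): the payload lists below are acyclic, but they only become collectable garbage because they hang off the two-object cycle.

```python
import gc


class Node:
    pass


a = Node()
b = Node()
a.other = b
b.other = a  # the only actual cycle: a <-> b
a.payload = [[i] for i in range(10)]  # acyclic objects reachable only via the cycle

gc.collect()
del a, b
n = gc.collect()  # counts the two-object cycle *plus* everything hanging off it
```

Breaking the one cycle (e.g. `del a.other` before dropping the names) would let plain refcounting free the whole cluster with nothing left for the collector.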

belm0 (Member) commented Nov 20, 2020

> Adding those doesn't seem to make any more garbage for me with the fixes I outlined above.

confirmed, thank you

        if self._pending_excs:
            try:
                return MultiError(self._pending_excs)
            finally:
                del self._pending_excs

For what it's worth, the try/finally may not be needed. I think self._pending_excs can be stored in a local and deleted upfront.
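A minimal standalone sketch of that alternative (hypothetical classes — `MultiErrorSketch` and `NurserySketch` stand in for Trio's `MultiError` and nursery internals): move the attribute into a local and delete it upfront, so no try/finally is needed.

```python
class MultiErrorSketch(Exception):
    """Stand-in for trio.MultiError: wraps a list of exceptions."""

    def __init__(self, exceptions):
        super().__init__(exceptions)
        self.exceptions = exceptions


class NurserySketch:
    def __init__(self):
        self._pending_excs = []

    def add_exc(self, exc):
        self._pending_excs.append(exc)

    def finish(self):
        # Move the list into a local and drop the attribute before building
        # the exception: the returned exception can no longer keep `self`
        # alive via `self._pending_excs`, and no try/finally is required.
        pending = self._pending_excs
        del self._pending_excs
        if pending:
            return MultiErrorSketch(pending)
        return None
```

Either way — try/finally or move-then-delete — the effect is the same: by the time the exception object escapes, `self` no longer reaches it through `_pending_excs`.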

belm0 (Member) commented Dec 4, 2020

Shall we merge this PR as is, or add the additional fix by @richardsheridan with corresponding test? I'm happy to commit the latter to this PR.

belm0 changed the title from "Avoid creating reference cycles in MultiError.filter" to "Avoid creating reference cycles" on Jan 8, 2021
pquentin (Member) left a review:

Thanks!

@belm0 belm0 merged commit e7f3f37 into python-trio:master Jan 8, 2021
belm0 pushed a commit that referenced this pull request Jul 18, 2021
from MultiErrorCatcher.__exit__, _multierror.copy_tb, MultiError.filter, CancelScope.__exit__, and NurseryManager.__aexit__ methods. This was nearly impossible to catch until #1864 landed so it wasn't cleaned up in #1805 (or during the live stream: https://www.youtube.com/watch?v=E_jvJVYXUAk).
Successfully merging this pull request may close these issues.

Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython
4 participants