gh-115874: Fix segfault in `FutureIter_dealloc` #117741

savannahostrowski · 2024-04-11T00:07:10Z

Issue: Segfaults when accessing module state in tp_dealloc (itertools teedataobject clear) #115874

savannahostrowski · 2024-04-11T00:07:48Z

cc: @brandtbucher for his review as well

gvanrossum

I'll leave it to @brandtbucher to review this -- I am somewhat disappointed that in order to fix this we have to copy a bunch of subtle code from inside PyType_GetModuleByDef() -- especially since there the code is enclosed in macros BEGIN_TYPE_LOCK() and END_TYPE_LOCK() (though those seem needed only because of the call to lookup_tp_mro() there).

At the same time I understand you don't want to call PyType_GetModuleByDef() and have to deal with the exception it raises in exactly the scenario we're trying to tiptoe around here.

brandtbucher · 2024-04-12T21:00:08Z

> At the same time I understand you don't want to call PyType_GetModuleByDef() and have to deal with the exception it raises in exactly the scenario we're trying to tiptoe around here.

It looks like it's actually a bit more subtle than that.

If I understand correctly, this isn't being done to avoid an exception, but to avoid a crash due to the type's MRO being NULL (this happens when the type and instance are part of a cycle, and the type is cleared before the instance is deallocated). In general, it seems unsafe to assume types aren't cleared in an instance's tp_dealloc, as we do in the current code.

brandtbucher · 2024-04-12T21:08:06Z

@savannahostrowski, do you mind adding a NEWS blurb? No need to get too technical, just explaining that we've fixed a possible crash during garbage-collection of _asyncio.FutureIter objects.

Can you also add a comment or two to the new code referencing the issue number and explaining that:

- We can't use PyType_GetModuleByDef, since the type might have already been cleared. This is also why we need to check that ht_module isn't NULL.
- Since it's uncommon to subclass this type, it's fine (and probably faster for the common case!) to just check our type's module and not bother walking the MRO.
  - This means subclasses can't make use of the free list... oh well.

Can you also confirm for me that our old reproducer crashes on 3.12? If so, we can add the test (it's okay if it no longer works on 3.13, probably still good to have) and flag this for backport.

gvanrossum · 2024-04-13T05:19:32Z

> If I understand correctly, this isn't being done to avoid an exception, but to avoid a crash due to the type's MRO being NULL (this happens when the type and instance are part of a cycle, and the type is cleared before the instance is deallocated). In general, it seems unsafe to assume types aren't cleared in an instance's tp_dealloc, as we do in the current code.

Hm, that would point to a pretty universal problem (the instance being cleared after the type, when both are involved in a cycle). Why isn’t that crashing other code? What’s special about this example?

(I am pushing back on this because the fix requires breaking through a nice abstraction, potentially in many more cases.)

savannahostrowski · 2024-04-13T19:22:46Z

Thanks for the feedback, folks. I'll add some comments to the code.

@brandtbucher I can confirm that the segfault still repros on 3.12, so I can add a test here. I know you did some investigation in other modules and that's how you found this issue. That said, is it worth doing a bit more spelunking to understand if this is happening in other places as well? I'm too new to this part of the codebase to really understand how widespread this might be 😅 but I want to address concerns here.

brandtbucher · 2024-04-16T19:13:11Z

> Hm, that would point to a pretty universal problem (the instance being cleared after the type, when both are involved in a cycle). Why isn’t that crashing other code? What’s special about this example?

What's special is that only tp_clear and tp_dealloc functions are susceptible to this fragile situation. Doing anything other than untracking, clearing, and freeing objects in these functions is rare. Clever stuff belongs in finalizers, where you have an object in a known sane state... the obvious exception being freelists like this, which must go in a dealloc function, since it's "freeing" the memory.
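
The contrast with finalizers can be observed from pure Python. What follows is a minimal sketch, assuming CPython 3.4 or later (PEP 442 finalization semantics); the `Resource` class and the `build_cycle` helper are hypothetical names used only for illustration. It suggests that a `__del__` finalizer on an instance caught in a cycle with its own type still sees the type's attributes intact, which is the guarantee a C-level `tp_dealloc` does not get:

```python
import gc

# Records what the finalizer was able to observe when it ran.
events = []


def build_cycle():
    """Create an instance and a heap type that keep each other alive."""

    class Resource:
        label = "resource"

        def __del__(self):
            # Finalizers run before the collector clears any object in
            # the unreachable cycle, so both the type's attributes and
            # the instance's own attributes are still readable here.
            events.append((type(self).label, self.payload))

    inst = Resource()
    inst.payload = [1, 2, 3]
    # type -> instance; the instance already refers to its type.
    Resource.keep = inst


build_cycle()
# The cycle is unreachable once build_cycle() returns; collect it now.
gc.collect()
```

After the collection, `events` holds one entry recorded by the finalizer. The sketch does not model the free list itself; it only illustrates why "clever" work is safer in a finalizer than in a dealloc function.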

> (I am pushing back on this because the fix requires breaking through a nice abstraction, potentially in many more cases.)

In the original issue, we found a similar crash in itertools.tee. I was worried too, but after a (very) quick scan of literally every function matching \w+(clear|dealloc)\( in C code, this was the only other one that worried me.
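
A scan like the one described could be approximated as follows. This is a sketch only: the regular expression is the one quoted above, but the `find_candidates` helper and the sample C text are assumptions made up for illustration, not the tooling actually used for the review.

```python
import re

# The pattern quoted in the comment: an identifier ending in "clear" or
# "dealloc", immediately followed by an opening parenthesis.
PATTERN = re.compile(r"\w+(clear|dealloc)\(")

# Made-up stand-in for C source text; a real scan would read files from
# a CPython checkout instead.
SAMPLE_C_SOURCE = """
static void
FutureIter_dealloc(futureiterobject *it)
{
}

static int
teedataobject_clear(teedataobject *tdo)
{
}

static PyObject *
future_result(FutureObj *fut)
{
}
"""


def find_candidates(source):
    """Return the matched function names, without the trailing '('."""
    return [match.group(0)[:-1] for match in PATTERN.finditer(source)]


candidates = find_candidates(SAMPLE_C_SOURCE)
```

Every hit still needs manual review: the pattern only finds functions whose names end in `clear` or `dealloc`, and it cannot tell whether a given function touches module state.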

A perhaps more "principled" fix would be to change PyType_GetModuleByDef to raise in the case where tp_mro is NULL and change _asyncio to handle the error instead of asserting that its return value is non-NULL. But that's a lot more invasive, and will slow down the happy path of common operations to work around a really tricky edge case that doesn't affect that much code.

In my opinion, the bug here is assuming that your type hasn't been cleared in a clear or dealloc func. That's incorrect.

brandtbucher · 2024-04-16T19:14:51Z

> I know you did some investigation in other modules and that's how you found this issue. That said, is it worth doing a bit more spelunking to understand if this is happening in other places as well? I'm too new to this part of the codebase to really understand how widespread this might be 😅 but I want to address concerns here.

See my comment above. I'm reasonably confident that there aren't other offenders, but another pair of eyes definitely wouldn't hurt.

Modules/_asynciomodule.c

gvanrossum · 2024-04-16T21:21:06Z

> In my opinion, the bug here is assuming that your type hasn't been cleared in a clear or dealloc func. That's incorrect.

Hmm... But it's an easy trap to fall into. Not exactly a bug magnet (only two instances in CPython itself), perhaps, but hard to debug, and hard to reason about: it seems it only happens when the type is cleared first. So isn't the bug that the type is cleared while it still has instances?

Do we understand how exactly this happened? Is module_clear called prematurely?

brandtbucher · 2024-04-16T21:32:54Z

> So isn't the bug that the type is cleared while it still has instances?
>
> Do we understand how exactly this happened? Is module_clear called prematurely?

My hunch is that we have a cycle: instance -> type -> module -> [...] -> instance. I don't think the GC can be faulted for breaking the cycle at type -> module in this case, since it just chooses an element to clear essentially at random. Here, it just happens to be the type.
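
The shape of such a cycle can be reproduced from Python. The sketch below is an illustration under assumptions: `Node` and `make_cycle` are hypothetical names, and the cycle is the shorter instance -> type -> instance rather than one that passes through a module. It shows that the collector reclaims the type and its instance together; the order in which the members are cleared is an implementation detail that Python code cannot observe.

```python
import gc
import weakref


def make_cycle():
    """Build a heap type and an instance that reference each other."""

    class Node:
        pass

    inst = Node()
    # type -> instance; the instance already points back at its type.
    Node.instance = inst
    # Weak references let us watch both objects without keeping them alive.
    return weakref.ref(Node), weakref.ref(inst)


type_ref, inst_ref = make_cycle()

# Neither object is reachable any more, but reference counting alone
# cannot free them; the cyclic collector has to break the cycle.
collected = gc.collect()
```

Once `gc.collect()` returns, both weak references are dead, meaning the type was torn down as part of the same collection as its instance.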

erlend-aasland · 2024-04-17T16:39:19Z

> @erlend-aasland I took a look at your suggestion but in this case, even if we stored the state in futureiterobject, I think that the state could still be cleared before the instance is deallocated, making it possibly as unreliable as the module to grab the necessary info for deallocation.

Yes, I noticed that after reading through the discussion again (which is why I marked my comment as outdated); it is an unfortunate issue. Well, thanks a lot for implementing this workaround :)

miss-islington-app · 2024-04-19T22:38:08Z

Thanks @savannahostrowski for the PR, and @brandtbucher for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

(cherry picked from commit d8f3503) Co-authored-by: Savannah Ostrowski <[email protected]>

bedevere-app · 2024-04-19T22:38:18Z

GH-118114 is a backport of this pull request to the 3.12 branch.

GH-115874: Fix segfault in FutureIter_dealloc (GH-117741) (cherry picked from commit d8f3503) Co-authored-by: Savannah Ostrowski <[email protected]>

vstinner · 2024-04-30T07:47:57Z

Thanks for fixing this crash!

Modules/_asynciomodule.c

Misc/NEWS.d/next/Core and Builtins/2024-04-13-18-59-25.gh-issue-115874.c3xG-E.rst

…ealloc`) (GH-121638) Address comments

…Iter_dealloc`) (pythonGH-121638) Address comments (cherry picked from commit 65feded) Co-authored-by: Savannah Ostrowski <[email protected]>

…eIter_dealloc`) (GH-121638) (GH-121642) Update retroactive comments from GH-117741 (segfault in `FutureIter_dealloc`) (GH-121638) Address comments (cherry picked from commit 65feded) Co-authored-by: Savannah Ostrowski <[email protected]>

…Iter_dealloc`) (pythonGH-121638) Address comments

savannahostrowski added 3 commits April 9, 2024 02:48

Add conditional block to check if modules asyncio

ec21ef8

remove get_asyncio_state_by_def

ca1c043

Always untrack and clear

0845b28

savannahostrowski requested review from 1st1, asvetlov, gvanrossum, kumaraditya303 and willingc as code owners April 11, 2024 00:07

bedevere-app bot added the awaiting review label Apr 11, 2024

bedevere-app bot mentioned this pull request Apr 11, 2024

Segfaults when accessing module state in tp_dealloc (itertools teedataobject clear) #115874

Closed

gvanrossum requested a review from brandtbucher April 11, 2024 00:27

gvanrossum reviewed Apr 11, 2024

brandtbucher requested a review from erlend-aasland April 12, 2024 21:09

savannahostrowski and others added 3 commits April 13, 2024 11:43

Merge branch 'python:main' into fix-115874

812a3d6

📜🤖 Added by blurb_it.

297d8dd

fix backticks

803c320

Add comment to reference issue

9b78e14

savannahostrowski added 2 commits April 15, 2024 14:25

Merge branch 'main' into fix-115874

c6bb110

Remove newline

9070afc

brandtbucher reviewed Apr 16, 2024

Update Modules/_asynciomodule.c

69856ec

Co-authored-by: Brandt Bucher <[email protected]>

Merge branch 'main' into fix-115874

715a946

erlend-aasland approved these changes Apr 17, 2024

bedevere-app bot added awaiting merge and removed awaiting review labels Apr 17, 2024

savannahostrowski added 5 commits April 18, 2024 02:39

Update comment for clarity

7af7811

Add comment to assertion

67f8c59

Merge branch 'main' into fix-115874

3a87d93

Merge branch 'main' into fix-115874

90ffe4a

Merge branch 'main' into fix-115874

beedfb6

brandtbucher enabled auto-merge (squash) April 19, 2024 22:13

brandtbucher merged commit d8f3503 into python:main Apr 19, 2024
36 of 48 checks passed

bedevere-app bot removed the awaiting merge label Apr 19, 2024

brandtbucher added the needs backport to 3.12 bug and security fixes label Apr 19, 2024

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 19, 2024

pythonGH-115874: Fix segfault in FutureIter_dealloc (pythonGH-117741)

2f6fc1a

(cherry picked from commit d8f3503) Co-authored-by: Savannah Ostrowski <[email protected]>

bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Apr 19, 2024

kumaraditya303 reviewed Jul 2, 2024

Modules/_asynciomodule.c

kumaraditya303 reviewed Jul 2, 2024

Modules/_asynciomodule.c

kumaraditya303 reviewed Jul 2, 2024

Misc/NEWS.d/next/Core and Builtins/2024-04-13-18-59-25.gh-issue-115874.c3xG-E.rst

savannahostrowski mentioned this pull request Jul 12, 2024

Update retroactive comments from GH-117741 (segfault in FutureIter_dealloc) #121638

Merged

willingc pushed a commit that referenced this pull request Jul 12, 2024

Update retroactive comments from GH-117741 (segfault in `FutureIter_d…

65feded

…ealloc`) (GH-121638) Address comments

estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024

Update retroactive comments from pythonGH-117741 (segfault in `Future…

ff058c2

…Iter_dealloc`) (pythonGH-121638) Address comments

savannahostrowski deleted the fix-115874 branch September 27, 2024 16:56
