Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes and errors in test_embed with PYTHONUOPS=1 #111339

Closed
gvanrossum opened this issue Oct 25, 2023 · 13 comments
Closed

Crashes and errors in test_embed with PYTHONUOPS=1 #111339

gvanrossum opened this issue Oct 25, 2023 · 13 comments
Assignees
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@gvanrossum
Copy link
Member

gvanrossum commented Oct 25, 2023

For a detailed report see #110384 (comment) (a comment by @brandtbucher on a merged PR by @markshannon ).

I am keeping notes about what I'm learning about this (though Mark said it's his).


The issue seems intermittent. I can repro it by running this command repeatedly on my Mac:

PYTHONUOPS=1 ./Programs/_testembed test_run_main_loop

Usually it passes (printing 5 lines of output); occasionally I get this error:

Py_RunMain(): sys.argv=['-c', 'arg2']
Assertion failed: (PyUnicode_CheckExact(ep->me_key)), function unicodekeys_lookup_unicode, file dictobject.c, line 938.
Abort trap: 6

I can only repro this with PYTHONUOPS=1, but it never prints any uops debug output. Could it be that the issue is in the code for checking the env var?

Linked PRs

@brandtbucher
Copy link
Member

brandtbucher commented Oct 25, 2023

It's intermittent, and it seems like it's manifesting itself as refleaks and prematurely freed interned strings. My hunch is that the linked list is broken somehow, but I've combed through it a bunch of times and haven't found anything wrong.

Weirdly, I've found that I can't reproduce anymore after commenting out the PyMemberDef for the valid member of _PyUOpExecutorObject. Seems like a red herring... maybe it changes the order of allocations or GC traversals somehow (or the bug is unrelated to this PR, and just triggered by the allocation patterns it creates)?

@brandtbucher
Copy link
Member

If you look at the ob_type member of that key, it's a freed byte pattern (0xDDDDDDDDDDDDDDDD).

@brandtbucher brandtbucher added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 25, 2023
@gvanrossum
Copy link
Member Author

gvanrossum commented Oct 25, 2023

Using the VS Code C debugger on macOS to run _testembed, I got a traceback. The failing dict lookup is called (indirectly) from PyUnstable_Optimizer_NewUOpOptimizer(), via PyType_Ready() and _PyType_CheckConsistency() and PyDict_Contains(). And this is called directly from pylifecycle.c. (But maybe not for the main interpreter?)


It always happens on the second invocation of PyUnstable_Optimizer_NewUOpOptimizer() from init_interp_main() -- so it's presumably due to finalization. I'm going to look more into your observation about valid -- that's a property, maybe there's something fishy there (the optimizer types don't seem to follow best practices for static types, I've had a problem with this before).

@brandtbucher
Copy link
Member

brandtbucher commented Oct 25, 2023

It also goes away for me if I replace PyUnicode_InternFromString with PyUnicode_FromString in descr_new (which is called for each member in the tp_members array during PyType_Ready).

@gvanrossum
Copy link
Member Author

I have tried a variety of things, but so far I can't make a dent in the problem. The last thing I've uncovered is that UOpExecutor_Type.tp_dict survives the finalization, but printing it crashes with a different assert, so presumably something there has been invalidated. Probably one of the keys.

If I comment out the PyMemberDef for valid, this is the dict (and it seems fine):

{'__len__': <slot wrapper '__len__' of 'uop_executor' objects>, '__getitem__': <slot wrapper '__getitem__' of 'uop_executor' objects>, '__doc__': None}

If I keep valid in, this is it after the first PyType_Ready():

{'__len__': <slot wrapper '__len__' of 'uop_executor' objects>, '__getitem__': <slot wrapper '__getitem__' of 'uop_executor' objects>, 'valid': <member 'valid' of 'uop_executor' objects>, '__doc__': None}

Printing it before the second PyType_Ready() call crashes:

Assertion failed: (PyUnicode_CheckExact(o)), function unicode_get_hash, file dictobject.c, line 285.

Hopefully @markshannon can take up the baton.

@gvanrossum
Copy link
Member Author

gvanrossum commented Oct 26, 2023

It also goes away for me if I replace PyUnicode_InternFromString with PyUnicode_FromString in descr_new (which is called for each member in the tp_members array during PyType_Ready).

Oh, that points to a new idea. The valid key is a "normal" interned string, which presumably is deleted when the interpreter is finalized. The other keys are all standard dunders which are constructed using _Py_ID() (e.g. &_Py_ID(__new__) in the failing line). I guess those are truly immortal.

Clearly something ought to clear the tp_dict field on finalization, but doesn't, so it survives finalization in this static type. Maybe @ericsnowcurrently has some insight?

@ericsnowcurrently
Copy link
Member

From gh-111350 (which fixes the crash in this specific case):

Builtin methods seem more robust than builtin properties.
This is clearly a workaround, rather than a proper fix, but a proper fix will be a lot of work.

@markshannon, it sounds like you may have insight into the underlying problem. More info would be valuable.

@gvanrossum
Copy link
Member Author

I don't know about the true underlying problem, but I have a hunch that it's easy to repro (e.g. in _testcapi):

  • Define a very simple static type
  • Give it one PyMemberDef
  • Do the following a few times:
Py_Initialize();
PyType_Ready(<our type>);
// Possibly instantiate our type
Py_Finalize();

(Function names may differ from actual API, I've not looked it up.)

The problem appears that the static type's tp_dict survives the finalization but one of the keys (the name of the PyMemberDef) is an interned string and has somehow been invalidated by finalization.

It's quite possible that something needs to clear the tp_dict as part of finalization, but that part of the lifecycle code is rather complicated. :-(

@brandtbucher
Copy link
Member

@markshannon and I think we found the issue this morning.

It looks like we need to add our types to static_types in object.c. This arranges for calling _PyStaticType_InitBuiltin and _PyStaticType_Dealloc on them at the right time, which do some "special things" like arranging for tp_dict to be stashed on the interpreter, etc. Hopefully there aren't too many other types that we've forgotten to add here... :(

I'm working on a PR now.

@gvanrossum
Copy link
Member Author

Great sleuthing (as they say). I figured there was something like that, but I couldn't find it -- that's why I asked for Eric's help.

@brandtbucher brandtbucher self-assigned this Oct 26, 2023
@brandtbucher brandtbucher added the 3.13 bugs and security fixes label Oct 26, 2023
@brandtbucher
Copy link
Member

brandtbucher commented Oct 27, 2023

@savannahostrowski is going to work on the fix. If anybody's blocked on this, please let me know.

@jiridanek
Copy link

jiridanek commented Dec 29, 2023

I think I hit this while wanting to use Fedora's python3-debug Python binary. I want to use this debug build of Python because it works well with leak sanitizer (does not cause its own leaks, when I use it with PYTHONMALLOC=malloc_debug PYTHONDEVMODE=1, so I can be fairly certain that any leaks that do appear are caused by me (in an application that embeds Python interpreter.

c_unittests: /builddir/build/BUILD/Python-3.12.0/Objects/dictobject.c:936: unicodekeys_lookup_unicode: Assertion `PyUnicode_CheckExact(ep->me_key)' failed.
Signal: SIGABRT (Aborted)
Fatal Python error: Aborted

Current thread 0x00007ffff05926c0 (most recent call first):
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1288 in create_module
  File "<frozen importlib._bootstrap>", line 813 in module_from_spec
  File "<frozen importlib._bootstrap>", line 915 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354 in _find_and_load
  File "/usr/lib64/python3.12/ctypes/__init__.py", line 8 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994 in exec_module
  File "<frozen importlib._bootstrap>", line 929 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354 in _find_and_load
  File "/home/jdanek/repos/skupper-router/python/skupper_router_internal/dispatch.py", line 40 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 994 in exec_module
  File "<frozen importlib._bootstrap>", line 929 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1325 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1354 in _find_and_load
Signal: SIGABRT (Aborted)

Is there a useable workaround to avoid the assert (while still using a debug build of Python3.12) or do I have to wait for Python3.13 with the fixes?

$ python3-debug --version
Python 3.12.0
$ rpm -q python3-debug
python3-debug-3.12.0-1.fc39.x86_64

@jiridanek
Copy link

I did one more check, it really looks like this issue (#111339 (comment))

(gdb) p ep->me_key.ob_type
$2 = (PyTypeObject *) 0xdddddddddddddddd

While trying to understand this further, I realized there is probably no way to avoid the assert, baring some crazy LD_PRELOAD tricks, possibly. Looks like I'd need a debug build of Python (to get clean asan bill) with asserts disabled (so as not to crash on this).

Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

4 participants