Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-87135: Hang non-main threads that attempt to acquire the GIL during finalization #28525

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
186 changes: 164 additions & 22 deletions Doc/c-api/init.rst
Original file line number Diff line number Diff line change
Expand Up @@ -409,7 +409,11 @@ Initializing and finalizing the interpreter
freed. Some memory allocated by extension modules may not be freed. Some
extensions may not work properly if their initialization routine is called more
than once; this can happen if an application calls :c:func:`Py_Initialize` and
:c:func:`Py_FinalizeEx` more than once.
:c:func:`Py_FinalizeEx` more than once. :c:func:`Py_FinalizeEx` must not be
called recursively from within itself. Therefore, it must not be called by any
code that may be run as part of the interpreter shutdown process, such as
:py:mod:`atexit` handlers, object finalizers, or any code that may be run while
flushing the stdout and stderr files.

.. audit-event:: cpython._PySys_ClearAuditHooks "" c.Py_FinalizeEx

Expand Down Expand Up @@ -1000,6 +1004,78 @@ thread, where the CPython global runtime was originally initialized.
The only exception is if :c:func:`exec` will be called immediately
after.

.. _cautions-regarding-runtime-finalization:

Cautions regarding runtime finalization
---------------------------------------

In the late stage of :term:`interpreter shutdown`, after attempting to wait for
non-daemon threads to exit (though this can be interrupted by
:class:`KeyboardInterrupt`) and running the :mod:`atexit` functions, the runtime
is marked as *finalizing*: :c:func:`_Py_IsFinalizing` and
:func:`sys.is_finalizing` return true. At this point, only the *finalization
thread* that initiated finalization (typically the main thread) is allowed to
acquire the :term:`GIL`.

If any thread, other than the finalization thread, attempts to acquire the GIL
during finalization, either explicitly via :c:func:`PyGILState_Ensure`,
:c:macro:`Py_END_ALLOW_THREADS`, :c:func:`PyEval_AcquireThread`, or
:c:func:`PyEval_AcquireLock`, or implicitly when the interpreter attempts to
reacquire it after having yielded it, the thread enters a permanently blocked
state where it remains until the program exits. In most cases this is harmless,
but this can result in deadlock if a later stage of finalization attempts to
acquire a lock owned by the blocked thread, or otherwise waits on the blocked
thread.

To avoid non-Python threads becoming blocked, or Python-created threads becoming
blocked while executing C extension code, you can use
:c:func:`PyThread_TryAcquireFinalizeBlock` and
:c:func:`PyThread_ReleaseFinalizeBlock`.
gpshead marked this conversation as resolved.
Show resolved Hide resolved

For example, to deliver an asynchronous notification to Python from a C
extension, you might be inclined to write the following code that is *not* safe
to execute during finalization:

.. code-block:: c

// some non-Python created thread that wants to send Python an async notification
PyGILState_STATE state = PyGILState_Ensure(); // may hang thread
// call `call_soon_threadsafe` on some event loop object
PyGILState_Release(state);

To avoid the possibility of the thread hanging during finalization, and also
support older Python versions:

.. code-block:: c

// some non-Python created thread that wants to send Python an async notification
PyGILState_STATE state;
#if PY_VERSION_HEX >= 0x030c0000 // API added in Python 3.12
int acquired = PyThread_TryAcquireFinalizeBlock();
if (!acquired) {
// skip sending notification since python is exiting
return;
}
#endif // PY_VERSION_HEX
state = PyGILState_Ensure(); // safe now
// call `call_soon_threadsafe` on some event loop object
PyGILState_Release(state);
#if PY_VERSION_HEX >= 0x030c0000 // API added in Python 3.12
PyThread_ReleaseFinalizeBlock();
#endif // PY_VERSION_HEX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks quite complicated just to acquire the GIL.

Why not adding a variant of PyGILState_Ensure() which tries to acquire it, or return a special value if Python is exiting? It would be simpler to use than having to add 2 new function calls:

   PyGILState_STATE state = PyGILState_TryEnsure();
   if (state == BAD_LUCK) { // XXX: find a better name
     // skip sending notification since python is exiting
     return;
   }
   // ... use the Python C API ...
   PyGILState_Release(state);  // regular code, unchanged

Code compatible with old Python version:

   PyGILState_STATE state;
#if PY_VERSION_HEX >= 0x030c0000 // API added in Python 3.12
   state = PyGILState_TryEnsure();
   if (state == BAD_LUCK) { // XXX: find a better name
     // skip sending notification since python is exiting
     return;
   }
#else
   state = PyGILState_Ensure();
#endif
   // ... use the Python C API ...
   PyGILState_Release(state);  // regular code, unchanged

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like your suggestion is that the same PyGILState_STATE enum be used by both the existing PyGILState_Ensure function, as well as the new PyGILState_TryAcquireFinalizeBlockAndGIL function, and that the existing PyGILState_Release function provide either its existing behavior, or the behavior of PyGILState_ReleaseGILAndFinalizeBlock, depending on the PyGILState_STATE value.

That is feasible, if we modify the definition of PyGILState_STATE as follows:

Existing definition:

typedef
    enum {PyGILState_LOCKED, PyGILState_UNLOCKED}
        PyGILState_STATE;

New definition:

typedef
    enum {
     PyGILState_LOCKED,
     PyGILState_UNLOCKED,
     PyGILState_EXITING,
     PyGILState_LOCKED_WITH_FINALIZE_BLOCK,
     PyGILState_UNLOCKED_WITH_FINALIZE_BLOCK}
        PyGILState_STATE;

The reason we need 3 new states rather than just 1 is because PyGILState_Release would need to know whether to release a finalize block or not.

But it seems like it may be confusing to essentially overload the meaning of PyGILState_Release in this way, just to simplify code that needs to use finalize blocks (which is rare) and that needs to be compatible with Python < 3.12. After a few years, code will likely no longer support Python < 3.12, but this more confusing API will remain. But I can certainly make this change if that is desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yhg1s - any thoughts on this PyGILState_STATE API design question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the point about PyGILState_Release() being confusing. I think the main question is whether there's a reasonable use-case for the finalize block outside of acquiring the GIL. For the use-case of making sure it's safe to acquire the GIL (and preventing finalization from starting while the GILState is held), I think the PyGILState API makes more sense. At that point it isn't exposed as a separate lock but as a new error condition (and a new safety guarantee when calling TryEnsure).

Framing it as "PyGILState_TryEnsure() is like PyGILState_Ensure() except it also prevents finalization from starting while the GILState is held" makes sense to me. I don't think most users of the C API care if their threads are blocked forever when finalizing, so they can just keep using PyGILState_Ensure. I don't think that's more confusing.


Or with the convenience interface (requires Python >=3.12):

.. code-block:: c

// some non-Python created thread that wants to send Python an async notification
PyGILState_TRY_STATE state = PyGILState_TryAcquireFinalizeBlockAndGIL();
if (!state) {
// skip sending notification since python is exiting
return;
}
// call `call_soon_threadsafe` on some event loop object
PyGILState_ReleaseGILAndFinalizeBlock(state);

High-level API
--------------
Expand Down Expand Up @@ -1082,11 +1158,14 @@ code, or when embedding the Python interpreter:
ensues.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`_Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: 3.12
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

.. c:function:: PyThreadState* PyThreadState_Get()

Expand Down Expand Up @@ -1128,11 +1207,14 @@ with sub-interpreters:
to call arbitrary Python code. Failure is a fatal error.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`_Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: 3.12
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

.. c:function:: void PyGILState_Release(PyGILState_STATE)

Expand All @@ -1144,6 +1226,36 @@ with sub-interpreters:
Every call to :c:func:`PyGILState_Ensure` must be matched by a call to
:c:func:`PyGILState_Release` on the same thread.

.. c:function:: PyGILState_TRY_STATE PyGILState_AcquireFinalizeBlockAndGIL()

Attempts to acquire a :ref:`finalize
block<cautions-regarding-runtime-finalization>`, and if successful, acquires
the :term:`GIL`.

This is a simple convenience interface that saves having to call
:c:func:`PyThread_TryAcquireFinalizeBlock` and :c:func:`PyGILState_Ensure`
separately.

Returns ``PyGILState_TRY_LOCK_FAILED`` (equal to 0) if the interpreter is
already waiting to finalize. In this case, the :term:`GIL` is not acquired
and Python C APIs that require the :term:`GIL` must not be called.

Otherwise, acquires a finalize block and then acquires the :term:`GIL`.

Each call that is successful (i.e. returns a non-zero
``PyGILState_TRY_STATE`` value) must be paired with a subsequent call to
:c:func:`PyGILState_ReleaseGILAndFinalizeBlock` with the same value returned
by this function. Calling :c:func:`PyGILState_ReleaseGILAndFinalizeBlock` with the
error value ``PyGILState_TRY_LOCK_FAILED`` is safe and does nothing.

.. versionadded:: 3.12

.. c:function:: void PyGILState_ReleaseGILAndFinalizeBlock(PyGILState_TRY_STATE)

Releases any locks acquired by the corresponding call to
:c:func:`PyGILState_AcquireFinalizeBlockAndGIL`.

.. versionadded:: 3.12

.. c:function:: PyThreadState* PyGILState_GetThisThreadState()

Expand Down Expand Up @@ -1410,17 +1522,20 @@ All of the following functions must be called after :c:func:`Py_Initialize`.
If this thread already has the lock, deadlock ensues.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`_Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: 3.8
Updated to be consistent with :c:func:`PyEval_RestoreThread`,
:c:func:`Py_END_ALLOW_THREADS`, and :c:func:`PyGILState_Ensure`,
and terminate the current thread if called while the interpreter is finalizing.

.. versionchanged:: 3.12
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

:c:func:`PyEval_RestoreThread` is a higher-level function which is always
available (even when threads have not been initialized).

Expand Down Expand Up @@ -1448,17 +1563,19 @@ All of the following functions must be called after :c:func:`Py_Initialize`.
instead.

.. note::
Calling this function from a thread when the runtime is finalizing
will terminate the thread, even if the thread was not created by Python.
You can use :c:func:`_Py_IsFinalizing` or :func:`sys.is_finalizing` to
check if the interpreter is in process of being finalized before calling
this function to avoid unwanted termination.
Calling this function from a thread when the runtime is finalizing will
hang the thread until the program exits, even if the thread was not
created by Python. Refer to
:ref:`cautions-regarding-runtime-finalization` for more details.

.. versionchanged:: 3.8
Updated to be consistent with :c:func:`PyEval_RestoreThread`,
:c:func:`Py_END_ALLOW_THREADS`, and :c:func:`PyGILState_Ensure`,
and terminate the current thread if called while the interpreter is finalizing.

.. versionchanged:: 3.12
Hangs the current thread, rather than terminating it, if called while the
interpreter is finalizing.

.. c:function:: void PyEval_ReleaseLock()

Expand All @@ -1469,6 +1586,32 @@ All of the following functions must be called after :c:func:`Py_Initialize`.
:c:func:`PyEval_SaveThread` or :c:func:`PyEval_ReleaseThread`
instead.

.. c:function:: int PyThread_AcquireFinalizeBlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, this API is interpreter-specific, rather than tied to a thread or the global runtime. So Py_AcquireFinalizeBlock() might be more consistent with similar existing API (e.g. Py_NewInterpreterFromConfig(), Py_EndInterpreter()).

That said, the proposed function relies on knowing the interpreter associated with the current thread (e.g. PyInterpreterState_Get()). I'd say we're trending away from that approach generally, and, ideally, we would not introduce new C-API that relies on that implicit knowledge. Instead, it may make more sense to add a variant instead: PyInterpreterState_AcquireFinalizeBlock().

The caller would explicitly provide the interpreter that should be blocked:

    PyInterpreterState *interp = PyInterpreterState_Get();
    if (!PyInterpreterState_AcquireFinalizeBlock(interp)) {
        end_early_because_we_are_finalizing_unexpectedly();
        return;
    }
    do_the_normal_work();
    PyInterpreterState_ReleaseFinalizeBlock(interp);
    return;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what I've said about "finalize block", consider alternate names:

  • PyInterpreterState_PreventFini() (and PyInterpreterState_AllowFini())
    • similarly, Py_PreventInterpreterFini() (and Py_AllowInterpreterFini())
  • PyInterpreterState_BlockFini() (and PyInterpreterState_ReleaseFini())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is fairly straightforward to make these finalize blocks interpreter-specific, it is not clear to me, with my limited understanding of sub-interpreters, whether that is actually useful.

  • It isn't clear to me how PyFinalize_Ex interacts with multiple interpreters. It only seems to finalize the current interpreter.

  • The documentation for Py_EndInterpreter states that the interpreter must "have no other threads". In fact it only does this check after calling the AtExit functions for the interpreter, so it seems it would be sufficient to ensure that all other thread states are destroyed before the AtExit functions finish. But there is also the question of what happens if we try to create a thread state while Py_EndInterpreter is still in progress. Py_EndInterpreter doesn't seem to check for other thread states while holding the HEAD_LOCK, but that is not an issue as long as the check does not fail.

  • In general, given the "no other threads" constraint for Py_EndInterpreter it seems that if other non-Python-created or daermon threads hold references to the PyInterpreterState, then some external synchronization mechanism will be needed to ensure that they don't attempt to access the PyInterpreterState once the "no other threads" check completes.

As an example, suppose we have a C extension that provides a Python API that allows Python callbacks to be passed in, and then later calls those Python functions on its own non-Python-created thread pool. If this extension is to support sub-interpreters, then either during multi-phase module initialization, or when it receives the Python callback, it must record the PyInterpreterState associated with the callback. Then, in order to invoke the callback on a thread from its thread pool, it must obtain a PyThreadState for the (thread, interpreter) combination, creating one if one does not already exist. To ensure the PyInterpreterState pointers that it holds remain valid, it would need to register an AtExit function for the interpreter that ensures the PyInterpreterState won't be used. This AtExit function would likely need to essentially implement its own version of the "finalize block" mechanism introduced here.

Given the need for external synchronization of threads when calling Py_EndInterpreter, it seems to me that the finalize block mechanism defined by this PR is only useful for the main interpreter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I dig in to responding: one key point to consider is that Python users (especially extension authors, via the C-API) only interact directly with the global runtime via a few API functions. In nearly every case they are instead interacting with the Python thread (state) or interpreter associated with the current OS thread.

Another key point is that, as of 3.12, each interpreter has its own GIL.

Finally, it is certainly possible that I've misunderstood either the problem you're trying to solve or the way you're trying to solve it or both. I'm completely willing to learn and adjust. Then again, I might be completely right too!

(Sorry for the length of this post. I genuinely want to understand and to be sure we're taking the right approach. I appreciate the work you've done and your willingness to converse.)


Now, on to responses:

  • It isn't clear to me how PyFinalize_Ex interacts with multiple interpreters. It only seems to finalize the current interpreter.

PyFinalize_Ex() finalizes the main interpreter and the global runtime. At the point it is called, no other interpreters should exist.

Py_EndInterpreter() finalizes any other interpreter, almost entirely in the same way we finalize the main interpreter.

  • The documentation for Py_EndInterpreter states that the interpreter must "have no other threads".

Please clarify. The only thing I see is: "All thread states associated with this interpreter are destroyed."

The behavior should be the same for all interpreters, whether via Py_EndInterpreter() or the main interpreter via Py_FinalizeEx():

  1. check if current thread and holds GIL and not still running (ensures no reentrancy from the eval loop)
  2. internally mark the interpreter as finalizing
  3. wait for all non-daemon threads to finish
  4. call all of the interpreter's atexit callbacks, if any
  5. externally mark the interpreter as finalizing (this causes daemon threads to die, or, now, pause)
  6. finalize imports
  7. clear the interpreter state (including all its remaining thread states)
  8. delete the interpreter state

Caveats:

Py_FinalizeEx() does a few extra things at various places, but that should not relate to interpreter lifecycle.

I do see that it calls _PyThreadState_DeleteExcept() right after step (5), which Py_EndInterpreter() does not do. However, that's unexpected and should probably be resolved.

There are a few things we don't do in either that we probably should, probably before or right after step (5), e.g. disable the import system, disallow new threads (thread states).

Also, step (3) only applies to threads created by the threading module. We might want to extend that to all other threads states (i.e. created via PyThreadState_New() or PyGILState_Ensure()).

In fact it only does this check after calling the AtExit functions for the interpreter,

Looking at Py_EndInterpreter():

  • calls wait_for_thread_shutdown() right before _PyAtExit_Call()
  • publicly marks itself as finalizing right after _PyAtExit_Call()
  • destroys its remaining thread states at the end during finalize_interp_clear()

So I'm not sure what you mean specifically.

FWIW, Py_FinalizeEx() is exactly the same, except currently it does that last part a little earlier with _PyThreadState_DeleteExcept().

so it seems it would be sufficient to ensure that all other thread states are destroyed before the AtExit functions finish.

Only daemon threads (and, for now, threads (states) created via the C-API) would still be running at that point, and only until next step (5) above.

So are we talking about both the following?

  • move step (5) before step (4)
  • apply steps (3) and (5) to thread states that were not created by the threading module

Just to be clear, here are the ways thread states get created:

  • Py_Initialize*()
  • Py_NewInterpreter*()
  • PyThreadState_New()
  • PyGILState_Ensure()
  • _thread.start_new_thread() (via threading.Thread.start())

At the moment, it's mostly only with that last one that we are careful during runtime/interp finalization.

It occurs to me that this PR is mostly about addressing that: dealing with other thread states in the same way we currently do threads created via the threading module. Does that sound right?

But there is also the question of what happens if we try to create a thread state while Py_EndInterpreter is still in progress. Py_EndInterpreter doesn't seem to check for other thread states while holding the HEAD_LOCK, but that is not an issue as long as the check does not fail.

Yeah, we should probably be more deliberate about disallowing that sort of thing during finalization.

  • In general, given the "no other threads" constraint for Py_EndInterpreter it seems that if other non-Python-created or daermon threads hold references to the PyInterpreterState, then some external synchronization mechanism will be needed to ensure that they don't attempt to access the PyInterpreterState once the "no other threads" check completes.

That's what the proposed change in the PR is, AFAICS. The API you're adding must be specific to each interpreter, not to the global runtime. The resources that the proposed change protects are per-interpreter resources, not global ones. So I would not expect there to be any additional API or synchronization mechanism other than what you've already proposed (except applied to each interpreter instead of the main interpreter. Otherwise users of multiple interpreters will still be subject to the problem you're trying to solve.

As an example, suppose we have a C extension that provides a Python API that allows Python callbacks to be passed in, and then later calls those Python functions on its own non-Python-created thread pool. If this extension is to support sub-interpreters, then either during multi-phase module initialization, or when it receives the Python callback, it must record the PyInterpreterState associated with the callback. Then, in order to invoke the callback on a thread from its thread pool, it must obtain a PyThreadState for the (thread, interpreter) combination, creating one if one does not already exist.

That's literally what PyGILState_Ensure() is for and does. 😄

To ensure the PyInterpreterState pointers that it holds remain valid, it would need to register an AtExit function for the interpreter that ensures the PyInterpreterState won't be used. This AtExit function would likely need to essentially implement its own version of the "finalize block" mechanism introduced here.

Why wouldn't we just exclusively use the mechanism you're proposed here? Why would each interpreter have to have an additional duplicate? Again, the resources we're trying to protect here are specific to each interpreter, not to the global runtime, no?

Given the need for external synchronization of threads when calling Py_EndInterpreter, it seems to me that the finalize block mechanism defined by this PR is only useful for the main interpreter.

Hmm, I didn't catch what external synchonization of threads you are talking about. Sorry if I missed it or misunderstood. Please restate what you mean specifically. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any experience with implementing extensions that work with multiple interpreters, but I'm trying to think how that would be done safely.

Let's say this extension lets the user schedule a Python callback to invoked at a specific time, on a global thread pool not created by Python.

With a single interpreter, the extension may either keep a cached PyThreadState per thread in the pool for the main interpreter, or create it on demand. I haven't checked exactly what happens when trying to create a new PyThreadState while PyFinalize_Ex is running, but I think there is no problem there. The core problem is that the extension could attempt to dispatch a callback just as Py_FinalizeEx is running. Using the existing finalize block mechanism in this PR, the extension can ensure that finalization does not start while a callback is being dispatched, in order to ensure threads in the thread pool won't hang trying to acquire the GIL.

With multiple interpreters, we need a separate PyThreadState per thread in the pool per interpreter, and for each callback that has been scheduled, we also need to store the associated PyInterpreterState*. However, we also need a way to know that the interpreter is exiting, and cancel any scheduled callbacks, so that we don't attempt to use a dangling PyInterpreterState pointer. If PyThreadState objects have been cached for the thread pool threads, we would also need to destroy those PyThreadState objects, to avoid violating the constraint of Py_EndInterpreter. This cancellation mechanism is what I mean by an external synchronization mechanism. Given this external synchronization mechanism, I don't think such an extension would need to use the "finalize block" mechanism. We can't use PyGILState_Ensure because that does not take a PyInterpreterState*, and even if it did, we would need a way to ensure that our PyInterpreterState* is not dangling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get it now. The "finalize block" API, as proposed, relies on the global runtime state, which is guaranteed to exist in the process, whereas a given interpreter state pointer may have already been freed.

That said, do we continually have the guarantees we might need relative to the global runtime state, since at a certain point we will have freed some of the state the proposed API would need, no? I suppose if we can rely on some final flag for already-finalized then we'd be okay.

As to interpreters, even if the target one has been finalized already, we can still know that. Interpreters may be looked up by ID, rather than referenced by pointer. It's an O(n) operation, of course, but I'm not sure that would be a huge obstacle. Likewise the pointer can be checked against the list of alive interpreters to check for validity. Would we need more than that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is safe to assume that there may be numerous race conditions still remaining in the finalization logic. In particular I'd assume that calling Py_Initialize again after Py_FinalizeEx could definitely result in a lot of problems. I had put that in the category of things not to be done in production code. At least for the single interpreter case, single call to Py_Initialize, I think such bugs could likely be fixed without further API changes, though.

I am unclear on how sub-interpreters should be handled. Checking if the PyInterpreterState* is valid by checking if it is in the list of alive interpreters could fail if the interpreter is freed and then another allocated again at the same address, unless something is done to prevent that. In addition, if a given C/C++ extension only finds out afterwards that an interpreter has been destroyed, it is too late for it to free any PyObjects it has, so it would likely end up leaking memory. Therefore an atexit callback would seem to be more appropriate.

For the single-interpreter case we don't care about leaking memory because the program is about to exit anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are some API questions to resolve here, one option may be to split out the change to make threads hang rather than terminate, which can go in right away, and I expect will be sufficient for almost all single-interpreter use cases. The finalize block API, or other API changes to safely support multi-threading without leaks in the presence of interpreters stopping, could then be added later without blocking the fix for the common case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @jbms, can you make a PR splitting that part out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, see #105805


Attempts to acquire a block on Python finalization.

While the *finalize block* is held, the Python interpreter will block before
it begins finalization. Holding a finalize block ensures that the
:term:`GIL` can be safely acquired without the risk of hanging the thread.
Refer to :ref:`cautions-regarding-runtime-finalization` for more details.

If successful, returns 1. If the interpreter is already finalizing, or about
to begin finalization and waiting for all previously-acquired finalize blocks
to be released, returns 0 without acquiring a finalize block.

Every successful call must be paired with a call to
:c:func:`PyThread_ReleaseFinalizeBlock`.

This function may be safely called with or without holding the :term:`GIL`.
jbms marked this conversation as resolved.
Show resolved Hide resolved

.. versionadded:: 3.12

.. c:function:: void PyThread_ReleaseFinalizeBlock()

Releases a finalize block acquired by a prior successful call to
:c:func:`PyThread_AcquireFinalizeBlock` (return value of 1).

.. versionadded:: 3.12

.. _sub-interpreter-support:

Expand Down Expand Up @@ -2007,4 +2150,3 @@ be used in new code.
.. c:function:: void* PyThread_get_key_value(int key)
.. c:function:: void PyThread_delete_key_value(int key)
.. c:function:: void PyThread_ReInitTLS()

4 changes: 4 additions & 0 deletions Doc/data/stable_abi.dat

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

13 changes: 13 additions & 0 deletions Include/internal/pycore_runtime.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,19 @@ typedef struct pyruntimestate {
to access it, don't access it directly. */
_Py_atomic_address _finalizing;

/* Tracks the finalize blocks.

Bit 0 is set to 1 by `Py_FinalizeEx` to indicate it is waiting to set `_finalizing`.

The remaining bits are a count of the number of finalize blocks that are
currently held. Once bit 0 is set to 1, the number of finalize blocks is
not allowed to increase.

Protected by `ceval.gil.mutex`, and `ceval.gil.cond` must be broadcast
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a per-interpreter GIL now but - I think what needed to happen is to switch to using the main interpreters GIL as protection for this. So that's what I've committed and pushed.

Alternatively this finalize_blocks could use a lock of its own?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems like it should apply to each interpreter, rather than just to the whole global runtime (i.e. main interpreter).

when it becomes 1.
*/
uintptr_t finalize_blocks;

struct pyinterpreters {
PyThread_type_lock mutex;
/* The linked list of interpreters, newest first. */
Expand Down
47 changes: 47 additions & 0 deletions Include/pystate.h
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,53 @@ PyAPI_FUNC(void) PyGILState_Release(PyGILState_STATE);
*/
PyAPI_FUNC(PyThreadState *) PyGILState_GetThisThreadState(void);

/* Attempts to acquire a block on interpreter finalization.

Returns 1 on success, or 0 if the interpreter is already waiting to finalize.

While the lock is held, the interpreter will not enter the finalization
state.

Each call that returns 1 must be paired with a subsequent call to
`PyThread_ReleaseFinalizeBlock`.

It is not necessary to hold the GIL. While holding a block on interpreter
finalization, a non-main thread can safely acquire the GIL without risking
becoming permanently blocked.
*/
PyAPI_FUNC(int) PyThread_TryAcquireFinalizeBlock(void);

/* Releases the block acquired by a successful call to
`PyThread_TryAcquireFinalizeBlock`. */
PyAPI_FUNC(void) PyThread_ReleaseFinalizeBlock(void);

typedef enum {
PyGILState_TRY_LOCK_FAILED,
PyGILState_TRY_LOCK_LOCKED,
PyGILState_TRY_LOCK_UNLOCKED
} PyGILState_TRY_STATE;

/* Attempts to acquire a finalize block, and if successful, acquires the GIL.

This is a simple convenience interface that saves having to call
`PyThread_TryAcquireFinalizeBlock()` and `PyGILState_Ensure()` separately.

Returns `PyGILState_TRY_LOCK_FAILED` (equal to 0) if the interpreter is
already waiting to finalize. In this case, the GIL is not acquired and
Python C APIs that require the GIL must not be called.

Otherwise, acquires a finalize block and then acquires the GIL.

Each call that is successful (i.e. returns a non-zero `PyGILState_TRY_STATE`
value) must be paired with a subsequent call to
`PyGILState_ReleaseGILAndFinalizeBlock` with the same value returned by this
function. Calling `PyGILState_ReleaseGILAndFinalizeBlock` with the error
value `PyGILState_TRY_LOCK_FAILED` is safe and does nothing. */
PyAPI_FUNC(PyGILState_TRY_STATE) PyGILState_TryAcquireFinalizeBlockAndGIL(void);

/* Releases any locks acquired by the corresponding call to
`PyGILState_TryAcquireFinalizeBlockAndGIL`. */
PyAPI_FUNC(void) PyGILState_ReleaseGILAndFinalizeBlock(PyGILState_TRY_STATE);

#ifndef Py_LIMITED_API
# define Py_CPYTHON_PYSTATE_H
Expand Down
32 changes: 31 additions & 1 deletion Include/pythread.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,37 @@ typedef enum PyLockStatus {

PyAPI_FUNC(void) PyThread_init_thread(void);
PyAPI_FUNC(unsigned long) PyThread_start_new_thread(void (*)(void *), void *);
PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
/* Terminates the current thread.
*
* WARNING: This function is only safe to call if all functions in the full call
* stack are written to safely allow it. Additionally, the behavior is
* platform-dependent. This function should be avoided, and is no longer called
* by Python itself. It is retained only for compatibility with existing C
* extension code.
*
* With pthreads, calls `pthread_exit` which attempts to unwind the stack and
* call C++ destructors. If a `noexcept` function is reached, the program is
* terminated.
*
* On Windows, calls `_endthreadex` which kills the thread without calling C++
* destructors.
*
* In either case there is a risk of invalid references remaining to data on the
* thread stack.
*/
Py_DEPRECATED(3.12) PyAPI_FUNC(void) _Py_NO_RETURN PyThread_exit_thread(void);
gpshead marked this conversation as resolved.
Show resolved Hide resolved

#ifndef Py_LIMITED_API
/* Hangs the thread indefinitely without exiting it.
*
* bpo-42969: There is no safe way to exit a thread other than returning
* normally from its start function. This is used during finalization in lieu
* of actually exiting the thread. Since the program is expected to terminate
* soon anyway, it does not matter if the thread stack stays around until then.
*/
PyAPI_FUNC(void) _Py_NO_RETURN _PyThread_hang_thread(void);
#endif /* !Py_LIMITED_API */
Comment on lines +40 to +49
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this should probably go in Include/cpython/pythread.h.

Also, why the leading underscore? If it's not meant to public use then put it in Include/internal/pycore_pythread.h. Otherwise either drop the leading underscore or add the "PyUnstable_" prefix. (See https://devguide.python.org/developer-workflow/c-api/#c-api, AKA PEP 689.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #105805


PyAPI_FUNC(unsigned long) PyThread_get_thread_ident(void);

#if (defined(__APPLE__) || defined(__linux__) || defined(_WIN32) \
Expand Down
Loading