[subinterpreters] Per-interpreter singletons (None, True, False, etc.) #83692
The long-term goal of PEP 554 is to run two Python interpreters in parallel. To achieve this goal, no object must be shared between two interpreters. See for example my article "Pass the Python thread state explicitly", which gives a longer rationale. In bpo-38858, I modified Objects/longobject.c to have per-interpreter small integer singletons: commit 630c8df. This issue is about the other singletons, like None or Py_True, which are currently shared between two interpreters. I propose to add new functions. Example for None:
And add a PyInterpreterState.none field: a strong reference to the per-interpreter None object. We should do that for each singleton:
GIL issue: Py_GetNone() would look like:

```c
PyObject* Py_GetNone(void)
{
    return _PyThreadState_GET()->interp->none;
}
```

Problem: _PyThreadState_GET() returns NULL if the caller function doesn't hold the GIL. Using the Python C API when the GIL is not held is a violation of the API: it is not supported. But it worked previously. One solution is to fail with an assertion error (abort the process) in debug mode, and let Python crash in release mode. Another option is to only fail with an assertion error in debug mode in Python 3.9: in Python 3.9, Py_GetNone() would use the PyGILState_GetThisThreadState() function, which works even when the GIL is released; in Python 3.10, we would switch to _PyThreadState_GET() and so crash in release mode. One concrete example of such an issue can be found in the multiprocessing C code, in semlock_acquire():

```c
Py_BEGIN_ALLOW_THREADS
if (timeout_obj == Py_None) {
    res = sem_wait(self->handle);
}
else {
    res = sem_timedwait(self->handle, &deadline);
}
Py_END_ALLOW_THREADS
```

Py_None is accessed when the GIL is released. |
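The accessor and the debug-mode assertion described above can be sketched with simplified stand-ins (the types and the global thread-state variable below are mocks for illustration, not the real CPython internals):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for CPython's internal structures. */
typedef struct { long ob_refcnt; } PyObject;

typedef struct {
    PyObject *none;   /* strong reference to the per-interpreter None */
} PyInterpreterState;

typedef struct {
    PyInterpreterState *interp;
} PyThreadState;

/* In CPython this is NULL whenever the caller does not hold the GIL. */
static PyThreadState *current_tstate = NULL;

static PyThreadState *_PyThreadState_GET(void) { return current_tstate; }

/* Sketch of the proposed accessor: fail loudly in debug mode if the GIL
 * is not held, instead of silently dereferencing a NULL thread state. */
PyObject *Py_GetNone(void)
{
    PyThreadState *tstate = _PyThreadState_GET();
    assert(tstate != NULL && "Py_GetNone() called without holding the GIL");
    return tstate->interp->none;
}
```

With this shape, code like the semlock_acquire() example above would abort in a debug build as soon as Py_None is touched between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.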
Would it not suffice to just make the singletons "immortal"? Without affecting the hot paths that are Py_INCREF and Py_DECREF, _Py_Dealloc could be changed to test for objects with a "special" destructor:

```c
destructor dealloc = Py_TYPE(op)->tp_dealloc;
if (dealloc == _Py_SingletonSentinel) {
    /* reset refcnt so as to not return here too often */
    op->ob_refcnt = PY_SSIZE_T_MAX;
}
else {
    (*dealloc)(op);
}
```

Even in the presence of multiple mutating threads, the object cannot be destroyed. Worst case, they all call _Py_Dealloc. |
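The sentinel-destructor trick above can be demonstrated end to end with a self-contained mock (the flattened object layout and the _Py_SingletonSentinel name follow the message above; everything else here is a simplified stand-in, not CPython code):

```c
#include <assert.h>
#include <limits.h>

#define PY_SSIZE_T_MAX LONG_MAX  /* stand-in for CPython's constant */

typedef struct _object PyObject;
typedef void (*destructor)(PyObject *);

struct _object {
    long ob_refcnt;
    destructor tp_dealloc;   /* flattened: real CPython goes through ob_type */
};

/* Sentinel destructor marking immortal singletons; never actually called. */
static void _Py_SingletonSentinel(PyObject *op) { (void)op; }

/* Called when ob_refcnt drops to zero. */
static void _Py_Dealloc(PyObject *op)
{
    destructor dealloc = op->tp_dealloc;
    if (dealloc == _Py_SingletonSentinel) {
        /* Immortal: reset refcnt so we do not come back here too often. */
        op->ob_refcnt = PY_SSIZE_T_MAX;
    }
    else {
        (*dealloc)(op);
    }
}

static void Py_DECREF(PyObject *op)
{
    if (--op->ob_refcnt == 0)
        _Py_Dealloc(op);
}
```

Dropping a singleton's refcount to zero simply resets it to PY_SSIZE_T_MAX; the object is never freed, which is the whole point.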
The problem is to make Py_INCREF/Py_DECREF efficient. Last time someone tried to use an atomic variable for ob_refcnt, it was 20% slower, if I recall correctly. If many threads start to update such an atomic variable, the CPU cache line holding common singletons like None, True and False can quickly become a performance bottleneck. On the other hand, if each interpreter has its own objects, there is no need to protect ob_refcnt: the interpreter lock protects it. |
PR 18301 is a WIP showing my intent. I'm not sure if it would be possible to land such a change right now in Python. It has different drawbacks, described in my previous messages. I don't know the impact on performance either. |
That is exactly why I didn't propose a change to them. The singletons
Exactly so, hence why I chose the simple solution of effectively
My solution also does not need any protection around ob_refcnt. |
I vaguely recall discussions about immortal Python objects:

- Instagram gc.freeze()
- Python immortal strings
- COUNT_ALLOCS
- Static types |
Recently, Petr Viktorin proposed immortal singletons in my latest "Pass the Python thread state to internal C functions" thread on the python-dev list. In 2004, Jim J. Jewett proposed: "What if a few common (constant, singleton) objects (such as None, -1, 0, 1) were declared immortal at compile-time?" |
Is the sub-interpreter PEP approved? If not, I had thought the plan was to only implement PRs that made clean-ups that would have been necessary anyway. |
Random idea (not carefully thought-out): Would it be simpler to have these objects just ignore their refcount, by having dealloc() be a null operation (or having it set the refcount back to a positive number)? That would let sub-interpreters share the objects without worrying about race conditions on incref/decref operations. To make this work, the objects can register themselves as permanent, shared objects; then, during shutdown, we could explicitly call a hard dealloc on those objects. |
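The register-as-permanent part of this idea can be sketched as follows. All names here (register_permanent, shutdown_hard_dealloc, the fixed-size table) are hypothetical illustrations, not CPython API:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long ob_refcnt; } PyObject;

/* Hypothetical registry: permanent objects are recorded at startup.
 * Refcount races on them are harmless because nothing frees them until
 * an explicit hard-dealloc pass at interpreter shutdown. */
#define MAX_PERMANENT 32
static PyObject *permanent[MAX_PERMANENT];
static size_t n_permanent = 0;

static void register_permanent(PyObject *op)
{
    if (n_permanent < MAX_PERMANENT)
        permanent[n_permanent++] = op;
}

/* Called once during interpreter shutdown; returns how many objects
 * were hard-deallocated. Real code would invoke each object's true
 * destructor here instead of just clearing the table. */
static size_t shutdown_hard_dealloc(void)
{
    size_t freed = n_permanent;
    n_permanent = 0;
    return freed;
}
```

The design choice is that correctness no longer depends on the refcount at all for registered objects; the registry is the single source of truth for their lifetime.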
+1, obviously, as I came to the same conclusion above (see msg361122) |
On Sun, Feb 2, 2020 at 2:53 PM Raymond Hettinger <[email protected]> wrote:
PEP-554 is not approved yet (and certainly is not guaranteed, though There is no PEP for the effort to isolate subinterpreters and stop I need to do a better job about communicating the difference, as folks
Right. In this case the "cleanup" is how singletons are finalized That said, making the singletons per-interpreter isn't a prerequisite [1] https://github.com/ericsnowcurrently/multi-core-python |
On Sun, Feb 2, 2020 at 3:32 PM Raymond Hettinger <[email protected]> wrote:
This is pretty much one of the two approaches I have been considering. :) Just to be clear, singletons normally won't end up with a refcount of
We would have to special-case refleak checks for singletons, to avoid Also note that currently extension authors (and CPython contributors)
great point |
The other approach is to leave the current static singletons alone and |
Which API should be used in C extensions to be "subinterpreter-safe"? Currently, Py_None is a singleton shared by multiple interpreters. Should suddenly all C extensions use a new Py_GetNone() function which returns the per-interpreter singleton? If yes, that's basically what my PR 18301 does: #define Py_None Py_GetNone() |
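How the "#define Py_None Py_GetNone()" trick preserves source compatibility can be shown with a small self-contained mock (the single global none_singleton below stands in for the per-interpreter lookup):

```c
#include <assert.h>

typedef struct { long ob_refcnt; } PyObject;

/* Per-interpreter None, mocked as one global here; the real version
 * would return tstate->interp->none. */
static PyObject none_singleton = { 1 };

PyObject *Py_GetNone(void)
{
    return &none_singleton;
}

/* The compatibility trick: existing code that spells "Py_None"
 * transparently calls the per-interpreter accessor instead. */
#define Py_None Py_GetNone()

/* Pre-existing extension code compiles unchanged: */
static int is_none(PyObject *obj)
{
    return obj == Py_None;   /* expands to: obj == Py_GetNone() */
}
```

Extensions would not need to change their source at all; only a recompile against the new header would be required.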
I expect issues with negative reference count values. As you wrote, it triggers a fatal error when Python is built in release mode.
Py_None is heavily used. If the reference count is updated by multiple threads with no lock to protect it, there is a significant risk that the value zero will be reached sooner or later. -- In the Linux kernel, they started to use a special type for reference counters (refcount_t), to reduce the risk of vulnerabilities from reference counter underflow or overflow.
The kernel already used the atomic_t type. But the issue here is about bugs, since no program is perfect, not even the Linux kernel. |
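The kernel's refcount_t idea is that a counter saturates instead of wrapping, so an underflow or overflow bug can no longer lead to a premature free. A minimal, non-atomic sketch of that behavior (a simplification in the spirit of refcount_t, not kernel code):

```c
#include <assert.h>
#include <limits.h>

/* Saturating reference counter: once the counter hits the saturation
 * value it is pinned there forever, and dec-and-test never reports
 * "free me" for a saturated or underflowed counter. */
typedef struct { unsigned int count; } refcount_t;

#define REFCOUNT_SATURATED UINT_MAX

static void refcount_inc(refcount_t *r)
{
    if (r->count != REFCOUNT_SATURATED)   /* once saturated, stay pinned */
        ++r->count;
}

/* Returns 1 only when the count legitimately drops to zero. */
static int refcount_dec_and_test(refcount_t *r)
{
    if (r->count == REFCOUNT_SATURATED)
        return 0;                 /* pinned: never free */
    if (r->count == 0) {          /* underflow bug: saturate, don't free */
        r->count = REFCOUNT_SATURATED;
        return 0;
    }
    return --r->count == 0;
}
```

An extra decrement caused by a bug leaks the object instead of freeing it while still in use, trading a leak for a use-after-free.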
Ah, I also found the idea of an immortal None in an old discussion on tagged pointers. Stefan Behnel proposed the idea: "All negative refcounts would have special meanings, such as: this is the immortal None, (...)". |
Aren't there a couple more lurking in the interpreter? E.g. empty tuple, empty frozenset.
That seems like a very good idea! They don't even need to "resurrect" themselves--we just ensure tp_dealloc is a no-op for those special values. If we do that, different threads and different interpreters can change ob_refcnt willy-nilly, there can be unsafe races between threads, the value could no longer make any sense--but as long as we never free the object, it's all totally fine. (Actually: tp_dealloc shouldn't be a no-op--it should add a million to the reference count for these special objects, to forestall future irrelevant calls to tp_dealloc.) This might have minor deleterious effects, e.g. sys.getrefcount() would return misleading results for such objects. I think that's acceptable. |
Consider the case where a thread that doesn't hold the GIL attempts to get a reference on The problem with having a single immortal
|
Mark:
Yeah, I concur with Mark: having one singleton per interpreter should provide better usage of the CPU caches, especially the level-1 data cache. Mark:
The main drawback of PR 18301 is that accessing "Py_None" means accessing tstate->interp->none. Except that the commonly used _PyThreadState_GET() returns NULL if the thread doesn't hold the GIL. One alternative would be to use PyGILState_GetThisThreadState(), but this API doesn't support subinterpreters. Maybe we are moving towards major backward incompatible changes required to make the subinterpreters implementation more efficient. Maybe CPython should have backward compatible behavior by default (Py_None can be read without holding the GIL), but running subinterpreters in parallel would change Py_None's behavior (it could not be read without holding the GIL). I don't know. |
That's a very reasonable theory. Personally, I find modern CPU architecture bewildering and unpredictable. So I'd prefer it if somebody tests such performance claims, rather than simply asserting them and having that be the final design. |
Having two CPUs write to the same cache line is a well known performance problem. There's nothing special about CPython here. The proper name for it seems to be "cache line ping-pong", but a search for "false sharing" might be more informative. |
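The false-sharing concern can be made concrete with a layout sketch: if each interpreter's hot state is padded out to its own cache line, writes from two CPUs no longer invalidate each other's lines. The 64-byte line size and the field names below are assumptions for illustration:

```c
#include <stddef.h>

/* Assumed cache line size; real hardware varies (64 is common on x86). */
#define CACHE_LINE 64

typedef struct {
    long none_refcnt;                     /* stand-in for a hot field,
                                             e.g. a singleton's refcount */
    char pad[CACHE_LINE - sizeof(long)];  /* pad to a full cache line */
} padded_state;

/* An array of these places each interpreter's hot state on its own
 * cache line, so two interpreters never ping-pong the same line. */
static padded_state interp_state[2];
```

Per-interpreter singletons get this separation for free, since each interpreter allocates its own objects; the padding trick is only needed when state for several interpreters is packed into one shared structure.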
I expect that for objects which are not commonly modified by two interpreters "at the same time", it should be fine. But None, True, small integer singletons, latin-1 str single character singletons, etc. objects are likely to be frequently accessed and so can become a bottleneck. Moreover, Python has an infamous feature: write (modify ob_refcnt) on "read-only" access :-D See "Copy-on-Read" feature popularized by Instagram ;-) https://instagram-engineering.com/copy-on-write-friendly-python-garbage-collection-ad6ed5233ddf |
bpo-40255 proposes a concrete implementation of immortal objects: it modifies Py_INCREF and Py_DECREF which makes Python up to 1.17x slower. |
Those numbers are for code without immortal objects. |
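The cost being measured in bpo-40255 is an extra branch in every Py_INCREF/Py_DECREF. A self-contained sketch of that shape (the magic-refcount scheme below mirrors the bpo-40255/PEP 683 approach in spirit; the exact constant and function forms are simplified stand-ins):

```c
#include <assert.h>
#include <limits.h>

typedef struct { long ob_refcnt; } PyObject;

/* A magic refcount value marks immortal objects. */
#define _Py_IMMORTAL_REFCNT (LONG_MAX >> 1)

static int _Py_IsImmortal(PyObject *op)
{
    return op->ob_refcnt == _Py_IMMORTAL_REFCNT;
}

/* The extra branch on these two hot paths is where the reported
 * slowdown (up to 1.17x) comes from. */
static void Py_INCREF(PyObject *op)
{
    if (_Py_IsImmortal(op))
        return;                /* immortal objects are never counted */
    op->ob_refcnt++;
}

static void Py_DECREF(PyObject *op)
{
    if (_Py_IsImmortal(op))
        return;
    if (--op->ob_refcnt == 0) {
        /* _Py_Dealloc(op) in real CPython */
    }
}
```

Because the immortal refcount never changes, such objects can be shared across interpreters and threads with no synchronization on ob_refcnt at all.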
After reading your PR 18301 ([WIP] bpo-39511: Add Py_GetNone() and Py_GetNoneRef() functions): actually, interp->none still shares the _Py_NoneStruct variable. When two interpreters modify the interp->none refcount, they both modify the same _Py_NoneStruct variable, even with the added Py_INCREF(none);. |
My PR 18301 is a draft to check if we can solve the issue without breaking C API compatibility. You're right that it doesn't solve the issue; it only checks the C API side. IMO PR 18301 proves that the "#define Py_None Py_GetNone()" trick works. -- By the way, when I worked on a tagged pointer experiment, I had to introduce a Py_IS_NONE(op) function, since it was no longer possible to compare directly with "op == Py_None":

```c
static inline int Py_IS_NONE(PyObject *op) {
    return (op == &_Py_NoneStruct || op == _Py_TAGPTR_NONE);
}
```

But this is not needed to solve this issue. |
Shouldn't this wait to see if the subinterpreters PEP is approved? Because if it isn't, then no change should be made. We shouldn't change something this fundamental without good cause. |
Raymond Hettinger: "Shouldn't this wait to see if the subinterpreters PEP is approved? Because if it isn't, then no change should be made. We shouldn't change something this fundamental without good cause." I agree that we have reached a point where a PEP is needed before pushing further "controversial" changes related to subinterpreters and bpo-1635741, especially converting static types to heap types (bpo-40077). I plan to write multiple PEPs: |
I'm looking very much forward to isolated subinterpreters and thus the per-subinterpreter GIL, as I've been keeping a private exploratory project where I had to make them work. Here are my thoughts:
|
My #62501 PR uses "#define Py_None Py_GetNone()" which is backward compatible in terms of API. Py_GetNone() can have various implementations, it doesn't matter at the API level. |
It seems like most core devs prefer https://peps.python.org/pep-0683/ (even if it's still a draft), so I'm closing this issue.