-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it easier to traverse the frame stack for third party tools. #100987
Comments
Initially, I propose to refactor the
Currently The order of |
Let me collect some feedback from maintainers of debuggers and profilers and will comment here the requirements so we can think of solutions. |
…improvement. (GH-100988) Refactor _PyInterpreterFrame a bit, to assist generator improvement.
* main: pythongh-101810: Remove duplicated st_ino calculation (pythonGH-101811) pythongh-92547: Purge sqlite3_enable_shared_cache() detection from configure (python#101873) pythonGH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (pythonGH-100988) pythonGH-87849: Simplify stack effect of SEND and specialize it for generators and coroutines. (pythonGH-101788) Correct trivial grammar in reset_mock docs (python#101861) pythongh-101845: pyspecific: Fix i18n for availability directive (pythonGH-101846) pythongh-89792: Limit test_tools freeze test build parallelism based on the number of cores (python#101841) pythongh-85984: Utilize new "winsize" functions from termios in pty tests. (python#101831) pythongh-89792: Prevent test_tools from copying 1000M of "source" in freeze test (python#101837) Fix typo in test_fstring.py (python#101823) pythonGH-101797: allocate `PyExpat_CAPI` capsule on heap (python#101798) pythongh-101390: Fix docs for `imporlib.util.LazyLoader.factory` to properly call it a class method (pythonGH-101391)
@pablogsal Any feedback? |
We can further improve traversal of the We should rename the typedef struct _PyVMFrame {
PyObject *f_executable;
struct _PyVMFrame *previous;
} PyVMFrame; Although tools and the VM should tolerate any object, we should in practice only allow a few classes:
The tuple form is for tools like Cython, Nanobind, etc. Creating a tuple of C extension can link themselves into the frame stack at the cost of about 4 memory writes, and 3 reads:
We can do this for builtins functions by modifying the vectorcall function assigned to the builtin function/method descriptor. |
I have reached out again to tool authors, give me a couple of days to gather comments. Apologies for the delay |
No problem. |
…PyEval_EvalFrameDefault`. (#102640) * Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions. * Remove unused dummy variables.
* main: (50 commits) pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685) pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661) pythongh-102354: change python3 to python in docs examples (python#102696) pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506) pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684) pythongh-100315: clarification to `__slots__` docs. (python#102621) pythonGH-100227: cleanup initialization of global interned dict (python#102682) doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677) pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014) pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649) pythongh-102627: Replace address pointing toward malicious web page (python#102630) pythongh-98831: Use DECREF_INPUTS() more (python#102409) pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659) pythongh-101524: Fix the ChannelID tp_name (pythongh-102655) pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075) pythongh-98169 dataclasses.astuple support DefaultDict (python#98170) pythongh-102650: Remove duplicate include directives from multiple source files (python#102651) pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640) pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562) pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631) ...
@benfred -^ |
I've made a branch that adds "lightweight" frames (just a pointer to a "code" object and a link pointer), and inserts one for each call to a builtin function in the interpreter. The performance impact is negligible and all builtin function and class calls are present in the frame stack. |
We still need the concept of entry frames for tools that merge native and python stacks. Why do you removed |
It's a proof of concept, it was easier to remove |
👍 |
We are also very interested in this proposal from the Cinder JIT perspective. One difference I see with our use case compared to what the draft PR so far aims to support is that we would like to be able to link in "minimal frames" that are still considered "complete": they are fetched by In the tuple form of |
In general the feedback is that this will make introspection tool much harder to implement. Currently the fact that this can only be a code object makes it very easy to traverse the frame stack. If you allow any Python object it makes it harder or in some cases even impossible. If we restrict this to a finite set of possibilities, it still makes it much harder but if we add some kind of enumeration to the frame that tells the tool what's going to be there it makes it a bit easier. In general I don't think that this proposal helps introspection tools, it actually makes the implementation harder and less efficient because more pointers need to be copied and more logic needs to be included. |
Some comments from authors:
|
In general the sentiment is that the more regular the structure is, the easier is for profilers and debuggers to traverse the stack. The more variations and python-isms (as in, using |
Feedback from who? Which operations for which tools become harder? No one is forcing tools to handle all possible frames. They can skip frames that have "executable" objects other than code objects. The presence of additional information that tools ignore cannot be worse than that information not being present in the first place.
Two fields, one pointing to the next frame, and one pointing to the executable object, seems quite regular to me. |
It might be informative to compare this with adding
|
… in `_PyEval_EvalFrameDefault`. (python#102640) * Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions. * Remove unused dummy variables.
Authors of profilers and debuggers. For now authors of Austin, py-spy, scygraph and fil, and myself (memray/pystack). I collected feedback from them but if you prefer that they comment here directly individually I can also ask for that.
Getting the Python stack from a remote process reading memory.
Informative how? |
Yes, please.
Is it really? In that case let's drop it now before it causes trouble for 3.13. |
Let's make it clear. The choice isn't between the proposed ABI and the status quo. The choice is between a well defined, if minimal, ABI and no ABI guarantees whatsoever. For example, we need to insert frames for shims at the exit from And don't forget the producers of frames, as well as the consumers. If the ABI I'm suggesting is not a good one, then you need to suggest a better one.
The purpose of adding perf support is that tools can see the mixed C/Python stack. |
The feature has landed already in 3.12 and is already released in alpha. I respect your position but I disagree with it. I suggest that if you want to discuss this, we can do it in a more real-time channel other than a GitHub issue. |
Refocusing the discussion on the original issue at hand. I think that if we add some kind of metadata to the frame that tells the tool what kind of frame is this (so basically what's going to be in the "f_executable" field, that's already a win. As you mention, tools may want to skip some of these frames if it cannot be handled. On the other hand if we support simple structs or simple Python objects (as opposed to custom classes or even dictionaries) in Additionally, I would like if we keep the current structure as preserved as possible (with this I mean that most frames will have a Also, I think having these fields as you propose:
is a big win as tools don't need to update these definitions every single time. |
The "f_executable" field is its own metadata, as Python objects are self-describing. We can restrict the number of classes that are officially supported. If "f_executable" object is one of those, then tools can use that information. It is something else, they can just ignore it. In the minimal case of just supporting code objects: while (frame);
if (frame->f_executable->ob_type == &PyCode_Type) {
do_stuff_with_frame(frame);
}
frame = frame->previous;
} As for what should be supported:
The VM might create frames for other objects, but tools should ignore them. |
I understand, but this makes life for inspection tools harder because it forces to copy much more stuff (the class and the name at least) instead of inspecting an enum. I would like to have what is in
I see what you are coming from, but allowing all these things is going to make implementing these tools a nightmare because supporting all these possibilities is a lot. I would like to restrict this list to just simple stuff like tuples, code objects and maybe some c-like struct that can be used for more exotic stuff. This is basically what you said here:
I am advocating for much tighter restrictions, but I understand that's not the direction that you want to go and I respect that. |
We could use a tagged pointer in In the minimal form, we can leave the low bit 0 to indicate "normal frame, pointer to code object" (then there is also zero overhead in normal interpreter frame creation) and set it to 1 to mean "pointer to something new and different." Then existing tools that want to just handle normal code objects like they already do only need a single bit test to discard frames they don't want to deal with. There are another two bits we could play with if we want to provide streamlined indication of any other common cases (builtin function, tuple form, maybe?). I hope we can make life easier for existing inspection tools by making it really easy to detect the common cases they want to care about, but I also hope (from the Cinder JIT perspective) that at least one of the valid options for |
I'm curious, why would we want frames to hold a pointer to a class (I assume while executing the class body?) rather than to the code object of the class body? |
class C: pass
c = C() # This is a call to a class |
@pablogsal |
@carlmjm |
Not necessarily. These tools need to work sometimes with stripped binaries or core files and requiring the symbol table there can be a huge pain compared with just checking against an integer, as we are vendoring the headers anyway. In particular, as an example (please don't focus too much on this) in core files is a huge pain because the address may not be in the core if is in the Currently, once you get the frame, you access the If we have an integer in the frame that tells what f_executable will have, then we can compare against it directly and know what we are going to find. No extra information or copies are required. If we need to compare with something now we require:
But an enum describing what it contains allows us to KNOW what the pointer will contain and for instance be super sure that the pointer is some custom stuff that we won't be able to understand instead of having to "guess" based on "oh, this pointer is not one of the ones we know about (code, tuples...)) |
I appreciate that having extra information will make life easier for a few tool authors, but it might make things a little bit slower for very many Python users. How do you get the frame without any symbols?
Any class outside of the fixed set (whatever that ends up being) should be ignored, so no "random" classes. |
Find the interpreter state and having the headers vendor so we know the offsets to the pointers in every struct and we know what we are going to find because at the moment is fully determined. The interpreter state can be found because we (cpython) place the runtime structure in its own section so it can be found without symbols: Line 100 in 7703def
Although this is technically not needed because it can be found by finding the cycle interpreter state <-> thread state by scanning the |
Sure, if you have Tagging bits could "make life easier for a few tool authors" in the scenarios @pablogsal is mentioning without "making things slower for very many Python users." EDIT: also, it's not |
Tagging might solve the performance issue. But we need to support 32bit machines, so we only have 2 bits to play with, which is not enough. |
If tools can find the runtime, then we can put an array of pointers there. No runtime overhead at the cost of ~40 bytes. PyObject *callable_types[] = {
&PyCode_Type,
...
}; |
That would be an acceptable compromise I think. |
OK, let's go with that then. FTR, one other reason not to use an enum is this: what happens when the enumeration and the executable don't match? By allowing objects of any class, but designating a small set of "approved" classes, the system is much more robust. |
@pablogsal |
I don't see a problem with that. @pablogsal Would this be OK, or is this too complex for your tastes? |
This would be me, maintainer of Austin. For context, Austin uses system calls like |
Austin uses symbols to locate |
Apologies if I slightly derail the conversation, but I wanted to express the following thought. Based on my experience with Austin, I would regard frame stack unwinding as just one aspect of the more general topic of observability into the Python VM. For example, one other thing that Austin tries to do is to sample the GC state to give an idea of how much CPU time is being spent on GC. Or detect who is holding the GIL to give a better estimate of RSS allocations. Therefore, I would tend to view frame stacks as just a part of what can be observed out of process. So the way I see a tool like Austin extracting this information in the future is by looking into an "observability entry point", much like
|
… in `_PyEval_EvalFrameDefault`. (python#102640) * Rename local variables, names and consts, from the interpeter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions. * Remove unused dummy variables.
…of an internal frame. (GH-105727) * Add table describing possible executable classes for out-of-process debuggers. * Remove shim code object creation code as it is no longer needed. * Make lltrace a bit more robust w.r.t. non-standard frames.
Profilers and debuggers need to traverse the frame stack, but the layout of the stack is an internal implementation detail.
However can make some limited promises to make porting tools between Python versions a bit easier.
In order to traverse the stack, the offset of the
previous
pointer needs to be known. To understand the frame, more information is needed.@pablogsal
@Yhg1s
expressed interest in this.
Linked PRs
_PyInterpreterFrame
a bit, to assist generator improvement. #100988_PyEval_EvalFrameDefault
. #102640The text was updated successfully, but these errors were encountered: