We need to change the contract and interface of `_PyExecutorObject` and `_PyOptimizerObject` #108866
Comments
We haven't implemented optimization on
We may also need the executor to return the
I think you meant this makes life harder for the executor? Or maybe it's harder for both. :-)

I'm not sure how to implement the proposal except by ensuring in the optimizer that the first Tier 1 instruction it translates doesn't deopt. This is easily checked, since gh-108583 introduced the

I wouldn't want the Tier 2 executor to be responsible for executing the deoptimized Tier 1 instruction. I suppose in some cases we could define the "family head" in Tier 1 as a macro that is translatable to Tier 2 (in some cases it may already be translatable, though that isn't the case for
Are you thinking of having

Anyway, I see the attractiveness of the idea (you can plunk
> We start optimizing on

Indeed. So don't. That's what
No
Being obliged to execute the first instruction would make things complicated, except that there is no obligation to optimize at all. If it is too complicated, just let tier 1 handle it.
Okay, since the "first instruction" in this case is always
I just realized I don't think I understood this example before. :-( But maybe another motivation works for me. If

However, consider the example you gave, where we want to replace a specialization of

So I'm still a little bit stumped. Aren't we introducing a lot of complexity with this requirement?
Separately, it would be nice if the

The special-casing required wouldn't be too terrible, would it?
That is all true, but easily solved for our tier 2 optimizer; we simply don't insert

However, the ability to add an
We want to enter the executor at the end of the loop. It effectively gives us loop peeling for free, which improves type stability.
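The loop-peeling point can be illustrated with a toy example (illustrative only; the function `f` is hypothetical, not from the issue):

```python
# Illustrative only: why a trace entered at the end of the first iteration
# (at the loop's backward jump) sees more stable types than a trace entered
# at the loop head.

def f(n):
    total = 0                     # at the loop head, total starts as an int
    for i in range(n):
        total = total + i * 0.5   # after the first iteration, total is a float
    return total

# A trace projected from the loop head must handle total being an int on the
# first pass and a float on later passes; a trace entered at the backward
# jump only ever sees the float case, so its type guards stay stable.
print(f(3))  # 1.5
```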
Only if it makes things measurably better.
Offline there was considerable skepticism about the feasibility -- e.g. even

Point taken about loop peeling.
There seems to be some reluctance to do this, so let me give the reasons why we need to do it.

We want a guarantee of progress when the flow of control becomes obscure, as it does when we are moving from tier 1 to tier 2 and from trace to trace; otherwise the program can get stuck. If executors are not required to make progress, we cannot stitch them together freely, as we might end up forming a loop that makes no progress. Consider three traces

If the requirement that an executor makes progress were difficult to implement, I would understand the reluctance.
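The progress requirement can be shown with a toy model (hypothetical names, not the real CPython API): if stitched executors may execute zero Tier 1 instructions before transferring to each other, a cycle A → B → C → A never advances the instruction pointer.

```python
# Toy model, not the CPython implementation: an "executor" takes an
# instruction pointer and returns (new_ip, name_of_next_executor).

def make_executor(next_name, progress):
    def execute(ip):
        # Under the proposed contract, the executor must advance ip by >= 1.
        return (ip + 1 if progress else ip), next_name
    return execute

def run(executors, ip, max_hops):
    name, hops = "A", 0
    while hops < max_hops:
        ip, name = executors[name](ip)
        hops += 1
        if ip >= 3:          # pretend the region of interest is 3 instructions
            return ip, hops  # escaped the cycle, having made progress
    return ip, hops          # gave up: the program is stuck

edges = [("A", "B"), ("B", "C"), ("C", "A")]
stuck = {n: make_executor(nxt, progress=False) for n, nxt in edges}
ok = {n: make_executor(nxt, progress=True) for n, nxt in edges}

print(run(stuck, 0, 100))  # (0, 100): 100 hops, ip never advanced
print(run(ok, 0, 100))     # (3, 3): with guaranteed progress, done in 3 hops
```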
The example is too abstract, because I haven't seen the future yet where traces are joined. Is joining the same as stitching? Where would

Regarding tricks to force progress, consider
All right, offline I got a better understanding of stitching. The idea is that (almost) every exit from a trace (as of gh-112045 always through a deopt) has a counter, and if it becomes hot enough, we project a new trace from that point, and change the deopt exit to transfer directly to that new trace. This should give us polymorphism as well (e.g. the deopt from

Anyway, I do see that this kind of stitching would be complicated if the target executor makes no progress. One simple way to guarantee that, BTW, would be to just leave certain instructions (like unspecializable
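The exit-counter scheme described in this comment could be sketched roughly as follows (a toy model with hypothetical names and an arbitrary threshold, not the actual implementation):

```python
# Toy sketch of hot-exit stitching: each deopt exit of a trace carries a
# counter; once the counter crosses a threshold, a new trace is projected
# from the exit's target and the exit is patched to jump straight to it.

HOT = 16  # arbitrary threshold for this sketch

class Trace:
    def __init__(self, start_ip, exits):
        self.start_ip = start_ip
        self.exits = exits

class Exit:
    def __init__(self, target_ip):
        self.target_ip = target_ip   # Tier 1 instruction to resume at
        self.counter = 0
        self.stitched = None         # patched to a Trace once hot

def take_exit(exit_, project):
    """Called when a trace deopts through exit_; project() builds a new trace."""
    if exit_.stitched is not None:
        return exit_.stitched        # direct trace-to-trace transfer
    exit_.counter += 1
    if exit_.counter >= HOT:
        exit_.stitched = project(exit_.target_ip)
        return exit_.stitched
    return None                      # still cold: fall back to Tier 1

e = Exit(target_ip=42)
for _ in range(HOT):
    t = take_exit(e, project=lambda ip: Trace(ip, exits=[]))
print(t.start_ip)  # the 16th deopt projects a trace starting at ip 42
```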
The contract of `_PyExecutorObject` currently is that it executes zero or more bytecode instructions. We should change that so that it must execute at least one instruction.

The reason for this change is that we want to be able to use `ENTER_EXECUTOR` anywhere, and that we will need to replace arbitrary instructions with `ENTER_EXECUTOR` (see below for why).

If a `_PyExecutorObject` executes zero instructions, then `ENTER_EXECUTOR` is responsible for executing the original instruction. If it executes one or more instructions, the behavior of the first instruction is handled by the `_PyExecutorObject`, so `ENTER_EXECUTOR` is just a simple, and fast, (tail) call.

We also want to change the signature of the `execute` function pointer to take `_PyExecutorObject **` instead of `_PyExecutorObject *`. See faster-cpython/ideas#621 for details. We might as well make both changes at once.

I think our only "real" optimizer already executes at least three instructions, so it should be a fairly easy change.

Why do we need to insert executors at arbitrary instructions?

Consider a nested if with at least two balanced hot paths, and at least one cold path. At the join point, we want both paths to continue in optimized code, but as neither represents more than 50% of the flow, they will likely stop at the join point. Ideally, they will both jump into the same, following optimized code. But in order to find it, it needs to be attached to the tier 1 instructions using `ENTER_EXECUTOR`, and that join point could be an arbitrary instruction, likely a `LOAD_FAST` or `LOAD_GLOBAL`.

Note that this makes life harder for the optimizer, as it cannot simply exit the optimized code if a guard fails in the first instruction. It is obliged to fully execute that instruction.

Other optimizers might also want to overwrite instructions with `ENTER_EXECUTOR`; PyTorch Dynamo, for example.

@gvanrossum
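As a rough illustration of the two proposed changes, here is a toy Python model (hypothetical names; the real interface is the C `_PyExecutorObject` struct, and the slot here only mimics what `_PyExecutorObject **` would allow):

```python
# Toy model of the proposed contract:
# (1) an executor must execute at least one instruction, so the dispatcher
#     never has to re-execute the original instruction itself; and
# (2) execute() receives the executor slot it was invoked through (standing
#     in for _PyExecutorObject **), so it could replace itself in place.

class Executor:
    def __init__(self, n_instrs):
        # The proposed contract: at least one instruction is executed.
        assert n_instrs >= 1, "executor must execute at least one instruction"
        self.n_instrs = n_instrs

    def execute(self, ip, slot):
        # 'slot' models _PyExecutorObject **: the executor may write a
        # replacement executor through it while running.
        return ip + self.n_instrs

def enter_executor(ip, slot):
    # Under the new contract this is just a plain (tail) call: the executor
    # handles the first instruction itself, so there is no zero-progress
    # case for ENTER_EXECUTOR to patch up.
    return slot[0].execute(ip, slot)

slot = [Executor(n_instrs=3)]
print(enter_executor(10, slot))  # 13: the instruction pointer advanced by 3
```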