Better specialization of calls, post introduction of PRECALL
#267
Comments
The above sequence is still a bit messy, with an unnecessary distinction between calling an attribute, and other calls.
Now The downside is that we are adding yet another instruction into the call sequence. Stats show that However, More uniform and effective specialization should pay this back and plenty more. |
Specialization strategy for all currently specialized types/values and most of the types that we currently fail on.
* If the lower slot on the stack is The only class of failures in https://github.com/faster-cpython/ideas/blob/main/stats.md not covered is "complex parameters" which are Python functions, so would be handled by adding more cases to the specialization of Python functions. |
Looking at the above table, it is clear that we should place Builtin functions and method descriptors with Swapping |
Resulting in this table:
|
In summary, we will:
|
It looks like specializing calls to Python classes is more complex than I allowed for. The three possible approaches we have so far come up with are:
TBH, all these solutions are a bit unsatisfying.
I think we should put this on hold for now. |
I'm not sure how well this would work, but would it be possible to do modifications to the |
Spencer, I toyed with the idea too but ditched it because it means specialization attempts will be very expensive for large
Mark, I will bench approach 2 once with pyperformance. If there's an overall speedup, I think it won't matter if functions slow down a little (specialization slows down all non-specialized stuff anyways and we still do it :). However, the fragility will definitely still be a concern even with good pyperformance numbers. |
Ideally our solution would be general enough to also work for other cases where we may want to do some "cleanup" after calling a Python function. A few possible examples:
|
Regarding |
The benchmarks for 2 look promising. |
I hope I'm not repeating after anyone else here, but maybe a fourth approach that merges 2 and 3 would work:
|
Update: The idea there is that apart from the |
Update again: |
I'm still not liking any of the approaches to specializing calls to Python classes much, but the complexity of the other solutions means that I'm inclining to 1.
Approach 1 (pushing a temporary frame) is robust and doesn't bulk out the calling sequence even more.
Approach 4 seems reasonably robust, but adds an extra cache and instruction to the call sequence, which will have knock-on effects.
We can make approach 1 reasonably efficient by adding a trampoline function, that calls
This

    def init_trampoline(self, <args>):
        res = type(self).__dict__["__init__"](self, <args>)
        if res is not None:
            raise ...
        return self

We can implement
As @Fidget-Spinner points out when talking about approach 4, this approach has the potential to be generalized to other calls, where we want to substitute one callable for another, to expose better optimization opportunities. |
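To make the trampoline's semantics concrete, here is a minimal runnable sketch in plain Python. It is an illustration only, not the interpreter implementation: `call_class` is a hypothetical helper standing in for whatever the specialized call sequence would do, `*args, **kwargs` stand in for the `<args>` placeholder, and the `TypeError` message mirrors what CPython reports when `__init__` returns a non-`None` value. Like the pseudocode, it assumes `__init__` is defined directly on the class, and it additionally assumes the class does not override `__new__`.

```python
def init_trampoline(self, *args, **kwargs):
    # Look up __init__ on the class itself, as in the pseudocode.
    # This raises KeyError if __init__ is only inherited (sketch limitation).
    init = type(self).__dict__["__init__"]
    res = init(self, *args, **kwargs)
    if res is not None:
        raise TypeError(
            f"__init__() should return None, not '{type(res).__name__}'"
        )
    return self


def call_class(cls, *args, **kwargs):
    # Hypothetical helper: allocate the instance, then let the trampoline
    # run __init__ and hand back the instance as the result of the call.
    # Assumes cls does not define its own __new__.
    self = object.__new__(cls)
    return init_trampoline(self, *args, **kwargs)


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Broken:
    def __init__(self):
        return 42  # not allowed: __init__ must return None


p = call_class(Point, 1, 2)
```

In this sketch, calling `call_class(Broken)` raises `TypeError`, while `call_class(Point, 1, 2)` returns the initialized instance. The call to a class is thereby reduced to a call to an ordinary Python function, which is the "substitute one callable for another" idea mentioned in the comment.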
@Fidget-Spinner thanks for help and perseverance with this issue. |
@markshannon I think you got it right when talking about the pros and cons of 1 vs 4. I'll reiterate here so I remember in the future:
1's Cons:
4's Pros:
4's Cons:
Personally, I still lean a little towards approach 4. Mainly because it saves us a lot of code. Every time we add a new |
It's time for the next chapter in this ongoing saga... If we choose option 1 above, and make an assumption about the use of bound-methods, then we can remove the The assumption I want to make is this: Given that, we can add a |
Since |
With the new opcodes, the sequence of instructions executed during a call looks like this:
The stats https://github.com/faster-cpython/ideas/blob/main/stats.md#call show that we only have a 72% hit rate on the standard benchmark suite.
Specialization attempts
Each of these failures needs a different strategy, so we need multiple strategies.
Bound methods.
Bound methods can come from two places: bound method objects explicitly used in the program, and classmethods.
The first should be handled in PRECALL_FUNCTION, the second in LOAD_METHOD.
Complex parameters.
This needs more investigation, to see if some of these can be specialized.
Python class
Should be handled in PRECALL_FUNCTION, which will create the self object and push a clean-up frame, leaving the __init__ method to be handled by CALL.
Builtin functions and classes using the older caller conventions
These should be fixed by modernizing the callee, not accommodating them in the interpreter.
Mutable classes
Mystery classification. Needs investigation, possibly a bug in the classification.
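As a small illustration of the "Bound methods" case above, the sketch below shows the two sources of bound methods named there and why unpacking them preserves behaviour: calling a bound method is the same as calling its underlying function with the bound object prepended to the arguments. The class `Greeter` and the helper `call_unpacked` are hypothetical names used only for this example; the real unpacking would happen inside the interpreter, not in Python code.

```python
import types


class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, punctuation):
        return f"hello from {self.name}{punctuation}"

    @classmethod
    def make(cls, name):
        return cls(name)


def call_unpacked(callable_, *args):
    # Hypothetical helper: if the callable is a bound method, call the
    # underlying function with the bound object as the first argument.
    if isinstance(callable_, types.MethodType):
        return callable_.__func__(callable_.__self__, *args)
    return callable_(*args)


g = Greeter("g")

# Source 1: a bound method object used explicitly in the program.
bound = g.greet
# Source 2: a classmethod, which is bound to the class itself.
bound_cls = Greeter.make
```

Both `bound` and `bound_cls` are `types.MethodType` instances, so `call_unpacked(bound, "!")` and `bound("!")` give the same result, and `call_unpacked(bound_cls, "x")` builds a `Greeter` just as `Greeter.make("x")` would.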