Generate more efficient ARM64 prologs/epilogs #88823
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details

During the investigation of #88292 I found that NativeAOT/ARM64 and R2R never generate frameless methods. A typical app ends up with >30% of methods having a simple frame prolog/epilog with no callee-saved registers or extra stack space. Most of these methods are very likely to be leaf methods, which can be frameless.

For example, take this simple method:

    int Square(int num) { return num * num; }

NativeAOT generates the following code:

    Program:Square(int):int (FullOpts):
    stp fp, lr, [sp, #-0x10]!
    mov fp, sp
    mul w0, w0, w0
    ldp fp, lr, [sp], #0x10
    ret lr

An optimizing C compiler (clang -O) generates:

    square: // @square
    mul w0, w0, w0
    ret

Not only is the code size significantly smaller, but it also saves a lot of space for the unwinding information.
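The issue shows only the assembly that clang emits, not the C source behind it. A minimal sketch of what that source presumably looks like follows; the function name `square` is taken from the label in the clang output, and the rest is an assumption.

```c
/* Assumed C counterpart of the C# Square method. The issue shows only the
   resulting assembly (label "square"), so this source is a reconstruction. */
int square(int num) {
    /* Leaf function: no calls, no locals that need stack space, so an
       optimizing compiler has no reason to set up a frame for it. */
    return num * num;
}
```

Compiling this for an ARM64 target with optimizations enabled should give something close to the two-instruction body quoted above, although the exact output can vary with compiler version and target.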
Dup of #35274?
An orthogonal issue would be to optionally generate prologs that are compatible with Apple Compact Unwinding:
An example of an Apple-compatible prolog is:

    stp x24, x23, [sp, #-0x40]!
    stp x22, x21, [sp, #0x10]
    stp x20, x19, [sp, #0x20]
    stp x29, x30, [sp, #0x30]
    add x29, sp, #0x30 ; x29/fp points just below the saved chain

The actual instructions and their order are not important; the resulting frame layout is. Cursory observation of JIT output shows that:
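To make the layout requirement more concrete, here is a rough sketch of how the example prolog above could be described as a 32-bit Apple compact unwind encoding, next to the encoding for a frameless leaf such as `Square`. The constant names and values are reproduced from memory of Apple's `<mach-o/compact_unwind_encoding.h>` and are redefined locally so the snippet compiles anywhere; the two helper functions are hypothetical and are not part of any runtime or JIT API.

```c
#include <stdint.h>

/* Values mirrored from Apple's <mach-o/compact_unwind_encoding.h>
   (assumption: copied from memory; check against the header before use). */
enum {
    UNWIND_ARM64_MODE_FRAMELESS     = 0x02000000,
    UNWIND_ARM64_MODE_FRAME         = 0x04000000,
    UNWIND_ARM64_FRAME_X19_X20_PAIR = 0x00000001,
    UNWIND_ARM64_FRAME_X21_X22_PAIR = 0x00000002,
    UNWIND_ARM64_FRAME_X23_X24_PAIR = 0x00000004
};

/* Hypothetical helper: encoding for a function with a standard fp/lr frame
   record whose callee-saved pairs sit contiguously just below it. */
uint32_t encode_frame(uint32_t saved_pairs) {
    return (uint32_t)UNWIND_ARM64_MODE_FRAME | saved_pairs;
}

/* Hypothetical helper: encoding for a frameless leaf that uses no stack. */
uint32_t encode_frameless_leaf(void) {
    return (uint32_t)UNWIND_ARM64_MODE_FRAMELESS;
}
```

Under these assumptions, the example prolog (x19 through x24 saved below the fp/lr pair) would be described by `encode_frame` with all three pair bits set, and a frameless leaf needs only the mode bits. The encoding carries no per-register offsets, which is why the resulting frame layout matters and the instruction order does not.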
I suppose it is a duplicate in a way, although I am specifically focusing on NativeAOT here, which has slight differences in the GC suspension architecture. Also, the numbers in that issue don't correspond at all to my observations, and they don't take into account the size of unwinding information emitted in the NativeAOT case.
Inline all the small methods! 🙂
Moving to Future because we are past the .NET 8 Preview 7 code-complete due date.