Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign the frame layout to avoid the redundant computation of the stackOffsets #99278

Closed
wants to merge 1 commit into from

Conversation

shushanhf
Copy link
Contributor

@shushanhf shushanhf commented Mar 5, 2024

Redesign the frame layout to avoid the redundant computation of the stackOffsets.
After the lvaAssignVirtualFrameOffsetsToLocals, there is no need to recompute the LclVarDsc's StackOffset within the lvaFixVirtualFrameOffsets.

First I push the LoongArch64's optimization, then I will implement the ARM64's optimization. (Of course including the RISCV64)
After this PR's optimization, the frame layout liking this:

    //      |                       |
    //      |-----------------------|
    //      |  incoming arguments   |
    //      +=======================+ <---- Caller's SP
    //      |  Varargs regs space   | // Only for varargs functions; 64 bytes
    //      |-----------------------|
    //      |    MonitorAcquired    | // 8 bytes; for synchronized methods
    //      |-----------------------|
    //      |        PSP slot       | // 8 bytes (omitted in NativeAOT ABI)
    //      |-----------------------|
    //      |Callee saved registers | // not including FP/LR; multiple of 8 bytes
    //      |-----------------------|
    //      |      Saved LR         | // 8 bytes, For LoongArch64/RISCV64, here is RA.
    //      |-----------------------|
    //      |      Saved FP         | // 8 bytes,  for ARM64, here is 16-aligned.
    //      |-----------------------|
    //      |  possible GS cookie   |
    //      |-----------------------|
    //      | locals, temps, etc.   |
    //      |-----------------------|
    //      |  possible GS cookie   |
    //      |-----------------------|
    //      |   Outgoing arg space  | // multiple of 8 bytes; if required (i.e., #outsz != 0)
    //      |-----------------------| <---- Ambient SP
    //      |       |               |
    //      ~       | Stack grows   ~
    //      |       | downward      |
    //              V

The frame layout will be divided into two parts by the FP slot.
The stackOffsets is positive above the FP slot, while the stackOffsets is negative below the FP slot.
As this PR, the stackOffsets will be finished with the lvaAssignVirtualFrameOffsetsToLocals,
so there is no need to recompute the LclVarDsc's StackOffset within the lvaFixVirtualFrameOffsets.
Besides, the OSR's stackOffset within the Compiler::generatePatchpointInfo() can also be omitted as the offsetAdjust is zero.

Especially for ARM64, the CodeGen::genPushCalleeSavedRegisters will be significantly simplified.
At the same time, there is also no need to use the SetSaveFpLrWithAllCalleeSavedRegisters.


The TARGET_XARCH is not always using the frame pointer.
So I will amend the Compiler::lvaAssignVirtualFrameOffsetsToLocals depending on the frame layout is based on frame pointer or sp, liking this:

// The `Compiler::lvaAssignVirtualFrameOffsetsToLocals`.
if  stackOffset is based on the frame pointer:  // NOTE: the frame pointer is saved on the top where higher address.
{
       // the stackOffset is decreasing one by one.  The LoongArch64 and RISCV64 are always here.
}
else //stackOffset is based on the SP:      //NOTE: include the ARM64's FP is saved the SP+0 which equivalent of based on SP.
{
       // the stackOffset is increasing one by one.
}

The stackOffset will be final value within the Compiler::lvaAssignVirtualFrameOffsetsToLocals.
Then will delete the final computation of stackOffset within the Compiler::lvaFixVirtualFrameOffsets.

Next plans:

  • supporting the ARM64

As the ARM32 and XARCH don't always use the frame pointer, these should be done by a separate PR.

This PR is the first PR and I will push a series of PRs on future to amend related code for ARMARCH/XARCH/LA64/RISCV64.
But don't worry, I will don't increase the ARM64's Prolog/Epilog's asm size and will not disturb the runtime-team to optimize the leaf functions's FP #35274, I can wait to modify ARM64 unitil you finish liking #35274.
I will uniform most for ARMARCH/XARCH/LA64/RISCV64 to ease the long term burden for different implementations.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 5, 2024
@ghost
Copy link

ghost commented Mar 5, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Redesign the frame layout to avoid the redundant computation of the stackOffsets.
After the lvaAssignVirtualFrameOffsetsToLocals, there is no need to recompute the LclVarDsc's StackOffset within the lvaFixVirtualFrameOffsets.

First I push the LoongArch64's optimization, if needed, I can also implement the ARM64's optimization.

Author: shushanhf
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member

What is the optimization/benefit here? It does not seem like a good idea to make LA64 different from the other backends (e.g. I see it introduces more ifdefs in shared code).

@shushanhf
Copy link
Contributor Author

What is the optimization/benefit here?

I want to optimize the computation of the LclVarDsc's StackOffset.
Add this PR, after the lvaAssignVirtualFrameOffsetsToLocals, there is no need to recompute the LclVarDsc's StackOffset within the lvaFixVirtualFrameOffsets.

It does not seem like a good idea to make LA64 different from the other backends (e.g. I see it introduces more ifdefs in shared code).

This PR is the first PR, if this is OK, I can implement the other CPU liking this PR, at least the RISC CPUs.

@jakobbotsch
Copy link
Member

I'm not that familiar with the stack layout stuff, but are you saying that lvaFixVirtualFrameOffsets can be removed on x64/arm64 as well? If so that sounds like a good clean up.
@BruceForstall may be more familiar with this code and may have some opinions here.

@shushanhf
Copy link
Contributor Author

shushanhf commented Mar 5, 2024

I'm not that familiar with the stack layout stuff, but are you saying that lvaFixVirtualFrameOffsets can be removed on x64/arm64 as well?

At least for RISC-ISA CPUs liking the LA64/ARM64/RISCV64.

For LA64, I had tested it OK.

@BruceForstall
Copy link
Member

I don't understand why LA needs to be different here. It doesn't seem like there is any reason to "optimize" the setting of LclVarDsc StackOffset.

@shushanhf
Copy link
Contributor Author

I don't understand why LA needs to be different here. It doesn't seem like there is any reason to "optimize" the setting of LclVarDsc StackOffset.

I will give detailed explanation later.

@shushanhf
Copy link
Contributor Author

shushanhf commented Mar 7, 2024

I don't understand why LA needs to be different here. It doesn't seem like there is any reason to "optimize" the setting of LclVarDsc StackOffset.

Hi, @BruceForstall
The computation of the LclVarDsc StackOffset are mainly two steps:

  • lvaAssignVirtualFrameOffsetsToLocals is just setting an initial value (that is a media value only) and compute the compLclFrameSize;
  • During the lvaFixVirtualFrameOffsets, the LclVarDsc StackOffset will be determined one by one by the delta value. Here is just adding a value delta through a for-loop. After finished these, the LclVarDsc StackOffset is the last relative offset.

In fact, the two steps can be merged together by optimization liking this PR.
During the lvaAssignVirtualFrameOffsetsToLocals by this PR, the LclVarDsc StackOffset is finished(here the offset is the real relative offset) while there is no need to add a delta within the lvaFixVirtualFrameOffsets by a for-loop.
Besides, the OSR's stackOffset within the Compiler::generatePatchpointInfo() can also be omitted as the offsetAdjust is zero.

@shushanhf
Copy link
Contributor Author

I don't understand why LA needs to be different here. It doesn't seem like there is any reason to "optimize" the setting of LclVarDsc StackOffset.

Hi, @BruceForstall The computation of the LclVarDsc StackOffset are mainly two steps:

  • lvaAssignVirtualFrameOffsetsToLocals is just setting an initial value (that is a media value only) and compute the compLclFrameSize;
  • During the lvaFixVirtualFrameOffsets, the LclVarDsc StackOffset will be determined one by one by the delta value. Here is just adding a value delta through a for-loop. After finished these, the LclVarDsc StackOffset is the last relative offset.

In fact, the two steps can be merged together by optimization liking this PR. During the lvaAssignVirtualFrameOffsetsToLocals by this PR, the LclVarDsc StackOffset is finished(here the offset is the real relative offset) while there is no need to add a delta within the lvaFixVirtualFrameOffsets by a for-loop. Besides, the OSR's stackOffset within the Compiler::generatePatchpointInfo() can also be omitted as the offsetAdjust is zero.

Hi, @BruceForstall
What should I add some supplementary instruction?
I think this PR also has reference value for other CPUs and if need I can also implment for ARM and RISCV.

@shushanhf shushanhf changed the title [LoongArch64] redesign the frame layout to avoid the redundant computation of the stackOffsets Redesign the frame layout to avoid the redundant computation of the stackOffsets Mar 13, 2024
@shushanhf
Copy link
Contributor Author

@BruceForstall @jakobbotsch
Now I will implement this PR to ARM64 and RISCV64.

@BruceForstall
Copy link
Member

@BruceForstall @jakobbotsch Now I will implement this PR to ARM64 and RISCV64.

Can you give us a little more time to think about this before spending more time on this?

@shushanhf
Copy link
Contributor Author

@BruceForstall @jakobbotsch Now I will implement this PR to ARM64 and RISCV64.

Can you give us a little more time to think about this before spending more time on this?

OK.
In fact, I had do more tests during my local runtime. I think this is OK for LA64, arm64 and riscv64.

@shushanhf
Copy link
Contributor Author

shushanhf commented Mar 19, 2024

Just modified the Compiler::lvaAssignVirtualFrameOffsetToArg within the UNIX_AMD64_ABI.
Later should modify the other Compiler::lvaAssignVirtualFrameOffsetToArg for the stack's arg-offset.

Besides, I should confirm whehter the TARGET_XARCH is always using the frame pointer ?

@shushanhf
Copy link
Contributor Author

@jakobbotsch Could you please review this PR?

Hi, @jakobbotsch
The riscv64 is ok within CI.
Now these failed are related sending Helix failed.
Could you review this PR ?

@jakobbotsch
Copy link
Member

The failures look related to this PR. For example:
https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-99278-merge-f1c2fd6bc6074f53a3/Regression_2/1/console.6de85eb2.log?helixlogtype=result

Assert failure(PID 23 [0x00000017], Thread: 23 [0x0017]): Assertion failed 'frameSize < -2040' in 'Runtime_66089:TestEntryPoint()' during 'Generate code' (IL size 441; hash 0x76f50dc2; FullOpts)

    File: /__w/1/s/src/coreclr/jit/codegenarm64.cpp:1227
    Image: /root/helix/work/correlation/corerun

Regardless, as we mentioned above, we are still not sure that we want to accept this PR. From what I see this PR makes it harder for us to implement future optimizations like #35274. Making some backends work differently to other backends also makes maintenance harder for us in the future. It is important for us to keep the backends as uniform as possible so that we can make changes that work the same way for all targets, without having to keep several different models in mind. What exactly is the justification for this change?

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 9, 2024

The failures look related to this PR. For example: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-99278-merge-f1c2fd6bc6074f53a3/Regression_2/1/console.6de85eb2.log?helixlogtype=result

Assert failure(PID 23 [0x00000017], Thread: 23 [0x0017]): Assertion failed 'frameSize < -2040' in 'Runtime_66089:TestEntryPoint()' during 'Generate code' (IL size 441; hash 0x76f50dc2; FullOpts)

    File: /__w/1/s/src/coreclr/jit/codegenarm64.cpp:1227
    Image: /root/helix/work/correlation/corerun

I can fix this.

Regardless, as we mentioned above, we are still not sure that we want to accept this PR. From what I see this PR makes it harder for us to implement future optimizations like #35274.

(1) This PR is also to optimization.
(2) We can optimize the code base on this PR. And the future optimization is not only #35274.
The optimizations are one step by step. This PR is also the one step, I can and will push other optimization on future.
This PR and #35274 and future's optimization are not conflicting.

Making some backends work differently to other backends also makes maintenance harder for us in the future. It is important for us to keep the backends as uniform as possible so that we can make changes that work the same way for all targets, without having to keep several different models in mind.

This PR is including LoongArch64, RISCV64 and ARM64 and uniforms some places. I think this PR shared more codes than before.

What exactly is the justification for this change?

As this PR's description above,
The stackOffsets will be finished with the lvaAssignVirtualFrameOffsetsToLocals,
so there is no need to recompute the LclVarDsc's StackOffset within the lvaFixVirtualFrameOffsets.
Besides, the OSR's stackOffset within the Compiler::generatePatchpointInfo() can also be omitted as the offsetAdjust is zero.

Especially for ARM64, the CodeGen::genPushCalleeSavedRegisters will be significantly simplified.

#99278 (comment)

Of course, I can update the docs about the frame layout and if needed I can add some might next steps for future optimization.

@jakobbotsch
Copy link
Member

So this PR is primarily meant as a throughput optimization? Do you have numbers on how much this improves throughput?
If I had to guess this two pass computation of stack local offsets is very cheap already compared to everything that is happening. And as mentioned above, the two pass computation is a fundamental requirement for any platform where we might want to not have frame pointers.

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 9, 2024

So this PR is primarily meant as a throughput optimization? Do you have numbers on how much this improves throughput?

This PR optimize now's two steps to one step. I didn't rewrite this part and the optimization is very noticeable.
Of course, I can add some testing results by performance tests ?

And as mentioned above, the two pass computation is a fundamental requirement for any platform where we might want to not have frame pointers.

For the case which not used frame pointer, this PR is the first PR and just optimize two steps to one step because now the ARM64/LoongArch64/RISCV64 are always using FP.

For optimization of not used frame pointer for ARM64/LoongArch64/RISCV64 is the next step and that is not conlicting with this PR, and can also implement based on this PR by a seperate PR.

@jakobbotsch
Copy link
Member

Of course, I can add some testing results by performance tests ?

We have ASM diffs and throughput measurements automatically run for arm64 in CI. You can see them by looking at "runtime-coreclr superpmi-diffs", clicking "View more details on Azure Pipelines" and then opening the "Extensions" tab.
Currently this PR has extremely large regressions on arm64: https://dev.azure.com/dnceng-public/public/_build/results?buildId=635657&view=ms.vss-build-web.run-extensions-tab
The regressions are both in codegen and throughput.

Example codegen regression:
image

Throughput regression:
image

For the case which not used frame pointer, this PR is the first PR and just optimize two steps to one step because now the ARM64/LoongArch64/RISCV64 are always using FP.

As you say arm64 always uses FP today. #35274 is about optimizing it so that leaf methods do not use frame pointers. There are currently issues around EE suspension, but I know that @filipnavara has been looking into this. I also would expect LA64 and RISCV64 to be able to use the same optimization if we manage to solve the EE suspension issue. x64/x86 already implement this optimization.

I like the LOC diff of this change -- +678/-1913 -- but it is hard to say anything about that given that it regresses arm64 by a lot.

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 9, 2024

Of course, I can add some testing results by performance tests ?

We have ASM diffs and throughput measurements automatically run for arm64 in CI. You can see them by looking at "runtime-coreclr superpmi-diffs", clicking "View more details on Azure Pipelines" and then opening the "Extensions" tab. Currently this PR has extremely large regressions on arm64: https://dev.azure.com/dnceng-public/public/_build/results?buildId=635657&view=ms.vss-build-web.run-extensions-tab The regressions are both in codegen and throughput.

Example codegen regression: image

Throughput regression: image

For the case which not used frame pointer, this PR is the first PR and just optimize two steps to one step because now the ARM64/LoongArch64/RISCV64 are always using FP.

As you say arm64 always uses FP today. #35274 is about optimizing it so that leaf methods do not use frame pointers. There are currently issues around EE suspension, but I know that @filipnavara has been looking into this. I also would expect LA64 and RISCV64 to be able to use the same optimization if we manage to solve the EE suspension issue. x64/x86 already implement this optimization.

I like the LOC diff of this change -- +678/-1913 -- but it is hard to say anything about that given that it regresses arm64 by a lot.

If just for ARM64 within this PR, I agree that this PR will increase the Prolog/Epilog's size because this PR doesn't use liking the STP fp,lr, [SP, imm!], although this PR also significantly simplified the calleeSavedRegs's code within Codegen.
BTW, although this can optimize further. I agree with you that ARM64's optimization should be done by alone.

If you can accept this PR only for RISCV64 and LoongArch64 but not ARM64, I can revert for ARM64.
The optimization of not useing FP for LoongArch64 and RISCV64 is future optimization.
And the future optimization by other ways for ARM64/LA64/RISCV64 should be done by other PRs.
In fact, I also have some other ideas to further opitimizations after this PR.

@jakobbotsch
Copy link
Member

I will have to discuss with @BruceForstall what we think. But yes, it seems likely we won't be able to change arm64 as part of this PR, but again that makes LA64/RISCV64 different from ARM64 and introduces long term maintenance burden for us.

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 9, 2024

I will have to discuss with @BruceForstall what we think. But yes, it seems likely we won't be able to change arm64 as part of this PR,

OK.
First, I just push LA64/RISCV64 within this PR.

but again that makes LA64/RISCV64 different from ARM64 and introduces long term maintenance burden for us.

I think this is not conflicting with future. we can optimize step by step and we can uniform these in future and I believe we can.

@shushanhf shushanhf force-pushed the LA_frameLayout branch 2 times, most recently from 07586f8 to b812c5f Compare April 10, 2024 00:54
@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 10, 2024

I will have to discuss with @BruceForstall what we think. But yes, it seems likely we won't be able to change arm64 as part of this PR,

OK. First, I just push LA64/RISCV64 within this PR.

but again that makes LA64/RISCV64 different from ARM64 and introduces long term maintenance burden for us.

I think this is not conflicting with future. we can optimize step by step and we can uniform these in future and I beleive we can.

Hi, @jakobbotsch
I'm very honor to be a contributor of dotnet and very happy learning so much from you and your team!
The advices from you and the dotnet team's are quite useful for me.
As we have a common purpose to make the dotnet better and better, I believe we can resolve different problem/issues step by step.

This PR is the first PR and I will push a series of PRs on future to amend related code for ARMARCH/XARCH/LA64/RISCV64, some ideas or PRs are not pushed within this PR.
That is this PR is not a lonely PR.
But don't worry, I will don't increase the ARM64's Prolog/Epilog's asm size and will not disturb the runtime-team to optimize the leaf functions's FP #35274, I can wait to modify ARM64 unitil you finish liking #35274. Of course it will be suite for the leaf function's optimization for not using the FP.
I will uniform most for ARMARCH/XARCH/LA64/RISCV64 to ease the long term burden for different implementations.
All these PRs will be pushed one by one and each one is a specific issue.

avoid the redundant computation of the stackOffsets.

After the `lvaAssignVirtualFrameOffsetsToLocals`, there is no need to recompute
the LclVarDsc's StackOffset within the `lvaFixVirtualFrameOffsets`.
@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 12, 2024

I will have to discuss with @BruceForstall what we think. But yes, it seems likely we won't be able to change arm64 as part of this PR,

OK. First, I just push LA64/RISCV64 within this PR.

but again that makes LA64/RISCV64 different from ARM64 and introduces long term maintenance burden for us.

I think this is not conflicting with future. we can optimize step by step and we can uniform these in future and I beleive we can.

Hi, @jakobbotsch I'm very honor to be a contributor of dotnet and very happy learning so much from you and your team! The advices from you and the dotnet team's are quite useful for me. As we have a common purpose to make the dotnet better and better, I believe we can resolve different problem/issues step by step.

This PR is the first PR and I will push a series of PRs on future to amend related code for ARMARCH/XARCH/LA64/RISCV64, some ideas or PRs are not pushed within this PR. That is this PR is not a lonely PR. But don't worry, I will don't increase the ARM64's Prolog/Epilog's asm size and will not disturb the runtime-team to optimize the leaf functions's FP #35274, I can wait to modify ARM64 unitil you finish liking #35274. Of course it will be suite for the leaf function's optimization for not using the FP. I will uniform most for ARMARCH/XARCH/LA64/RISCV64 to ease the long term burden for different implementations. All these PRs will be pushed one by one and each one is a specific issue.

Hi, @jakobbotsch
Could you please review this PR and give me your advices ?
Thanks

@jakobbotsch
Copy link
Member

@shushanhf I need to establish with Bruce whether we want to go in this direction (Bruce and Kunal are generally the owners of the JIT backends, not me, even though I often review the LA64/RISCV64 PRs). Mainly we have to discuss the points highlighted above, where we're concerned about uniformity in the backends.
Unfortunately Bruce is out on a health related leave for a couple of weeks. I don't want to merge the PR without his approval, so this will have to wait until he's back and we can discuss it. I'm sorry about the delay.

@shushanhf
Copy link
Contributor Author

shushanhf commented Apr 12, 2024

@shushanhf I need to establish with Bruce whether we want to go in this direction (Bruce and Kunal are generally the owners of the JIT backends, not me, even though I often review the LA64/RISCV64 PRs). Mainly we have to discuss the points highlighted above, where we're concerned about uniformity in the backends. Unfortunately Bruce is out on a health related leave for a couple of weeks. I don't want to merge the PR without his approval, so this will have to wait until he's back and we can discuss it. I'm sorry about the delay.

OK, thanks.
What we are controversial is void Compiler::lvaFixVirtualFrameOffsets().
How about I don't modify this to optimization for LA64/RISCV64 within this PR, can you merge this PR?

I want to delay this PR's optimization to future.
I just ajust the frame Layout within this PR but not optimize the void Compiler::lvaFixVirtualFrameOffsets() where leave the delta is 0.
The optimization about avoiding the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets(), we can discuss after Bruce back.

I want to merge this PR as you can accept way because I have some other PRs to push after this PR.
Maybe the best way is suspending judgment and continue other PRs.

@shushanhf
Copy link
Contributor Author

@shushanhf I need to establish with Bruce whether we want to go in this direction (Bruce and Kunal are generally the owners of the JIT backends, not me, even though I often review the LA64/RISCV64 PRs). Mainly we have to discuss the points highlighted above, where we're concerned about uniformity in the backends. Unfortunately Bruce is out on a health related leave for a couple of weeks. I don't want to merge the PR without his approval, so this will have to wait until he's back and we can discuss it. I'm sorry about the delay.

OK, thanks. What we are controversial is void Compiler::lvaFixVirtualFrameOffsets(). How about I don't modify this to optimization for LA64/RISCV64 within this PR, can you merge this PR?

I want to delay this PR's optimization to future. I just ajust the frame Layout within this PR but not optimize the void Compiler::lvaFixVirtualFrameOffsets() where leave the delta is 0. The optimization about avoiding the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets(), we can discuss after Bruce back.

I want to merge this PR as you can accept way because I have some other PRs to push after this PR. Maybe the best way is suspending judgment and continue other PRs.

How about I keep this PR and create a new PR which I just modify the frame layout without modifying the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets().

@jakobbotsch
Copy link
Member

jakobbotsch commented Apr 12, 2024

@shushanhf I need to establish with Bruce whether we want to go in this direction (Bruce and Kunal are generally the owners of the JIT backends, not me, even though I often review the LA64/RISCV64 PRs). Mainly we have to discuss the points highlighted above, where we're concerned about uniformity in the backends. Unfortunately Bruce is out on a health related leave for a couple of weeks. I don't want to merge the PR without his approval, so this will have to wait until he's back and we can discuss it. I'm sorry about the delay.

OK, thanks. What we are controversial is void Compiler::lvaFixVirtualFrameOffsets(). How about I don't modify this to optimization for LA64/RISCV64 within this PR, can you merge this PR?
I want to delay this PR's optimization to future. I just ajust the frame Layout within this PR but not optimize the void Compiler::lvaFixVirtualFrameOffsets() where leave the delta is 0. The optimization about avoiding the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets(), we can discuss after Bruce back.
I want to merge this PR as you can accept way because I have some other PRs to push after this PR. Maybe the best way is suspending judgment and continue other PRs.

How about I keep this PR and create a new PR which I just modify the frame layout without modifying the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets().

The title of this PR implies it is purely an optimization. But are you saying that actually this PR is fundamentally changing stack layout, and that's the change you want to make?

There are constraints on the exact expected frame layout from several sources I know of: unwinding, ETW on Windows, EnC support and pinvokes. I'm sure there are constraints from other sources I'm not familiar with. I would not be very comfortable taking fundamental changes to frame layout without doing a lot of due diligence.

For example: https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=msvc-170#arm64-stack-frame-layout

@shushanhf
Copy link
Contributor Author

@shushanhf I need to establish with Bruce whether we want to go in this direction (Bruce and Kunal are generally the owners of the JIT backends, not me, even though I often review the LA64/RISCV64 PRs). Mainly we have to discuss the points highlighted above, where we're concerned about uniformity in the backends. Unfortunately Bruce is out on a health related leave for a couple of weeks. I don't want to merge the PR without his approval, so this will have to wait until he's back and we can discuss it. I'm sorry about the delay.

OK, thanks. What we are controversial is void Compiler::lvaFixVirtualFrameOffsets(). How about I don't modify this to optimization for LA64/RISCV64 within this PR, can you merge this PR?
I want to delay this PR's optimization to future. I just ajust the frame Layout within this PR but not optimize the void Compiler::lvaFixVirtualFrameOffsets() where leave the delta is 0. The optimization about avoiding the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets(), we can discuss after Bruce back.
I want to merge this PR as you can accept way because I have some other PRs to push after this PR. Maybe the best way is suspending judgment and continue other PRs.

How about I keep this PR and create a new PR which I just modify the frame layout without modifying the redundant computation of the stackOffsets within the void Compiler::lvaFixVirtualFrameOffsets().

The title of this PR implies it is purely an optimization. But are you saying that actually this PR is fundamentally changing stack layout, and that's the change you want to make?

There are constraints on the exact expected frame layout from several sources I know of: unwinding, ETW on Windows, EnC support and pinvokes. I'm sure there are constraints from other sources I'm not familiar with. I would not be very comfortable taking fundamental changes to frame layout without doing a lot of due diligence.

OK.
Thanks, I think I just suspend this PR now until Bruce back.
The other thinks I create a new PR to discuss.

@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Aug 12, 2024
@JulieLeeMSFT
Copy link
Member

We don't have time to work on this for .NET 9, so we will review it in .NET 10.
If you believe this should be done in .NET 9, please ping me. We will reconsider.

@shushanhf
Copy link
Contributor Author

I will close this PR.
If needed, I will do it based on next runtime release version.

@shushanhf shushanhf closed this Aug 13, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants