[Arm64] Planned JIT work in .NET 6 #43629

Closed
3 of 29 tasks
echesakov opened this issue Oct 20, 2020 · 2 comments
Labels: arch-arm64, area-CodeGen-coreclr, Bottom Up Work, User Story

Comments

@echesakov
Contributor

echesakov commented Oct 20, 2020

Background

In .NET 5, the .NET team made a substantial effort to bring Arm64 platform support to parity with X86. For example, we added 384 methods to System.Runtime.Intrinsics.Arm that allow our customers to use Advanced SIMD instructions on Arm64, optimized library code using these intrinsics, and made Arm64-targeted performance improvements in the code generator.

In .NET 6 we will continue the effort. In particular, as a part of .NET 6 planning the JIT team identified the following items as our next short-term goals:

Conditional instructions/branch elimination

One example of such a transformation can be found in LLVM, which converts cbz/cbnz/tbz/tbnz instructions into a conditional branch (b.cond). To see it, compare the outputs of the latest clang when compiling the following C++ snippet

void TransformsIntoCondBr(int& op1, int& op2) {
    if (op1 & op2) {
        op1 = op2;
    } else {
        op2 = op1;
    }
}

with the optimization disabled
-O2 -mllvm -aarch64-enable-cond-br-tune=false

TransformsIntoCondBr(int&, int&):           // @TransformsIntoCondBr(int&, int&)
        ldr     w8, [x0]
        ldr     w9, [x1]
        and     w10, w9, w8
        cbz     w10, .LBB0_2
        str     w9, [x0]
        ret
.LBB0_2:
        str     w8, [x1]
        ret

and with the optimization enabled
-O2 -mllvm -aarch64-enable-cond-br-tune=true

TransformsIntoCondBr(int&, int&):           // @TransformsIntoCondBr(int&, int&)
        ldr     w8, [x0]
        ldr     w9, [x1]
        tst     w9, w8
        b.eq    .LBB0_2
        str     w9, [x0]
        ret
.LBB0_2:
        str     w8, [x1]
        ret

Here, and w10, w9, w8; cbz w10, .LBB0_2 has been replaced with tst w9, w8; b.eq .LBB0_2. Because tst only sets the condition flags and does not write a result register, the w10 register is freed.
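
The equivalence of the two sequences can be checked with a small C++ model. This is only a sketch: the helper names branchTakenAndCbz, branchTakenTstBeq, sequencesAgree and runExample are hypothetical and exist purely for illustration; only TransformsIntoCondBr comes from the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// The function from the issue, unchanged.
void TransformsIntoCondBr(int& op1, int& op2) {
    if (op1 & op2) {
        op1 = op2;
    } else {
        op2 = op1;
    }
}

// Hypothetical model of `and w10, w9, w8; cbz w10, label`:
// the AND result is materialized in a scratch register and then tested.
bool branchTakenAndCbz(uint32_t w8, uint32_t w9) {
    uint32_t w10 = w9 & w8;  // and w10, w9, w8
    return w10 == 0;         // cbz w10, label
}

// Hypothetical model of `tst w9, w8; b.eq label`:
// only the Z flag is produced; no general-purpose register is written.
bool branchTakenTstBeq(uint32_t w8, uint32_t w9) {
    bool z = (w9 & w8) == 0;  // tst w9, w8 (alias of ands wzr, w9, w8)
    return z;                 // b.eq label branches when Z is set
}

// Returns true if both models take the branch on exactly the same inputs
// for a handful of sample values.
bool sequencesAgree() {
    const uint32_t samples[] = {0u, 1u, 2u, 3u, 4u, 6u, 0x80000000u, 0xFFFFFFFFu};
    for (uint32_t a : samples) {
        for (uint32_t b : samples) {
            if (branchTakenAndCbz(a, b) != branchTakenTstBeq(a, b)) {
                return false;
            }
        }
    }
    return true;
}

// Usage example: exercises both paths of the original function.
void runExample() {
    int a = 6, b = 3;  // 6 & 3 != 0, so op1 = op2
    TransformsIntoCondBr(a, b);
    assert(a == 3 && b == 3);

    int c = 4, d = 3;  // 4 & 3 == 0, so op2 = op1
    TransformsIntoCondBr(c, d);
    assert(c == 4 && d == 4);

    assert(sequencesAgree());
}
```

The model ignores one real-world cost of the rewrite: tst overwrites the condition flags, so the transformation is only safe where the flags are not live across the branch.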

The JIT team will research this optimization area and decide which optimizations can be implemented in .NET 6.

Some related issues:

Presumably, some parts of the analysis can be implemented in a platform-agnostic way and benefit both Arm64 and X86 platforms.

Next steps:

  • Identify the optimizations and estimate their potential impact
  • Determine what can be implemented in a platform-agnostic way and do that next
  • Implement Arm64-specific optimizations

Hardware Intrinsics on Arm64

  1. We need to address the known inefficiencies/suboptimal code generation:
  2. Implementation of new APIs is also on the table. The following are some instances of the proposed work:

Atomic instructions

Currently, the JIT emits ARMv8.1-LSE atomic instructions in the following cases:

Another potential work item is to support ARMv8.4-LSE atomic instructions in the JIT.
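
For readers less familiar with LSE, here is a C++ analogue of the kind of operations these instructions serve. It is a sketch, not the JIT's code path: the wrapper names atomicAdd, atomicExchange, atomicCompareExchange and runAtomicExample are hypothetical, and the instruction families named in the comments reflect what C++ compilers typically select when targeting Armv8.1-A with LSE enabled, which is an assumption about the toolchain rather than something stated in this issue.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic add that returns the previous value.
// Assumption: with LSE available, a compiler can use a single LDADD-family
// instruction here instead of an LDXR/STXR retry loop.
int32_t atomicAdd(std::atomic<int32_t>& target, int32_t value) {
    return target.fetch_add(value);
}

// Atomic exchange that returns the previous value.
// Assumption: maps to a SWP-family instruction under LSE.
int32_t atomicExchange(std::atomic<int32_t>& target, int32_t value) {
    return target.exchange(value);
}

// Atomic compare-and-swap; returns true if the swap happened.
// Assumption: maps to a CAS-family instruction under LSE.
bool atomicCompareExchange(std::atomic<int32_t>& target,
                           int32_t expected, int32_t desired) {
    return target.compare_exchange_strong(expected, desired);
}

// Usage example covering all three wrappers.
void runAtomicExample() {
    std::atomic<int32_t> counter{5};
    assert(atomicAdd(counter, 3) == 5);               // counter is now 8
    assert(atomicExchange(counter, 1) == 8);          // counter is now 1
    assert(atomicCompareExchange(counter, 1, 42));    // 1 matches, counter is 42
    assert(!atomicCompareExchange(counter, 1, 7));    // 1 no longer matches
    assert(counter.load() == 42);
}
```

The code behaves the same on any platform; only the generated instructions differ, which is why the choice between the LSE forms and the load-exclusive/store-exclusive loop is a code generation concern.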

Examples of Arm64 specific JIT backlog issues

Stretch goal

Note: All of the peephole work items above share a prerequisite: the code generator must be able to update a previously emitted instruction. There is no separate tracking issue for this, so whichever optimization we implement first will have to do that infrastructure work.
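
To illustrate what "updating a previously emitted instruction" means, here is a toy C++ emitter that rewrites and/cbz into tst/b.eq at emit time. Everything in it (Instr, ToyEmitter, the testedRegDiesHere flag, runPeepholeExample) is hypothetical and greatly simplified; it is not the JIT's emitter or its data structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction record: an opcode plus textual operands.
struct Instr {
    std::string opcode;
    std::vector<std::string> operands;
};

class ToyEmitter {
public:
    // `testedRegDiesHere` stands in for a liveness query that a real compiler
    // would answer itself. The toy also assumes that the condition flags are
    // not live and that no label sits between the two instructions.
    void emit(const Instr& ins, bool testedRegDiesHere = false) {
        if (testedRegDiesHere && foldAndCbz(ins)) {
            return;
        }
        code_.push_back(ins);
    }

    const std::vector<Instr>& code() const { return code_; }

private:
    // and wT, wA, wB ; cbz wT, L   ==>   tst wA, wB ; b.eq L
    bool foldAndCbz(const Instr& ins) {
        if (ins.opcode != "cbz" || ins.operands.size() != 2 || code_.empty()) {
            return false;
        }
        Instr& prev = code_.back();
        if (prev.opcode != "and" || prev.operands.size() != 3 ||
            prev.operands[0] != ins.operands[0]) {
            return false;
        }
        // Update the previously emitted instruction in place...
        std::string lhs = prev.operands[1];
        std::string rhs = prev.operands[2];
        prev = Instr{"tst", {lhs, rhs}};
        // ...and emit a flag-based branch instead of the requested cbz.
        code_.push_back(Instr{"b.eq", {ins.operands[1]}});
        return true;
    }

    std::vector<Instr> code_;
};

// Usage example: replays the sequence from the clang output above.
void runPeepholeExample() {
    ToyEmitter emitter;
    emitter.emit({"and", {"w10", "w9", "w8"}});
    emitter.emit({"cbz", {"w10", ".LBB0_2"}}, /*testedRegDiesHere=*/true);

    assert(emitter.code().size() == 2);
    assert(emitter.code()[0].opcode == "tst");
    assert(emitter.code()[1].opcode == "b.eq");
}
```

The key design point is that the emitter keeps the last instruction mutable until the next one arrives, so the peephole can decide with one instruction of lookbehind. When the tested register is still live, the toy simply leaves both instructions untouched.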

@dotnet/jit-contrib @TamarChristinaArm @tannergooding

category:planning
theme:planning
skill-level:expert
cost:large

@echesakov echesakov added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Team Epic labels Oct 20, 2020
@echesakov echesakov added this to the 6.0.0 milestone Oct 20, 2020
@echesakov echesakov self-assigned this Oct 20, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 20, 2020
@echesakov echesakov removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2020
@JulieLeeMSFT JulieLeeMSFT added User Story A single user-facing feature. Can be grouped under an epic. Bottom Up Work Not part of a theme, epic, or user story and removed Team Epic labels Nov 16, 2020
@JulieLeeMSFT
Member

Performance improvement work on the TechEmpower Cached Queries benchmark:
#46970

@echesakov
Contributor Author

echesakov commented Jul 8, 2021

Closing the epic as we are getting closer to the .NET 6 feature-complete date.

I opened #55364 and #55365 to track future work for the following two sets of items specified here: "Conditional instructions/branch elimination" and "Peephole optimization opportunities".

I also made sure that all the hardware intrinsics work items mentioned here belong to the Hardware Intrinsics GitHub project.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2021