Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: empty array enumerator opt #109237

Merged
merged 1 commit into from
Oct 28, 2024

Conversation

AndyAyersMS
Copy link
Member

For empty arrays, GetEnumerator returns a static instance rather than a new instance. This hinders the JIT's ability to optimize when it is able to stack allocate the new instance.

Detect when a stack allocation comes from an array GetEnumerator and fold the branch in the inlined enumerator method to always take the new instance path (since it is now cheap, allocation free, and is functionally equivalent).

Contributes to #108913.

For empty arrays, `GetEnumerator` returns a static instance rather
than a new instance. This hinders the JIT's ability to optimize when
it is able to stack allocate the new instance.

Detect when a stack allocation comes from an array `GetEnumerator`
and fold the branch in the inlined enumerator method to always take
the new instance path (since it is now cheap, allocation free, and
is functionally equivalent).

Contributes to dotnet#108913.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 25, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Copy link
Member Author

This is admittedly inelegant, but removes one of the 2 possible enumerator definitions (when the JIT knows early that the collection is array) and one of the 3 (if when the JIT learns later and/or uses GDV to determine the type).

@EgorBo PTAL
cc @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Oct 25, 2024

@EgorBot

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static int[] s_array = new int[512];
    [Benchmark]
    public int Test() 
    {
        IEnumerable<int> e = s_array;
        int sum = 0;
        foreach (int i in e) sum += i;
        return sum;
    }
}

BenchmarkDotNet v0.14.0, Ubuntu 24.04 LTS (Noble Numbat)
AMD EPYC 9R14, 1 CPU, 8 logical and 8 physical cores
.NET SDK 9.0.100-rc.2.24474.11
[Host] : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-PFTHSE : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-ZREUDE : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method Toolchain Mean Error Ratio
Test /core_root_base/corerun 470.0 ns 1.83 ns 1.00
Test /core_root_diff/corerun 157.7 ns 0.01 ns 0.34

@AndyAyersMS
Copy link
Member Author

This builds on #109190 and #109182 and allows promotion and elimination of the array enumerator object in a few more cases.

The loop in the test above is still not quite in the canonical foreach shape (likely needs the non-lexical loop inversion from #107749).

       jmp      SHORT G_M36362_IG04
       align    [0 bytes for IG03]
						;; size=26 bbWeight=1 PerfScore 6.75
G_M36362_IG03:  ;; offset=0x0028
       mov      r8d, r8d
       add      eax, dword ptr [rcx+4*r8+0x10]
						;; size=8 bbWeight=512.00 PerfScore 1664.00
G_M36362_IG04:  ;; offset=0x0030
       inc      r8d
       cmp      r8d, edx
       jb       SHORT G_M36362_IG03

canonical shape is strength reduced, top entry, and down counted:

G_M13713_IG04:  ;; offset=0x0020
       add      eax, dword ptr [rcx]
       add      rcx, 4
       dec      edx
       jne      SHORT G_M13713_IG04

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess changes in C# impl of GetEnumerator may break this logic silently. I wonder if it's better to introduce some internal [Intrinsic] such as RuntimeHelpers.EscapeAnalysisEnabled => false and then do

length == 0 &&  !RuntimeHelpers.EscapeAnalysisEnabled  ? Empy : ..

@AndyAyersMS
Copy link
Member Author

I guess changes in C# impl of GetEnumerator may break this logic silently. I wonder if it's better to introduce some internal [Intrinsic] such as RuntimeHelpers.EscapeAnalysisEnabled => false and then do

length == 0 &&  !RuntimeHelpers.EscapeAnalysisEnabled  ? Empy : ..

Maybe, though I think these methods don't tend to change. We will know if this breaks at it will cause perf regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants