List Add is way slower (almost 3 times) in `net9.0 preview 3` than with `net8.0` #101437

linkdotnet · 2024-04-23T16:46:30Z

With the latest preview 3 (net9) there seems to be a major performance regression on MacOS 14.4 (M2 Pro) with lists. Given that they are one of the most used types, it should receive special treatment.

Benchmark

[Benchmark]
public List<int> ListAdd10000()
{
    var list = new List<int>();
    for (var i = 0; i < 10_000; i++)
    {
        list.Add(i);
    }

    return list;
}

[Benchmark]
public List<int> ListAdd10000PreAlloc()
{
    var list = new List<int>(10_000);
    for (var i = 0; i < 10_000; i++)
    {
        list.Add(i);
    }

    return list;
}

Results:

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4.1 (23E224) [Darwin 23.4.0]
Apple M2 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]   : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.2 (8.0.224.6711), Arm64 RyuJIT AdvSIMD
  .NET 9.0 : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD


| Method               | Runtime  | Mean      | Error     | StdDev    | Ratio | 
|--------------------- |--------- |----------:|----------:|----------:|------:|-
| ListAdd10000         | .NET 8.0 |  9.739 us | 0.0236 us | 0.0209 us |  1.00 | 
| ListAdd10000         | .NET 9.0 | 25.646 us | 0.1531 us | 0.1432 us |  2.63 | 
|                      |          |           |           |           |       | 
| ListAdd10000PreAlloc | .NET 8.0 |  6.701 us | 0.0175 us | 0.0146 us |  1.00 | 
| ListAdd10000PreAlloc | .NET 9.0 | 22.562 us | 0.1429 us | 0.1336 us |  3.37 |

Interestingly even the pre-allocated list is comparably slow.

`dotnet --info`

Output

dotnet --info .NET SDK: Version: 9.0.100-preview.3.24204.13 Commit: 81f61d8290 Workload version: 9.0.100-manifests.77bb7ba9 MSBuild version: 17.11.0-preview-24178-16+7ca3c98fa

Runtime Environment:
OS Name: Mac OS X
OS Version: 14.4
OS Platform: Darwin
RID: osx-arm64
Base Path: /usr/local/share/dotnet/sdk/9.0.100-preview.3.24204.13/

.NET workloads installed:
There are no installed workloads to display.

Host:
Version: 9.0.0-preview.3.24172.9
Architecture: arm64
Commit: 9e6ba1f

.NET SDKs installed:
6.0.417 [/usr/local/share/dotnet/sdk]
7.0.306 [/usr/local/share/dotnet/sdk]
7.0.404 [/usr/local/share/dotnet/sdk]
8.0.100 [/usr/local/share/dotnet/sdk]
8.0.101 [/usr/local/share/dotnet/sdk]
8.0.200 [/usr/local/share/dotnet/sdk]
9.0.100-preview.2.24121.2 [/usr/local/share/dotnet/sdk]
9.0.100-preview.2.24157.14 [/usr/local/share/dotnet/sdk]
9.0.100-preview.3.24204.13 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
Microsoft.AspNetCore.App 6.0.25 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 7.0.14 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 8.0.0 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 8.0.1 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 8.0.2 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 9.0.0-preview.2.24120.6 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 9.0.0-preview.2.24128.4 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 9.0.0-preview.3.24172.13 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 6.0.25 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 7.0.14 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 8.0.0 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 8.0.1 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 8.0.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 9.0.0-preview.2.24120.11 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 9.0.0-preview.2.24128.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 9.0.0-preview.3.24172.9 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
x64 [/usr/local/share/dotnet/x64]

Environment variables:
Not set

global.json file:
Not found

Learn more:
https://aka.ms/dotnet/info

Download .NET:
https://aka.ms/dotnet/download

The text was updated successfully, but these errors were encountered:

dotnet-policy-service · 2024-04-23T16:49:01Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

EgorBo · 2024-04-23T17:14:44Z

Doesn't repro for me on x64-windows:


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3447/23H2/2023Update/SunValley3)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]   : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 8.0 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 9.0 : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI


| Method               | Job      | Runtime  | Mean     | Error     | StdDev    |
|--------------------- |--------- |--------- |---------:|----------:|----------:|
| ListAdd10000         | .NET 8.0 | .NET 8.0 | 7.339 us | 0.0598 us | 0.0559 us |
| ListAdd10000PreAlloc | .NET 8.0 | .NET 8.0 | 5.002 us | 0.0988 us | 0.1508 us |
| ListAdd10000         | .NET 9.0 | .NET 9.0 | 7.472 us | 0.1401 us | 0.2597 us |
| ListAdd10000PreAlloc | .NET 9.0 | .NET 9.0 | 4.992 us | 0.0987 us | 0.1730 us |

don't have an macos-arm64 machine to test

vcsjones · 2024-04-23T17:24:53Z

@EgorBo I can repro on my M1.

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4.1 (23E224) [Darwin 23.4.0]
Apple M1 Ultra, 1 CPU, 20 logical and 20 physical cores
.NET SDK 9.0.100-preview.3.24204.13
[Host] : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
Job-OISOJI : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
Job-YEXJFM : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD

Method	Runtime	Mean	Error	StdDev	Ratio
ListAdd10000	.NET 8.0	10.488 us	0.0102 us	0.0080 us	1.00
ListAdd10000	.NET 9.0	28.429 us	0.0501 us	0.0469 us	2.71

ListAdd10000PreAlloc	.NET 8.0	7.137 us	0.0147 us	0.0138 us	1.00
ListAdd10000PreAlloc	.NET 9.0	24.054 us	0.0877 us	0.0732 us	3.37

linkdotnet · 2024-04-23T17:34:09Z

The same applies to 9.0.100-preview.2.24157.14 as well - so the regression might be since preview.1 or preview.2.

EgorBo · 2024-04-23T17:34:30Z

Interesting! Reproduces for me on M2 Max as well (just found it 🙂):

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4.1 (23E224) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]   : .NET 8.0.4 (8.0.424.16909), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.4 (8.0.424.16909), Arm64 RyuJIT AdvSIMD
  .NET 9.0 : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD


| Method               | Job      | Runtime  | Mean      | Error     | StdDev    |
|--------------------- |--------- |--------- |----------:|----------:|----------:|
| ListAdd10000         | .NET 8.0 | .NET 8.0 |  9.633 us | 0.0906 us | 0.0848 us |
| ListAdd10000PreAlloc | .NET 8.0 | .NET 8.0 |  6.688 us | 0.0389 us | 0.0364 us |
| ListAdd10000         | .NET 9.0 | .NET 9.0 | 24.315 us | 0.1284 us | 0.1201 us |
| ListAdd10000PreAlloc | .NET 9.0 | .NET 9.0 | 21.207 us | 0.0854 us | 0.0757 us |

tannergooding · 2024-04-23T17:41:29Z

Didn't we change Arm64 to start using the newer Arm64 barrier intstructions for the GC if they existed around that point?

IIRC, @kunalspathak did some of that work.

tannergooding · 2024-04-23T17:47:46Z

Not finding the barrier PR I was remembering, but there was #97953 too

EgorBo · 2024-04-23T17:48:01Z

Doesn't repro on Linux-arm64

EgorBo · 2024-04-23T17:57:47Z

codegen diff: https://www.diffchecker.com/6tDgYQ8w/ - the only difference is ldp on net9 vs 2 ldr on net8. Hard to imagine it being a culprit 🤔

tannergooding · 2024-04-23T18:09:08Z

Could be something more subtle has changed and is impacting the general measurements or iteration count?

ldr and ldp are both supposed to be 3 cycle, 3 latency. But alternatively perhaps there's some weird data dependency or other issue here causing the delay?

I don't think Apple or Arm64 in general has anything like Intel VTune or AMD uProf, so seeing where the stalls are happening isn't as easy, unfortunately.

EgorBo · 2024-04-23T18:17:29Z

Could be something more subtle has changed and is impacting the general measurements or iteration count?

ldr and ldp are both supposed to be 3 cycle, 3 latency. But alternatively perhaps there's some weird data dependency or other issue here causing the delay?

I don't think Apple or Arm64 in general has anything like Intel VTune or AMD uProf, so seeing where the stalls are happening isn't as easy, unfortunately.

I've just checked it locally - I've built a runtime that exactly matches 9.0-preview3 and removed the Ldr->Ldp optimization - it has fixed the perf 😐

EgorBo · 2024-04-23T18:21:07Z

Here is the asm diff for ReplaceLdrStrWithPairInstr being there (left) and removed (right) on the same runtime: https://www.diffchecker.com/8C38Yoor/

EgorBo · 2024-04-23T18:56:06Z

After some brainstorming in our Community Discord:

The loop alignment is the same in both cases - we even tried to put an nop after ldp so the overall codegen's (and loop) size matches -- no change
The issue was introduced in JIT: Reorder indirs on arm64 to make them amenable to ldp optimization #92768 in .NET 9.0. That PR allows pre-existing ReplaceLdrStrWithPairInstr optimization to recognize more pairs of loads as ldp.
Linux-arm64 Ampere is not affected it seems (if I correctly measured) so for now it's macOS arm64 specific. Perhaps someone can try it on some other non-mac arm64 machine? The benchmark is:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

[SimpleJob(runtimeMoniker: RuntimeMoniker.Net80)]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net90)]
public class MyClass
{
    static void Main(string[] args)
    {
        BenchmarkSwitcher.FromAssembly(typeof(MyClass).Assembly).Run(args);
    }

    [Benchmark]
    public List<int> ListAdd10000PreAlloc()
    {
        var list = new List<int>(10_000);
        for (var i = 0; i < 10_000; i++)
            list.Add(i);
        return list;
    }
}

(with <TargetFrameworks>net8.0;net9.0</TargetFrameworks> in the csproj)

tannergooding · 2024-04-23T18:59:40Z

It could potentially be related to behavioral changes that exist for LDP when LSE2 is supported.

In particular

If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that load or store two 64-bit registers are single-copy
atomic when all of the following conditions are true:
• The overall memory access is aligned to 16 bytes.
• Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.

If FEAT_LSE2 is implemented, LDP, LDNP, and STP instructions that access fewer than 16 bytes are single-copy
atomic when all of the following conditions are true:
• All bytes being accessed are within a 16-byte quantity aligned to 16 bytes.
• Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memor

The actual operation (or a snippet of it) is then described loosely as:

if HaveLSE2Ext() && !signed then
    bits(2*datasize) full_data;
    full_data = Mem[address, 2*dbytes, accdesc, TRUE];
    if BigEndian(accdesc.acctype) then
        data2 = full_data<(datasize-1):0>;
        data1 = full_data<(2*datasize-1):datasize>;
    else
        data1 = full_data<(datasize-1):0>;
        data2 = full_data<(2*datasize-1):datasize>;
else
    data1 = Mem[address, dbytes, accdesc];
    data2 = Mem[address+dbytes, dbytes, accdesc];

So it may be beneficial to test this on a Linux machine with LSE2 to see if that makes a difference or not.

vcsjones · 2024-04-23T19:23:51Z

Graviton 3:

Method	Job	Runtime	Mean	Error	StdDev
ListAdd10000PreAlloc	.NET 8.0	.NET 8.0	23.84 us	0.422 us	0.414 us
ListAdd10000PreAlloc	.NET 9.0	.NET 9.0	23.75 us	0.270 us	0.252 us

To @tannergooding's point though, either the hardware or the kernel I am using do not support LSE2 based on dmesg output.

EgorBo · 2024-04-23T19:24:10Z

this change fixes the perf too. the loads we're looking at are _size and _version inside List<T> so we're likely dealing with some sort of stall here probably caused by LSE2 that Tanner mentioned

tannergooding · 2024-04-23T19:33:52Z

My guess, given the above, is then that since LSE2 makes ldp atomic for loads that exist within a 16-byte window (and we know these will be since the they'll be 8-byte aligned), that the write to _version is placing a stall against the read for _size, which wouldn't exist for two independent ldr.

EgorBo · 2024-04-23T19:39:24Z

I like that guess. Although, the fact that on Graviton 3 we're seeing 23 us for both cases (and I am seeing similar numbers on Ampere-linux-arm64) while x64 and apple m1 show 7us for the best case, might be hinting something else

EgorBo · 2024-04-23T19:54:16Z

a dummy field between _size and _version in List<> fixes the issue as well (since it breaks the ldp optimization too)

EgorBo · 2024-04-24T02:17:54Z

Minimal repro:

using System.Diagnostics;
using System.Runtime.CompilerServices;

public class MyClass
{
    static void Main()
    {
        var mc = new MyClass();
        Stopwatch sw = Stopwatch.StartNew();
        while (true)
        {
            sw.Restart();
            for (int i = 0; i < 10000; i++)
                mc.Test();
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }

    int field1;
    int field2;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public void Test()
    {
        for (int i = 0; i < 10000; i++)
        {
            field1++;
            field2++;
        }
    }
}

Codegen diff for Test(): https://www.diffchecker.com/72No1VJ6/
Left - Main, ~191 ms per iteration
Right - LDP optimization disabled, ~67 ms per iteration (~3x faster)

EgorBo · 2024-04-24T13:22:29Z

cc @a74nh @TamarChristinaArm Perhaps, you know why we could see such a terrible perf hit from ldp on Apple arm64 CPUs

EgorBo · 2024-04-24T16:35:10Z

The fun part that this benchmark is 3x faster under x64 emulation (Rosetta), e.g.:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using System.Runtime.Intrinsics;

BenchmarkRunner.Run<MyClass>();

[InProcess]
public class MyClass
{
    [Benchmark]
    public List<int> ListAdd10000PreAlloc()
    {
        var list = new List<int>(10_000);
        for (var i = 0; i < 10_000; i++)
            list.Add(i);
        return list;
    }
}

Then:

dotnet publish --sc -f net9.0 -r osx-x64 
dotnet publish --sc -f net9.0 -r osx-arm64

Native (arm64):

Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
  [Host] : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD

| Method               | Mean     | Error    | StdDev   |
|--------------------- |---------:|---------:|---------:|
| ListAdd10000PreAlloc | 22.07 us | 0.258 us | 0.229 us |

Rosetta (x64) on the same hw

Apple M2 Max 2.40GHz, 1 CPU, 12 logical and 12 physical cores
  [Host] : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT SSE4.2

| Method               | Mean     | Error     | StdDev    |
|--------------------- |---------:|----------:|----------:|
| ListAdd10000PreAlloc | 7.724 us | 0.0862 us | 0.0764 us |

linkdotnet · 2024-04-24T18:54:41Z

On an arm64 RPi 4B there is no regression as well:

BenchmarkDotNet v0.13.12, Debian GNU/Linux 12 (bookworm)
Unknown processor
.NET SDK 9.0.100-preview.3.24204.13
[Host] : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD
.NET 8.0 : .NET 8.0.4 (8.0.424.16909), Arm64 RyuJIT AdvSIMD
.NET 9.0 : .NET 9.0.0 (9.0.24.17209), Arm64 RyuJIT AdvSIMD

Method	Job	Runtime	Mean	Error	StdDev
ListAdd10000PreAlloc	.NET 8.0	.NET 8.0	74.73 us	0.164 us	0.128 us
ListAdd10000PreAlloc	.NET 9.0	.NET 9.0	64.74 us	0.468 us	0.415 us

EgorBo · 2024-04-27T17:42:28Z

So we have a good guess (discussed internally) what's going on here, but it's not clear how to detect bad patterns and disable ldr->ldp optimization. The PRs which added ldr->ldp optimization only have a very few improvement reports attached to them (except ldp for SIMD regs), so we can, probably, just conservatively disable it for Apple platforms (we've confirmed that this problem is Apple specific at this point).

I think the first thing we have to do is to add macOS-arm64 platform to our dotnet/performance runs (it's currently Linux and Windows only) before we make any changes.

marcan · 2024-04-28T00:40:21Z

Reminder that Linux on Apple Silicon is a thing, both virtualized and bare metal, and virtual Windows ARM64 on Apple Silicon is also a thing. Please do not gate this on macOS only.

Given the LSE2 theory, I think it makes sense to treat this as a performance issue potentially affecting more CPU platforms (especially future ones), given that there is a plausible explanation for the mechanism and why the code is, for at least some logical implementations, actually sub-optimal. If the gains of this peephole opt are otherwise minimal, and it's not easy to detect the problem code sequences (the ones where it triggers extra stalls) then I think the logical thing to do is just disable it across the board. Alternatively, if you think this is rare enough and the List.Add case is an outlier, then just work around it there (e.g. with the version change).

But please don't use OS platform as a gate for this, since that makes no sense given multiple OSes exist (you'd have to use CPU implementer instead, and I assume that's not practical given ahead-of-time compilation?).

neon-sunset · 2024-04-28T11:34:48Z

It looks like part of the discussion was marked off-topic.

Just wanted to re-iterate that this particular regression is still solvable by removing _version when users target Release altogether, and it might be a good opportunity to finally explore this option as described in #81523

linkdotnet · 2024-04-28T11:37:08Z

It looks like part of the discussion was marked off-topic.

Just wanted to re-iterate that this particular regression is still solvable by removing _version when users target Release altogether, and it might be a good opportunity to finally explore this option as described in #81523

Wouldn’t that only solve the problem for List?
Leaving an area for potential other regressions now and the future!?

linkdotnet · 2024-04-28T11:38:08Z

Don’t get me wrong - one can still lead this discussion but I do feel it should be handled separately (and not only because of performance characteristics)

EgorBo · 2024-04-28T22:35:37Z

Minimal repro for the Apple issue using Clang and inline asm: https://gist.github.com/EgorBo/88196a218559ec93a197a7d1d5600548

ldp makes this "benchmark" ~4-5x slower. An interesting thing is that if we have ldp and stp - it's still slow (we thought that the problem reproduces only when we mix str with ldp and stp with ldp would be fine)

mqudsi · 2024-04-29T16:27:48Z

Wouldn't just moving the _version write as @EgorBo demonstrated just be the least controversial change/fix here (fixes this particular performance regression, avoids arguing about the bigger questions involved with removing version checks altogether, keeps version checks in place even for List.Add())?

The performance regressions can be tackled by monitoring macOS benchmark results (the point about Apple Silicon != macOS is well taken, but also somewhat besides the point since Linux (OS) performance is tracked separately and monitoring macOS (OS + hw) performance would have caught this for both newer Macs (OS + hw) and Linux-on-Apple-silicon — you do not need "idealized" performance monitoring where only one factor changes between each monitored configuration) and figuring out where other notable regressions occur?

You probably don't want to blanket disable ldr to ldp optimizations, as this was a pathological case where there was a false data dependency on two adjacent fields but there likely are cases where compiler optimizations can take advantage of the single copy atomicity improvements to net performance gains.

(But I must confess I am confused why this would problem would not manifest on non-Apple silicon aarch64 w/ FEAT_LSE2 present, as reported earlier in this thread, unless the cpu wasn't taking advantage of the available instruction-level parallelism before so there is no regression now?)

Developer-Ecosystem-Engineering · 2024-04-29T21:14:03Z

We are reviewing this issue thanks for raising it!

Developer-Ecosystem-Engineering · 2024-05-10T18:41:25Z

The use of LDP instructions is generally encouraged. In this specific case, the structure of the loop (that includes a memory dependence between stores and loads), coupled with the use of LDP instructions prevents certain hardware optimizations from engaging. The small and tight nature of the loop also exacerbates the impact.

Please also see section 4.6.11 of the Apple Silicon CPU optimization guide.

…dr -> ldp Very targeted fix for dotnet#93401 and dotnet#101437: before reordering two indirections, check if there is a potential store in the same loop that looks like it could end up being a candidate for store-to-load forwarding into one of those indirections. Some hardware does not handle store-to-load forwarding with the same fidelity when `stp`/`ldp` is involved compared to multiple `str`/`ldr`. If we detect the situation then avoid doing the reordering.

linkdotnet added the tenet-performance Performance related issue label Apr 23, 2024

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 23, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 23, 2024

vcsjones added area-System.Collections and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 23, 2024

vcsjones added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Collections labels Apr 23, 2024

This comment was marked as off-topic.

Sign in to view

jkotas mentioned this issue Apr 27, 2024

[API Proposal]: Process.TryGetProcessById() #101582

Open

EgorBo self-assigned this Apr 27, 2024

EgorBo added this to the Future milestone Apr 27, 2024

EgorBo removed the untriaged New issue has not been triaged by the area owner label Apr 27, 2024

jakobbotsch mentioned this issue Jul 30, 2024

JIT: Check for potential store-to-load forwarding before reordering ldr -> ldp #105695

Merged

jakobbotsch assigned jakobbotsch and unassigned EgorBo Jul 30, 2024

jakobbotsch modified the milestones: Future, 9.0.0 Jul 30, 2024

jakobbotsch closed this as completed in #105695 Aug 1, 2024

jakobbotsch closed this as completed in 69c9bf8 Aug 1, 2024

dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 1, 2024

github-actions bot locked and limited conversation to collaborators Sep 1, 2024

List Add is way slower (almost 3 times) in net9.0 preview 3 than with net8.0 #101437

List Add is way slower (almost 3 times) in net9.0 preview 3 than with net8.0 #101437

Comments

linkdotnet commented Apr 23, 2024

Benchmark

dotnet --info

dotnet-policy-service bot commented Apr 23, 2024

EgorBo commented Apr 23, 2024

vcsjones commented Apr 23, 2024

linkdotnet commented Apr 23, 2024

EgorBo commented Apr 23, 2024 • edited Loading

tannergooding commented Apr 23, 2024

tannergooding commented Apr 23, 2024

EgorBo commented Apr 23, 2024

EgorBo commented Apr 23, 2024

tannergooding commented Apr 23, 2024

EgorBo commented Apr 23, 2024

EgorBo commented Apr 23, 2024

EgorBo commented Apr 23, 2024 • edited Loading

tannergooding commented Apr 23, 2024

vcsjones commented Apr 23, 2024 • edited Loading

EgorBo commented Apr 23, 2024

tannergooding commented Apr 23, 2024

EgorBo commented Apr 23, 2024 • edited Loading

EgorBo commented Apr 23, 2024

EgorBo commented Apr 24, 2024 • edited Loading

EgorBo commented Apr 24, 2024 • edited Loading

EgorBo commented Apr 24, 2024

linkdotnet commented Apr 24, 2024 • edited Loading

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

EgorBo commented Apr 27, 2024 • edited Loading

marcan commented Apr 28, 2024 • edited Loading

neon-sunset commented Apr 28, 2024 • edited Loading

linkdotnet commented Apr 28, 2024

linkdotnet commented Apr 28, 2024

EgorBo commented Apr 28, 2024 • edited Loading

mqudsi commented Apr 29, 2024

Developer-Ecosystem-Engineering commented Apr 29, 2024

Developer-Ecosystem-Engineering commented May 10, 2024

List Add is way slower (almost 3 times) in `net9.0 preview 3` than with `net8.0` #101437

List Add is way slower (almost 3 times) in `net9.0 preview 3` than with `net8.0` #101437

`dotnet --info`

EgorBo commented Apr 23, 2024 •

edited

Loading

EgorBo commented Apr 23, 2024 •

edited

Loading

vcsjones commented Apr 23, 2024 •

edited

Loading

EgorBo commented Apr 23, 2024 •

edited

Loading

EgorBo commented Apr 24, 2024 •

edited

Loading

EgorBo commented Apr 24, 2024 •

edited

Loading

linkdotnet commented Apr 24, 2024 •

edited

Loading

EgorBo commented Apr 27, 2024 •

edited

Loading

marcan commented Apr 28, 2024 •

edited

Loading

neon-sunset commented Apr 28, 2024 •

edited

Loading

EgorBo commented Apr 28, 2024 •

edited

Loading