Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regressions in System.Tests.Perf_Type #59704

Closed
performanceautofiler bot opened this issue Sep 28, 2021 · 12 comments
Closed

Regressions in System.Tests.Perf_Type #59704

performanceautofiler bot opened this issue Sep 28, 2021 · 12 comments
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler
Copy link

Run Information

Architecture x64
OS Windows 10.0.19042
Baseline 5181e1cd5b3fde9da15fe3f8d3eceaa9d3f64e01
Compare bfaa457c804efb9247f495b7e04444b2bf216308
Diff Diff

Regressions in System.Tests.Perf_Type

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
op_Equality - Duration of single invocation 1.38 ns 3.08 ns 2.24 0.13 True

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Tests.Perf_Type*'

Payloads

Baseline
Compare

Histogram

System.Tests.Perf_Type.op_Equality


Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Run Information

Architecture x64
OS Windows 10.0.19042
Baseline 5181e1cd5b3fde9da15fe3f8d3eceaa9d3f64e01
Compare bfaa457c804efb9247f495b7e04444b2bf216308
Diff Diff
@kunalspathak
Copy link
Member

Type equality is related to #59499

@kunalspathak
Copy link
Member

kunalspathak commented Sep 28, 2021

area of improvement: Align methods that are recursion - Ackermann benchmark? Double check if we set fgHasLoops flag when we convert a recursive method to loops

@kunalspathak kunalspathak changed the title [Perf] Changes at 9/23/2021 7:00:05 AM Regressions in System.Tests.Perf_Type Sep 28, 2021
@kunalspathak kunalspathak transferred this issue from dotnet/perf-autofiling-issues Sep 28, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 28, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@kunalspathak kunalspathak added arch-x64 os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Sep 28, 2021
@kunalspathak
Copy link
Member

Caused by #59499

@kunalspathak
Copy link
Member

Also - dotnet/perf-autofiling-issues#1577

@EgorBo
Copy link
Member

EgorBo commented Sep 28, 2021

I guess the native implementation could potentially benefit from Native PGO? since it's a question of nanoseconds (or maybe it's Managed PGO that made C# impl worse). As a quick fix we can mark operator== as AggressiveInlining if it worth it

cc @jkotas

Benchmark: https://github.com/dotnet/performance/blob/d7dac8a7ca12a28d099192f8a901cf8e30361384/src/benchmarks/micro/libraries/System.Runtime/Perf.Type.cs

operator == should exit at if (left is RuntimeType || right is RuntimeType) check

@jkotas
Copy link
Member

jkotas commented Sep 28, 2021

The regression is caused by several factors:

  • Tiered JIT overhead. Managed->managed calls go through an extra indirection. This extra indirection was not there for the FCall implementation. Not sure what we can do about it.
  • Unnecessary register shuffling. Current code:
mov     rax,rcx
mov     r8,qword ptr [rax]
mov rax,7FF9CBD49280h
cmp     r8,rax

This can be just:

mov     r8,7FF9CBD49280h
cmp     [rcx],r8

@EgorBo How hard would be to fix this in the JIT?

  • Extra jump due to duplicated epilogs. I will submit PR to fix that.

As a quick fix we can mark operator== as AggressiveInlining

I think it would break late recognition of the operator== intrinsic. (Also, the AggressiveInlining does not feel like the right tradeoff here.)

@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 29, 2021
@ghost
Copy link

ghost commented Sep 29, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture x64
OS Windows 10.0.19042
Baseline 5181e1cd5b3fde9da15fe3f8d3eceaa9d3f64e01
Compare bfaa457c804efb9247f495b7e04444b2bf216308
Diff Diff

Regressions in System.Tests.Perf_Type

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
op_Equality - Duration of single invocation 1.38 ns 3.08 ns 2.24 0.13 True

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Tests.Perf_Type*'

Payloads

Baseline
Compare

Histogram

System.Tests.Perf_Type.op_Equality


Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Run Information

Architecture x64
OS Windows 10.0.19042
Baseline 5181e1cd5b3fde9da15fe3f8d3eceaa9d3f64e01
Compare bfaa457c804efb9247f495b7e04444b2bf216308
Diff Diff
Author: performanceautofiler[bot]
Assignees: -
Labels:

os-windows, tenet-performance, tenet-performance-benchmarks, arch-x64, area-CodeGen-coreclr, untriaged

Milestone: -

@jkotas
Copy link
Member

jkotas commented Sep 29, 2021

The remaining actionable problem here is the unnecessary register shuffling from is RuntimeType check (see above).

@EgorBo
Copy link
Member

EgorBo commented Sep 29, 2021

@EgorBo How hard would be to fix this in the JIT?

@jkotas from a quick look I guess it's because of the way we inline cast/isinst - isinst in this case is only used for a bool expression but we save its result into a temp always.

STMT00006 (IL 0x003...  ???)
N009 ( 18, 22) [000040] -A---O------              *  JTRUE     void  
N008 ( 16, 20) [000010] JA---O?N----              \--*  EQ        int    $c2
N006 ( 13,  9) [000052] -A---O------                 +--*  COMMA     long   $102
N004 ( 10,  7) [000050] -A---O--R---                 |  +--*  ASG       long   $VN.Void
N003 (  3,  2) [000049] D------N----                 |  |  +--*  LCL_VAR   long   V03 cse0         d:1 $102
N002 (  6,  4) [000009] #----O?-----                 |  |  \--*  IND       long   $102
N001 (  3,  2) [000008] ------?-----                 |  |     \--*  LCL_VAR   ref    V02 tmp1         u:1 $80
N005 (  3,  2) [000051] ------------                 |  \--*  LCL_VAR   long   V03 cse0         u:1 $102
N007 (  2, 10) [000007] H-----?-----                 \--*  CNS_INT(h) long   0x7ffdc8eaf480 class $140

so in theory in this specific case there should not be any COMMA/ASG, just JTRUE(EQ(IND...

there was an issue for it already somewhere.

Simple repro:

using System;
using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        Test("hello");
    }

    [MethodImpl(MethodImplOptions.NoInlining | 
                MethodImplOptions.AggressiveOptimization)]
    public static int Test(object left)
    {
        if (left is null || left is string)
            return 0;
        return left.GetHashCode();
    }
}

@JulieLeeMSFT
Copy link
Member

Setting this to .NET 7. Please update it if you think it needs to be addressed in .NET 6.
Assigning to @EgorBo since inlining related work remains.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 30, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Sep 30, 2021
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Jun 13, 2022
@EgorBo
Copy link
Member

EgorBo commented Jul 10, 2022

Closing as the regression is no more:

image

@EgorBo EgorBo closed this as completed Jul 10, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants