Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: more general value class devirtualization #52210

Merged
merged 8 commits into from
May 6, 2021

Conversation

AndyAyersMS
Copy link
Member

When devirtualization knows that the this object is a boxed value class,
it now attempts to update the call to invoke the unboxed entry (when there
is one).

This extends an existing optimization that worked on calls where the box was
local and only fed the call. We now handle calls that dispatch on boxed value
classes more generally, even if their creation is not local or is local but
in a form the existing opt could not handle.

These new cases either come from failed local box removal opts (say multi-use
boxes) or from guarded devirtualization.

The "boxed" entry for value class methods is an un-inlineable VM stub. This
transformation effectively inlinines the stub and unblocks inlining of the
underlying method.

When devirtualization knows that the `this` object is a boxed value class,
it now attempts to update the call to invoke the unboxed entry (when there
is one).

This extends an existing optimization that worked on calls where the box was
local and only fed the call. We now handle calls that dispatch on boxed value
classes more generally,  even if their creation is not local or is local but
in a form the existng opt could not handle.

These new cases either come from failed local box removal opts (say multi-use
boxes) or from guarded devirtualization.

The "boxed" entry for value class methods is an un-inlineable VM stub. This
transformation effectively inlinines the stub and unblocks inlining of the
underlying method.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 3, 2021
@AndyAyersMS
Copy link
Member Author

@EgorBo PTAL
cc @dotnet/jit-contrib

The back-and-forth between devirtualization, unboxing opts, guarded devirtualization, and inlining is getting a bit convoluted. I will look into streamlining this but don't have any great ideas just yet.

Diffs for this change are tricky to assess, since it changes the jit interface traffic SPMI is not very useful, and since it's often enabled by PGO, PMI is also not very useful. There are some PMI diffs, mainly cases where the jit now invokes the unboxed entry point; generally this causes small code size increases as we're basically inlining the unboxing wrapper.

Total bytes of base: 50885075
Total bytes of diff: 50890874
Total bytes of delta: 5799 (0.01% of base)
    diff is a regression.

140 total methods with Code Size differences (14 improved, 126 regressed), 251096 unchanged.
Top file regressions (bytes):
        2342 : System.Private.Xml.dasm (0.07% of base)
        1033 : System.Data.Common.dasm (0.07% of base)
         775 : System.Data.OleDb.dasm (0.26% of base)
         470 : System.Data.Odbc.dasm (0.25% of base)
         185 : xunit.assert.dasm (0.13% of base)
         164 : System.Private.CoreLib.dasm (0.01% of base)
         132 : xunit.core.dasm (0.18% of base)
         132 : xunit.execution.dotnet.dasm (0.05% of base)
          78 : ILCompiler.Reflection.ReadyToRun.dasm (0.04% of base)
          76 : FSharp.Core.dasm (0.00% of base)
          73 : System.Net.Http.dasm (0.01% of base)
          71 : System.Numerics.Tensors.dasm (0.03% of base)
          64 : System.ComponentModel.TypeConverter.dasm (0.02% of base)
          59 : Newtonsoft.Json.dasm (0.01% of base)
          49 : CommandLine.dasm (0.01% of base)
          43 : Microsoft.CodeAnalysis.dasm (0.00% of base)
          30 : System.Speech.dasm (0.01% of base)
          23 : System.Configuration.ConfigurationManager.dasm (0.01% of base)

18 total files with Code Size differences (0 improved, 18 regressed), 252 unchanged.

Top method regressions (bytes):
         268 (24.75% of base) : System.Private.Xml.dasm - QilScopedVisitor:BeforeVisit(QilNode):this
         268 (24.75% of base) : System.Private.Xml.dasm - QilScopedVisitor:AfterVisit(QilNode):this
         173 (13.07% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitStrConcat(QilStrConcat):QilNode:this
         119 ( 9.54% of base) : System.Data.Common.dasm - ADP:SelectAdapterRows(DataTable,bool):ref
         119 ( 8.65% of base) : System.Private.Xml.dasm - XmlILVisitor:Function(QilFunction):this
         114 (14.75% of base) : System.Data.Odbc.dasm - DbMetaDataFactory:CloneAndFilterCollection(String,ref):DataTable:this
         114 (14.75% of base) : System.Data.OleDb.dasm - DbMetaDataFactory:CloneAndFilterCollection(String,ref):DataTable:this
          93 (10.85% of base) : System.Data.Common.dasm - DataColumn:IsMaxLengthViolated():bool:this
          92 ( 9.50% of base) : System.Data.Common.dasm - DataColumn:CheckNotAllowNull():this
          88 ( 8.35% of base) : System.Data.Odbc.dasm - DbMetaDataFactory:GetParameterName(String,int):String:this
          88 ( 7.52% of base) : System.Data.OleDb.dasm - DbMetaDataFactory:GetParameterName(String,int):String:this
          84 (18.79% of base) : System.Data.Common.dasm - DataTable:EvaluateExpressions():this
          83 ( 6.57% of base) : System.Private.CoreLib.dasm - Tuple`2:ToString(StringBuilder):String:this (8 methods)
          81 (12.86% of base) : System.Private.CoreLib.dasm - Tuple`1:ToString(StringBuilder):String:this (7 methods)
          81 ( 8.48% of base) : System.Data.Odbc.dasm - OdbcMetaDataFactory:NewDataTableFromReader(IDataReader,byref,String):DataTable:this
          80 (18.48% of base) : System.Data.Common.dasm - DataColumn:CheckMaxLength():bool:this
          80 ( 4.31% of base) : System.Data.OleDb.dasm - OleDbDataReader:AppendSchemaUniqueIndexAsKey(Hashtable,ref):this
          80 ( 3.94% of base) : System.Data.OleDb.dasm - OleDbMetaDataFactory:PrepareCollection(String,ref,DbConnection):DataTable:this
          78 (12.94% of base) : ILCompiler.Reflection.ReadyToRun.dasm - GcInfo:UpdateTransitionCodeOffset(List`1):Dictionary`2:this
          78 (15.32% of base) : System.Data.Common.dasm - DataTable:Copy():DataTable:this

Top method improvements (bytes):
         -15 (-4.53% of base) : System.Data.Common.dasm - SqlStringStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlDateTimeStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.81% of base) : System.Data.Common.dasm - SqlDoubleStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.87% of base) : System.Data.Common.dasm - SqlInt16Storage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlInt64Storage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlMoneyStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlByteStorage:ConvertXmlToObject(String):Object:this
         -11 (-3.64% of base) : System.Data.Common.dasm - SqlInt32Storage:ConvertXmlToObject(String):Object:this
         -11 (-3.53% of base) : System.Data.Common.dasm - SqlSingleStorage:ConvertXmlToObject(String):Object:this
         -10 (-2.26% of base) : Microsoft.CodeAnalysis.dasm - SymbolDisplayExtensions:ToDisplayString(ImmutableArray`1):String
          -9 (-2.85% of base) : System.Data.Common.dasm - SqlDecimalStorage:ConvertXmlToObject(String):Object:this
          -6 (-1.77% of base) : System.Configuration.ConfigurationManager.dasm - ValidatorUtils:ValidateRangeImpl(ubyte,ubyte,ubyte,bool)
          -5 (-1.45% of base) : System.Configuration.ConfigurationManager.dasm - ValidatorUtils:ValidateRangeImpl(short,short,short,bool)
          -4 (-1.05% of base) : System.Data.Common.dasm - SqlCharsStorage:ConvertXmlToObject(String):Object:this

Top method regressions (percentages):
          31 (44.93% of base) : CommandLine.dasm - <>c__21`1:<ToDelimitedString>b__21_0(StringBuilder,double):StringBuilder:this
          65 (34.76% of base) : System.Private.Xml.dasm - QilTypeChecker:CheckFunctionList(QilList):XmlQueryType:this
          65 (34.76% of base) : System.Private.Xml.dasm - QilTypeChecker:CheckGlobalVariableList(QilList):XmlQueryType:this
          65 (34.76% of base) : System.Private.Xml.dasm - QilTypeChecker:CheckGlobalParameterList(QilList):XmlQueryType:this
          65 (34.76% of base) : System.Private.Xml.dasm - QilTypeChecker:CheckFormalParameterList(QilList):XmlQueryType:this
          65 (34.76% of base) : System.Private.Xml.dasm - QilTypeChecker:CheckSortKeyList(QilList):XmlQueryType:this
          63 (30.00% of base) : System.Private.Xml.dasm - XmlILOptimizerVisitor:HasNestedSequence(QilNode):bool:this
          63 (29.17% of base) : System.Private.Xml.dasm - XmlILOptimizerVisitor:AreLiteralArgs(QilNode):bool:this
         268 (24.75% of base) : System.Private.Xml.dasm - QilScopedVisitor:BeforeVisit(QilNode):this
         268 (24.75% of base) : System.Private.Xml.dasm - QilScopedVisitor:AfterVisit(QilNode):this
          65 (24.34% of base) : System.Private.Xml.dasm - TailCallAnalyzer:Analyze(QilExpression)
          84 (18.79% of base) : System.Data.Common.dasm - DataTable:EvaluateExpressions():this
          64 (18.60% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitSequence(QilList):QilNode:this
          80 (18.48% of base) : System.Data.Common.dasm - DataColumn:CheckMaxLength():bool:this
          70 (17.46% of base) : System.Private.Xml.dasm - QilGenerator:CompileTextAvt(String):QilNode:this
          78 (15.32% of base) : System.Data.Common.dasm - DataTable:Copy():DataTable:this
         114 (14.75% of base) : System.Data.Odbc.dasm - DbMetaDataFactory:CloneAndFilterCollection(String,ref):DataTable:this
         114 (14.75% of base) : System.Data.OleDb.dasm - DbMetaDataFactory:CloneAndFilterCollection(String,ref):DataTable:this
          74 (13.31% of base) : System.Data.Common.dasm - DbDataReaderExtensions:GetColumnSchemaCompatibility(DbDataReader):ReadOnlyCollection`1
         173 (13.07% of base) : System.Private.Xml.dasm - XmlILVisitor:VisitStrConcat(QilStrConcat):QilNode:this

Top method improvements (percentages):
         -15 (-4.53% of base) : System.Data.Common.dasm - SqlStringStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.87% of base) : System.Data.Common.dasm - SqlInt16Storage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlDateTimeStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlInt64Storage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlMoneyStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.85% of base) : System.Data.Common.dasm - SqlByteStorage:ConvertXmlToObject(String):Object:this
         -12 (-3.81% of base) : System.Data.Common.dasm - SqlDoubleStorage:ConvertXmlToObject(String):Object:this
         -11 (-3.64% of base) : System.Data.Common.dasm - SqlInt32Storage:ConvertXmlToObject(String):Object:this
         -11 (-3.53% of base) : System.Data.Common.dasm - SqlSingleStorage:ConvertXmlToObject(String):Object:this
          -9 (-2.85% of base) : System.Data.Common.dasm - SqlDecimalStorage:ConvertXmlToObject(String):Object:this
         -10 (-2.26% of base) : Microsoft.CodeAnalysis.dasm - SymbolDisplayExtensions:ToDisplayString(ImmutableArray`1):String
          -6 (-1.77% of base) : System.Configuration.ConfigurationManager.dasm - ValidatorUtils:ValidateRangeImpl(ubyte,ubyte,ubyte,bool)
          -5 (-1.45% of base) : System.Configuration.ConfigurationManager.dasm - ValidatorUtils:ValidateRangeImpl(short,short,short,bool)
          -4 (-1.05% of base) : System.Data.Common.dasm - SqlCharsStorage:ConvertXmlToObject(String):Object:this

I also did a special SPMI collection over asp.net with this change; replaying that vs the baseline jit showed no diffs.

One simple example where this kicks in is the following method:

    public static int Sum(IEnumerable<int> data)
    {
        int r = 0;
        foreach (int x in data)
        {  
            r += x;
        }
        return r;
    }

when called with List<int> and run with PGO, the jit is now able to invoke the list's struct enumerator methods directly (under guard) and inline the simpler methods. MoveNext is nearly inlineable, and I will consider a follow-on change giving methods with GTF_CALL_M_UNBOXED an inlining boost, especially when the associated struct type is promotable.

@AndyAyersMS
Copy link
Member Author

Looks like something about the back and forth between boxed and unboxed entries is confusing crossgen2. Debugging.

@AndyAyersMS
Copy link
Member Author

The issue is that we invoke resolveVirtualMethod twice and expect the same result both times (RVM in the debug spew below). The first time is when we are considering guarded devirt -- we want to know if, given the class we're guessing for, we can figure out the method that will be invoked, and whether or not that method can be inlined.

Considering guarded devirtualization at IL offset 650 (0x28a)
Likely class for 400000000046E560 (System.ISpanFormattable) is 4000000000420048 (System.Int32) [likelihood:67 classes seen:3]
RVM-G: method=400000000046E558 objclass=4000000000420048 context=400000000046E561
interface call would invoke method TryFormat
Marking call [000311] as guarded devirtualization candidate; will guess for class System.Int32
    ... class is a value class, method is 400000000047FD88, looking for unboxed entry
    ... updating GDV candidate with info from unboxed entry 400000000047FD90
Saving stub addr 400000000046E580 in candidate info

That succeeds, and we subsequently do the guarded devirt transform. During that we call back into impDevirtulizeCall to finish the job of devirtualizing. This operates on more or less the same info:

Inline candidacy was established for unboxed entry 400000000047FD90. For devirt, swapping back to boxed entry 400000000047FD88

impDevirtualizeCall: Trying to devirtualize virtual call:
    class for 'this' is System.Int32 [exact] (attrib 20010010)
    base method is System.Int32::TryFormat
RVM: method=400000000047FD88 objclass=4000000000420048 context=400000000046E561
--- no derived method
    Class not final or exact, and method not final
No guarded devirt during late devirtualization
C:\repos\runtime0\src\coreclr\jit\indirectcalltransformer.cpp:779
Assertion failed '!call->IsVirtual()' 

So the second time around we pass in the derived method, not the base method; evidently crossgen2 doesn't handle that, and fails to devirtualize, and so we assert because we didn't expect it to fail.

Seems simple enough to also keep track of the base method here, and pass that back instead of the derived method.

@EgorBo
Copy link
Member

EgorBo commented May 4, 2021

Minimal repro for the CI failure (JIT, TC=0):

using System;
using System.Collections.Generic;

class Program
{
    public static void GetEnumerator_TypeProperties<T>()
    {
        var arraySegment = new ArraySegment<T>(new T[1], 0, 1);
        var ienumerableoft = (IEnumerable<T>)arraySegment;
        Console.WriteLine(ienumerableoft.GetEnumerator().GetType());
    }

    static void Main(string[] args)
    {
        GetEnumerator_TypeProperties<string>();
        Console.WriteLine("done");
    }
}

it prints:

System.ArraySegment`1+Enumerator[System.__Canon]

instead of:

System.ArraySegment`1+Enumerator[System.String]

@AndyAyersMS
Copy link
Member Author

Thanks @EgorBo. The issue is that we can't pass a "shared" MT as the generic context. Instead we'd have to pass the actual runtime method table (which is doable, but more complex). I'll just defer on that case for now.

The other surprise here is that devirtualization is enabled at Tier0. I'd been intending to experiment with this anyways as a way of cutting down on the volume of class profile data, but I guess we're already doing that.

@AndyAyersMS
Copy link
Member Author

To handle the more general case we'd need to spill the this tree out of the call and assign it to a temp, since we'll need two uses at the call; that is we want to transform from

m_boxed(this, ...)

to

t = this;
m_unboxed(t + <ptrsize>, [t], ...)

we might be able to do this with a comma form temp, but we'd need to be careful to make sure the comma was evaluated in the right order.

@AndyAyersMS
Copy link
Member Author

Still one last test failure to track down, only happens on arm32.

@AndyAyersMS
Copy link
Member Author

Still debugging the failure, but CSE is doing something quite odd:

STMT00002 (IL   ???...  ???)
N019 ( 53, 49)              [000015] --CXG--------             *  RETURN    ref    $185
N018 ( 52, 48) CSE #01 (use)[000025] --CXG--------             \--*  COMMA     ref    $2c1
N006 ( 25, 19)              [000012] --CXG--------                +--*  CALL      void   ILStubClass.IL_STUB_StoreTailCallArgs $VN.Void
N005 (  9,  8)              [000014] ------------- arg0 in r0     |  \--*  ADD       byref  $200
N003 (  7,  5)              [000011] -------------                |     +--*  BOX       ref    $1c0
N002 (  1,  1)              [000010] -------------                |     |  \--*  LCL_VAR   ref    V03 tmp1         u:1 (last use) $1c0
N004 (  1,  2)              [000013] -------------                |     \--*  CNS_INT   int    4 $42
N017 ( 27, 29) CSE #01 (def)[000024] --CXG--------                \--*  COMMA     ref    $2c1
N015 ( 24, 27)              [000017] --CXG--------                   +--*  CALL      void   System.Runtime.CompilerServices.RuntimeHelpers.DispatchTailCalls $VN.Void
N011 (  3,  3)              [000023] ------------- arg0 in r0        |  +--*  ADDR      int    $241
N010 (  3,  2)              [000022] ----G--N-----                   |  |  \--*  LCL_VAR   int   (AX) V05 ReturnAddress $280
N013 (  3,  3)              [000019] ------------- arg2 in r2        |  +--*  ADDR      int    $242
N012 (  3,  2)              [000018] ----G--N-----                   |  |  \--*  LCL_VAR   ref   (AX) V04 tmp2          $2c0
N014 (  2,  8)              [000021] H------------ arg1 in r1        |  \--*  CNS_INT(h) int    -0xB1D9887 ftn $141
N016 (  3,  2)              [000020] ----G--------                   \--*  LCL_VAR   ref   (AX) V04 tmp2          $2c1

I suppose it works out (?) but it seems completely unnecessary:

optValnumCSE morphed tree:
N023 ( 54, 50)              [000015] -ACXG--------             *  RETURN    ref    $185
N022 ( 53, 49)              [000036] -ACXG--------             \--*  COMMA     ref    $2c1
N020 ( 52, 48)              [000035] -ACXG--------                +--*  COMMA     void   $VN.Void
N006 ( 25, 19)              [000012] --CXG--------                |  +--*  CALL      void   ILStubClass.IL_STUB_StoreTailCallArgs $VN.Void
N005 (  9,  8)              [000014] ------------- arg0 in r0     |  |  \--*  ADD       byref  $200
N003 (  7,  5)              [000011] -------------                |  |     +--*  BOX       ref    $1c0
N002 (  1,  1)              [000010] -------------                |  |     |  \--*  LCL_VAR   ref    V03 tmp1         u:1 (last use) $1c0
N004 (  1,  2)              [000013] -------------                |  |     \--*  CNS_INT   int    4 $42
N019 ( 27, 29)              [000031] -ACXG---R----                |  \--*  ASG       ref    $VN.Void
N018 (  1,  1)              [000030] D------N-----                |     +--*  LCL_VAR   ref    V06 cse0         d:1 $2c1
N017 ( 27, 29)              [000024] --CXG--------                |     \--*  COMMA     ref    $2c1
N015 ( 24, 27)              [000017] --CXG--------                |        +--*  CALL      void   System.Runtime.CompilerServices.RuntimeHelpers.DispatchTailCalls $VN.Void
N011 (  3,  3)              [000023] ------------- arg0 in r0     |        |  +--*  ADDR      int    $241
N010 (  3,  2)              [000022] ----G--N-----                |        |  |  \--*  LCL_VAR   int   (AX) V05 ReturnAddress $280
N013 (  3,  3)              [000019] ------------- arg2 in r2     |        |  +--*  ADDR      int    $242
N012 (  3,  2)              [000018] ----G--N-----                |        |  |  \--*  LCL_VAR   ref   (AX) V04 tmp2          $2c0
N014 (  2,  8)              [000021] H------------ arg1 in r1     |        |  \--*  CNS_INT(h) int    -0xB1D9887 ftn $141
N016 (  3,  2)              [000020] ----G--------                |        \--*  LCL_VAR   ref   (AX) V04 tmp2          $2c1
N021 (  1,  1)              [000034] -------------                \--*  LCL_VAR   ref    V06 cse0         u:1 $2c1

@AndyAyersMS
Copy link
Member Author

The issue is that for explicit tail calls we have a side data structure (TailCallSiteInfo) -- that is getting out of sync, It needs to be updated if we switch the call to the unboxed entry, or helper based tail calls will not be handled properly.

It also tracks the kind of call being made, so it really should be updated generally when we devirtualize.

@AndyAyersMS
Copy link
Member Author

Above is the right idea but the ordering of things doesn't work out. In the importer we

  • get method sig and resolved token
  • devirtualize
  • set up tail call info

The "updates" done by devirtualization don't flow back to the caller, so the subsequent tail call setup still uses stale info.

Simplest thing at this point is just to forgo updating call target for explicit tail calls...

@AndyAyersMS
Copy link
Member Author

Installer build failure looks like a CI hiccup / timing issue. Will retry.

Downloading artifacts for build: 1123593
##[error]Artifact CoreCLRProduct___Linux_x64_release not found for build 1123593. Please ensure you have published artifacts in any previous phases of the current build.
Finishing: Download CoreCLR artifacts

@AndyAyersMS
Copy link
Member Author

@EgorBo all issues should be resolved, please take a look.

@AndyAyersMS
Copy link
Member Author

Looks like for some reason the publish step for the linux x64 release didn't publish anything. So rerunning the consuming task is not going to resolve anything. And there doesn't seem to be a way to rerun a task that claims to have passed.

So am going to ignore this failure.

@runfoapp runfoapp bot mentioned this pull request May 5, 2021
@EgorBo
Copy link
Member

EgorBo commented May 6, 2021

I like the fact it improves codegen even with TC=0, e.g.:

public static int Sum()
{
    IFoo o = new Foo {a = 42};
    Console.WriteLine(o);
    return o.GetV();
}

public interface IFoo
{
    int GetV();
}

public struct Foo : IFoo
{
    public int a;
    public int GetV() => 42;
}

Codegen diff: https://www.diffchecker.com/pMwW2QFI (GetV is inlined)

@EgorBo
Copy link
Member

EgorBo commented May 6, 2021

I played with your branch locally and it looks good as far as I can say, I'm going to take a closer look at impDevirtCall later when I'll be working on #50915 (comment) where basically we do nothing for:

CALLV MyMethod()
  \__ CALL COREINFO_NULLABLE_BOX_HELPER

@AndyAyersMS AndyAyersMS merged commit 33a3ba9 into dotnet:main May 6, 2021
AndyAyersMS added a commit that referenced this pull request May 7, 2021
#52210 requires class handle embedding that crossgen does not support.

Remove an assert as this is now an area where crossgen2 and crossgen diverge.

Resolves #52450.
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants