Update how OSR and PGO interact #61453

Merged
merged 8 commits into from
Nov 13, 2021

Conversation

AndyAyersMS
Member

When both OSR and PGO are enabled:

  • Enable instrumenting OSR methods, so that the combined profile data from
    Tier0 plus any OSR variants provide a full picture for subsequent Tier1
    optimization.
  • Use block profiles for both Tier0 methods that are likely to have patchpoints
    and OSR methods.

The updates on the runtime side are to pass BBINSTR to OSR methods, and to
handle the (typical) case where the OSR method instrumentation schema is a subset
of the Tier0 method schema.

We are still allowing OSR methods to read the profile data. So they are both
profile instrumented and profile optimized. Not clear if this is going to work
well as the Tier0 data will be incomplete and optimization quality may be poor.
Something to revisit down the road.
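
As a rough illustration of why instrumenting the OSR method matters, here is a toy model rather than runtime code. It assumes a single loop-body block counter shared by the Tier0 and OSR versions, and a made-up patchpoint trigger (`kHypotheticalPatchpointTrigger`); the actual patchpoint policy is not described in this thread. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy profile: one counter for the loop body block (hypothetical layout).
struct Profile
{
    uint32_t loopBodyCount = 0;
};

// Made-up trigger: iteration at which Tier0 hands off to the OSR method.
constexpr int kHypotheticalPatchpointTrigger = 100;

// Instrumented Tier0 loop. Returns the iteration index at which OSR takes over.
int RunTier0(Profile& profile, int limit)
{
    int i = 0;
    for (; i < limit; ++i)
    {
        if (i == kHypotheticalPatchpointTrigger)
        {
            break; // transition to the OSR method
        }
        ++profile.loopBodyCount;
    }
    return i;
}

// OSR continuation of the same loop, optionally instrumented.
void RunOsr(Profile& profile, int start, int limit, bool instrumented)
{
    for (int i = start; i < limit; ++i)
    {
        if (instrumented)
        {
            ++profile.loopBodyCount; // writes to the same slot Tier0 used
        }
    }
}

// Count seen by a later Tier1 compile after one call of a 1000-iteration loop.
uint32_t ObservedLoopCount(bool osrInstrumented)
{
    Profile profile;
    const int resumeAt = RunTier0(profile, 1000);
    RunOsr(profile, resumeAt, 1000, osrInstrumented);
    return profile.loopBodyCount;
}
```

In this model, the uninstrumented OSR case records only the iterations that ran in Tier0, while the instrumented case records all of them.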

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Nov 11, 2021
@ghost

ghost commented Nov 11, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member Author

cc @davidwrighton @dotnet/jit-contrib

Smoke tested on a few simple examples. No SPMI diffs (which includes a fair number of Tier0 instrumentation instances). I am enabling some new experimental legs which I'll run here.

Example of an OSR method with instrumentation probes:

    public static int Looper(B b)
    {
        int[] a = new int[1000];
        a[555] = 1;

        int result = 0;

        for (int i = 0; i < 1000; i++)
        {
            for (int j = i; j < 1000; j++)
            {
                result += a[j];
            }
        }

        // b.F() will only GDV if OSR method has the right class profile
        return result - b.F();
    }

; Assembly listing for method Example:Looper(B):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; OSR variant for entry point 0x1d
; optimized code
; instrumented for collecting profile data
; rsp based frame
; fully interruptible
; with PGO: edge weights are valid, and fgCalledCount is 11
; Final local variable assignments
;
;  V00 arg0         [V00,T03] (  4,   2   )     ref  ->  rsi         class-hnd single-def
;  V01 loc0         [V01,T02] (  2,1816   )     ref  ->  rdx         class-hnd
;  V02 loc1         [V02,T01] (  3,1816   )     int  ->  rdi
;  V03 loc2         [V03,T04] (  4,   4   )     int  ->  rax
;  V04 loc3         [V04,T00] (  6,4541   )     int  ->  rcx
;  V05 OutArgs      [V05    ] (  1,   1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V06 tmp1         [V06    ] (  0,   0   )     ref  ->  zero-ref    single-def "class profile tmp"
;  V07 tmp2         [V07,T05] (  3,   0   )     ref  ->  rcx         single-def "argument with side effect"
;
; Lcl frame size = 40

G_M20617_IG01:              ;; offset=0000H
       57                   push     rdi
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488BB424A0000000     mov      rsi, gword ptr [rsp+A0H]
       488B942488000000     mov      rdx, gword ptr [rsp+88H]
       8BBC2484000000       mov      edi, dword ptr [rsp+84H]
       8B842480000000       mov      eax, dword ptr [rsp+80H]
       8B4C247C             mov      ecx, dword ptr [rsp+7CH]
                                                ;; bbWeight=1    PerfScore 12.25
G_M20617_IG02:              ;; offset=0028H
       EB0E                 jmp      SHORT G_M20617_IG04
                                                ;; bbWeight=1    PerfScore 2.00
G_M20617_IG03:              ;; offset=002AH
       FF05DC483C00         inc      dword ptr [(reloc 0x7ffa0c561dcc)]
       8BC8                 mov      ecx, eax
       FF05DC483C00         inc      dword ptr [(reloc 0x7ffa0c561dd4)]
                                                ;; bbWeight=1    PerfScore 6.25
G_M20617_IG04:              ;; offset=0038H
       FF05D2483C00         inc      dword ptr [(reloc 0x7ffa0c561dd0)]
       3B4A08               cmp      ecx, dword ptr [rdx+8]
       7364                 jae      SHORT G_M20617_IG08
       448BC1               mov      r8d, ecx
       42037C8210           add      edi, dword ptr [rdx+4*r8+16]
       FFC1                 inc      ecx
       FF05C1483C00         inc      dword ptr [(reloc 0x7ffa0c561dd4)]
       81F9E8030000         cmp      ecx, 0x3E8
       7CDD                 jl       SHORT G_M20617_IG04
                                                ;; bbWeight=908    PerfScore 13393.00
G_M20617_IG05:              ;; offset=005BH
       FF05B7483C00         inc      dword ptr [(reloc 0x7ffa0c561dd8)]
       FFC0                 inc      eax
       FF05B3483C00         inc      dword ptr [(reloc 0x7ffa0c561ddc)]
       3DE8030000           cmp      eax, 0x3E8
       7CBA                 jl       SHORT G_M20617_IG03
                                                ;; bbWeight=1    PerfScore 7.50
G_M20617_IG06:              ;; offset=0070H
       FF05AA483C00         inc      dword ptr [(reloc 0x7ffa0c561de0)]
       488BCE               mov      rcx, rsi
       48BAE81D560CFA7F0000 mov      rdx, 0x7FFA0C561DE8
       E898A1685F           call     CORINFO_HELP_CLASSPROFILE32
       488BCE               mov      rcx, rsi
       488B01               mov      rax, qword ptr [rcx]
       488B4048             mov      rax, qword ptr [rax+72]
       FF5020               call     [rax+32]B:F():int:this
       8BD7                 mov      edx, edi
       2BD0                 sub      edx, eax
       8BC2                 mov      eax, edx
                                                ;; bbWeight=0    PerfScore 0.00
G_M20617_IG07:              ;; offset=009BH
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5F                   pop      rdi
       4883C458             add      rsp, 88
       5D                   pop      rbp
       C3                   ret
                                                ;; bbWeight=0    PerfScore 0.00
G_M20617_IG08:              ;; offset=00A7H
       E8F438685F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 173, prolog size 40, PerfScore 13438.30, instruction count 45, allocated bytes for code 173 (MethodHash=9b03af76) for method Example:Looper(B):int

@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Nov 11, 2021
@AndyAyersMS
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Member Author

Addresses issues from #47942.

// Return true if schemaTable entries are a subset of the schema described by pByte, with matching entries in the same order.
// Also updates offset of the matching entries in schemaTable to those of the pByte schema.
//
inline bool ComparePgoSchemaCompatible(const uint8_t *pByte, size_t cbDataMax, ICorJitInfo::PgoInstrumentationSchema* schemaTable, size_t cSchemas)
Member

@jakobbotsch jakobbotsch Nov 11, 2021

I was a little surprised that this mutates the schema. Maybe rename it to something like SetOffsetsForSchemaSubsequence or similar?

Member Author

How about CheckIfPgoSchemaIsCompatibleAndSetOffsets?


auto handler = [schemaTable, &nSchema, &nMatched, &nUnmatched](const ICorJitInfo::PgoInstrumentationSchema& schema)
{
const size_t iSchemaAdj = nSchema - nUnmatched;
Member

Isn't this the same as nMatched?

Member

And thinking a little bit more about it, that means this might index off the end of schemaTable.

Member Author

It is, yeah.

Member

@jakobbotsch jakobbotsch left a comment

LGTM.

@AndyAyersMS
Member Author

Handful of OSR+PGO tests failing with

Assert failure(PID 916 [0x00000394], Thread: 3056 [0x0bf0]): Assertion failed 'firstILBlock->hasProfileWeight()' in 'EHTest:f1(long,long,long,int,long):int' during 'Compute edge weights (1, false)' (IL size 150)

File: D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp Line: 3317

Will investigate.

@AndyAyersMS
Member Author

AndyAyersMS commented Nov 12, 2021

Actually all the novel failures are with OSR+PGO+PartialCompilation.

This turns out to be related to the mid-try entry OSR step block transformation introduced in #59784: when that transformation is applied, one cannot simply walk from fgFirstBB to the "true" entry by skipping all BBF_INTERNAL blocks.


auto handler = [schemaTable, &nSchema, &nMatched, &nUnmatched](const ICorJitInfo::PgoInstrumentationSchema& schema)
{
if ((schema.InstrumentationKind != schemaTable[nMatched].InstrumentationKind)
Member

Looks like we still need to guard on nMatched < cSchemas before this.

Member Author

Thanks. Reworked this some more.
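
The subset check discussed in these review threads can be sketched roughly as follows. This is not the code from the PR: `SchemaEntry` is a simplified, hypothetical stand-in for `ICorJitInfo::PgoInstrumentationSchema`, the function name `MatchSubsequenceAndSetOffsets` is made up, and the real function decodes the Tier0 schema from a byte stream instead of taking a vector. It shows the in-order subsequence match, the `nMatched` bounds guard raised in review, and the copying of Tier0 offsets into the OSR schema.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified schema entry.
struct SchemaEntry
{
    int32_t kind;     // what is measured (block count, class profile, ...)
    int32_t ilOffset; // IL offset the probe is attached to
    size_t  offset;   // byte offset of the data slot in the profile buffer
};

// Returns true if `osr` is an in-order subsequence of `tier0`.
// On success, each OSR entry's offset is overwritten with the offset of the
// matching Tier0 entry. On failure, `osr` is left unchanged.
bool MatchSubsequenceAndSetOffsets(const std::vector<SchemaEntry>& tier0,
                                   std::vector<SchemaEntry>&       osr)
{
    std::vector<size_t> matchedOffsets;
    matchedOffsets.reserve(osr.size());

    size_t nMatched = 0;
    for (const SchemaEntry& entry : tier0)
    {
        // Guard before indexing: once every OSR entry has matched, any
        // remaining Tier0 entries are simply extra.
        if (nMatched == osr.size())
        {
            break;
        }

        if ((entry.kind == osr[nMatched].kind) &&
            (entry.ilOffset == osr[nMatched].ilOffset))
        {
            matchedOffsets.push_back(entry.offset);
            ++nMatched;
        }
    }

    if (nMatched != osr.size())
    {
        return false; // some OSR entry has no counterpart in the Tier0 schema
    }

    for (size_t i = 0; i < osr.size(); ++i)
    {
        osr[i].offset = matchedOffsets[i];
    }
    return true;
}
```

With a Tier0 schema of three entries and an OSR schema containing only the last two, the call succeeds and the OSR entries take on the Tier0 offsets. If the OSR entries appear in a different order, the call returns false.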

@AndyAyersMS
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

we think we might instrument and the number of things we end up instrumenting
can differ.

Also improve the DumpJittedMethod output for OSR, and allow selective dumping
of a particular OSR variant by specifying its IL offset.
@AndyAyersMS
Member Author

Still ironing out PGO+PartialCompilation issues. Hopefully that's the last round of updates needed.

@AndyAyersMS
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Member Author

jit-experimental failures are now down to just the "known" subset from osr_stress (these happen because crossgen2 runs using the .NET installed for the build, which is 6.0RC2 and is missing a key OSR fix).

So the newly added CI legs for PGO+OSR, etc are passing pri0 tests.

@AndyAyersMS
Member Author

Seeing random-looking failures in runtime tests:

;; System.Net.Security.Tests linux x64

Process terminated. Assertion failed.
OpenSsl error queue is not empty, run: 'openssl errstr 1409442E' for original error.
   at Interop.OpenSsl.Decrypt(SafeSslHandle context, Span`1 buffer, SslErrorCode& errorCode) in /_/src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.OpenSsl.cs:line 449
   at System.Net.Security.SslStreamPal.DecryptMessage(SafeDeleteSslContext securityContext, Span`1 buffer, Int32& offset, Int32& count) in 
;; only output from runtime test on windows arm64

Console log: 'JIT.CodeGenBringUpTests' from job 18293551-446a-473e-a4ca-8caa62c0bdb6 workitem 74425fd5-3ef1-4bdc-8940-692e95a45685 (windows.10.arm64v8.open) executed on machine DDARM64-155

going to rerun...

@AndyAyersMS
Member Author

@jakobbotsch I made some more changes, so can you take another look?

Member

@jakobbotsch jakobbotsch left a comment

LGTM

@AndyAyersMS AndyAyersMS merged commit 26a6f55 into dotnet:main Nov 13, 2021
@AndyAyersMS AndyAyersMS deleted the OSR+PGO branch November 13, 2021 19:17
@EgorBo
Member

EgorBo commented Nov 16, 2021

Probably improved dotnet/perf-autofiling-issues#2376 on windows-x64

@AndyAyersMS
Member Author

Possibly from changing the vm side schema to support a subset rather than an exact match?

MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this pull request Nov 20, 2021
We started doing devir in unoptimized builds. I assume that was an unintended consequence of dotnet#61453.
AndyAyersMS pushed a commit that referenced this pull request Nov 20, 2021
We started doing devir in unoptimized builds. I assume that was an unintended consequence of #61453.
MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this pull request Dec 9, 2021
We started doing devir in unoptimized builds. I assume that was an unintended consequence of dotnet#61453.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2021