Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Most recent MIBC optimization data contains a combo of block and edge counts crashing JIT in Crossgen2 #84446

Closed
trylek opened this issue Apr 6, 2023 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@trylek
Copy link
Member

trylek commented Apr 6, 2023

Platform: Windows x64 / arm

Configuration: Checked (the issue reproes neither in Debug nor in Release build mode)

Example run:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=230101&view=logs&jobId=90c514f6-7aa0-5543-420a-962bd12368f6

Diagnostics (x64):

  Generating native image of System.Private.CoreLib for windows.x64.Checked. Logging to 
  D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB
  D:\a\_work\1\s\.dotnet
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,System.RuntimeMethodHandleInternal):System.Reflection.MethodBase' during 'Profile incorporation' (IL size 480; hash 0xfa19acd9; FullOpts)
  
D:\a\_work\1\s\src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB" exited with code 57005.
##[error]src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB" exited with code 57005.

Interestingly enough, on arm there are more functions hitting this:

  Generating native image of System.Private.CoreLib for windows.arm.Checked. Logging to 
  D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB
  D:\a\_work\1\s\.dotnet
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,System.RuntimeMethodHandleInternal):System.Reflection.MethodBase' during 'Profile incorporation' (IL size 480; hash 0xfa19acd9; FullOpts)
  
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetField(System.String,int):System.Reflection.FieldInfo:this' during 'Profile incorporation' (IL size 222; hash 0xac73620c; FullOpts)
  
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetNestedType(System.String,int):System.Type:this' during 'Profile incorporation' (IL size 116; hash 0x79875506; FullOpts)
  
D:\a\_work\1\s\src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB" exited with code -2147483645.
##[error]src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB" exited with code -2147483645.

Build FAILED.

According to discussion on the PR thread

#83624

that triggered this issue in lab testing, it seems likely that after a change made in January 2023 we're now aggregating optimization data that contains both flavors of information (block / edge counts). Based on @AndyAyersMS' advice I'm about to put up a PR disabling the assertion check

assert(!haveBlockCounts || !haveEdgeCounts);

(as the invariant apparently no longer holds) to unblock code flow from darc with a reference to this issue. Its purpose is to follow up on consolidation of the JIT and MIBC data collection logic to put them back in sync.

Thanks

Tomas

/cc @dotnet/jit-contrib, @dotnet/crossgen-contrib

@trylek trylek added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 6, 2023
@trylek trylek added this to the 8.0.0 milestone Apr 6, 2023
@ghost
Copy link

ghost commented Apr 6, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Platform: Windows x64 / arm

Example run:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=230101&view=logs&jobId=90c514f6-7aa0-5543-420a-962bd12368f6

Diagnostics (x64):

  Generating native image of System.Private.CoreLib for windows.x64.Checked. Logging to 
  D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB
  D:\a\_work\1\s\.dotnet
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,System.RuntimeMethodHandleInternal):System.Reflection.MethodBase' during 'Profile incorporation' (IL size 480; hash 0xfa19acd9; FullOpts)
  
D:\a\_work\1\s\src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB" exited with code 57005.
##[error]src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB" exited with code 57005.

Interestingly enough, on arm there are more functions hitting this:

  Generating native image of System.Private.CoreLib for windows.arm.Checked. Logging to 
  D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB
  D:\a\_work\1\s\.dotnet
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,System.RuntimeMethodHandleInternal):System.Reflection.MethodBase' during 'Profile incorporation' (IL size 480; hash 0xfa19acd9; FullOpts)
  
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetField(System.String,int):System.Reflection.FieldInfo:this' during 'Profile incorporation' (IL size 222; hash 0xac73620c; FullOpts)
  
  D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
  Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetNestedType(System.String,int):System.Type:this' during 'Profile incorporation' (IL size 116; hash 0x79875506; FullOpts)
  
D:\a\_work\1\s\src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB" exited with code -2147483645.
##[error]src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB" exited with code -2147483645.

Build FAILED.

According to discussion on the PR thread

#83624

that triggered this issue in lab testing, it seems likely that after a change made in January 2023 we're now aggregating optimization data that contains both flavors of information (block / edge counts). Based on @AndyAyersMS' advice I'm about to put up a PR disabling the assertion check

assert(!haveBlockCounts || !haveEdgeCounts);

(as the invariant apparently no longer holds) to unblock code flow from darc with a reference to this issue. Its purpose is to follow up on consolidation of the JIT and MIBC data collection logic to put them back in sync.

Thanks

Tomas

/cc @dotnet/jit-contrib, @dotnet/crossgen-contrib

Author: trylek
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: 8.0.0

@JulieLeeMSFT
Copy link
Member

@AndyAyersMS PTAL.

trylek added a commit to trylek/runtime that referenced this issue Apr 6, 2023
The most recent aggregated runtime MIBC optimization data contains
a combination of block and edge counts, possibly after a change
from January 2023 that switched MIBC logic over from using block
counts to edge counts. The offending assertion check wasn't expecting
it and started crashing Crossgen2 during compilation of
System.Private.CoreLib on Windows x64 / arm. Based on Andy Ayers'
advice I'm proposing to comment out the assertion check; I have
created the tracking issue dotnet#84446 to follow up on consolidation of
JIT and MIBC production logic in this respect.

Thanks

Tomas
trylek added a commit that referenced this issue Apr 7, 2023
…84449)

The most recent aggregated runtime MIBC optimization data contains
a combination of block and edge counts, possibly after a change
from January 2023 that switched MIBC logic over from using block
counts to edge counts. The offending assertion check wasn't expecting
it and started crashing Crossgen2 during compilation of
System.Private.CoreLib on Windows x64 / arm. Based on Andy Ayers'
advice I'm proposing to comment out the assertion check; I have
created the tracking issue #84446 to follow up on consolidation of
JIT and MIBC production logic in this respect.

Thanks

Tomas
@AndyAyersMS
Copy link
Member

AndyAyersMS commented Apr 7, 2023

If you dump out the 5 input mibcs (say for windows x64) and look at the method that caused the assert, we see that in hello.mibc there are both edge and block counts at offset 18. This is not something the jit was expecting to see.

      "Method": "[S.P.CoreLib]System.RuntimeType.GetMethodBase(RuntimeType,RuntimeMethodHandleInternal)",
      "InstrumentationData": [
        {
          "ILOffset": 0,
          "InstrumentationKind": "BasicBlockIntCount",
          "Other": 0,
          "Data": 7225
        },
        {
          "ILOffset": 0,
          "InstrumentationKind": "NumRuns",
          "Other": 6
        },
        {
          "ILOffset": 8,
          "InstrumentationKind": "BasicBlockIntCount",
          "Other": 0,
          "Data": 0
        },
        {
          "ILOffset": 18,
          "InstrumentationKind": "BasicBlockIntCount",
          "Other": 0,
          "Data": 0
        },
        {
          "ILOffset": 18,
          "InstrumentationKind": "EdgeIntCount",
          "Other": 0,
          "Data": 0
        },

This can only happen by mixing old and new profiles. In fact most of the

I wonder if we did this deliberately to try and smooth out the profile updates or something. I will have to go drill into the process that generates these mibcs.

Worst case we can just tolerate this in the jit, though the fix is likely to prefer edge counts over block counts.

@trylek
Copy link
Member Author

trylek commented Apr 9, 2023

Thanks Andy for the detailed analysis. I must admit I know very little about the MIBC logic, basically just what I learned from @davidwrighton, I was under the impression that the training runs and MIBC production is owned by the perf team i.e. people around @DrewScoggins but I may be mistaken. For preferring edge counts over block counts, I guess that on the JIT side it should suffice to swap the conditional blocks around line 2618,

        if (haveBlockCounts)
        {
            dataIsGood = fgIncorporateBlockCounts();
        }
        else if (haveEdgeCounts)
        {
            dataIsGood = fgIncorporateEdgeCounts();
        }

but I agree that as a first step we need to understand whether the mixture of edge and block counts is intentional / expected (e.g. it's a natural consequence of some latencies in collecting the MIBC data or whatnot) or whether it's just a plain bug that should be fixed.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 26, 2023
If there are both edge and block counts for a method, prefer to use the
edge counts (block counts are no longer the default, so are more likely
to be stale).

Sometimes we decide not to use count data because of correlation or
solver issues. When this happens, keep the class profile data viable
and let the code that deals with class profiles handle the possibility
of stale or mismatched data.

Addresses some aspects of dotnet#84446, though it's still not clear why we see
static profiles with both block and edge counts.
AndyAyersMS added a commit that referenced this issue Apr 27, 2023
…85406)

If there are both edge and block counts for a method, prefer to use the
edge counts (block counts are no longer the default, so are more likely
to be stale).

Sometimes we decide not to use count data because of correlation or
solver issues. When this happens, keep the class profile data viable
and let the code that deals with class profiles handle the possibility
of stale or mismatched data.

Addresses some aspects of #84446, though it's still not clear why we see
static profiles with both block and edge counts.
@AndyAyersMS
Copy link
Member

Most recent batch of profile data (via #85275) does not have this problem -- all the counters are edge based.

And the jit is now properly set up to deal with it if it recurs.

@ghost ghost locked as resolved and limited conversation to collaborators May 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

3 participants