Most recent MIBC optimization data contains a combo of block and edge counts crashing JIT in Crossgen2 #84446
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak

Issue Details

Platform: Windows x64 / arm

Example run:

Diagnostics (x64):

Generating native image of System.Private.CoreLib for windows.x64.Checked. Logging to
D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB
D:\a\_work\1\s\.dotnet
D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,System.RuntimeMethodHandleInternal):System.Reflection.MethodBase' during 'Profile incorporation' (IL size 480; hash 0xfa19acd9; FullOpts)
D:\a\_work\1\s\src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB" exited with code 57005.
##[error]src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\*.dll --targetarch:x64 --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Checked\PDB" exited with code 57005.

Interestingly enough, on arm there are more functions hitting this:

Generating native image of System.Private.CoreLib for windows.arm.Checked. Logging to
D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB
D:\a\_work\1\s\.dotnet
D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,System.RuntimeMethodHandleInternal):System.Reflection.MethodBase' during 'Profile incorporation' (IL size 480; hash 0xfa19acd9; FullOpts)
D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetField(System.String,int):System.Reflection.FieldInfo:this' during 'Profile incorporation' (IL size 222; hash 0xac73620c; FullOpts)
D:\a\_work\1\s\src\coreclr\jit\fgprofile.cpp:2614
Assertion failed '!haveBlockCounts || !haveEdgeCounts' in 'System.RuntimeType:GetNestedType(System.String,int):System.Type:this' during 'Profile incorporation' (IL size 116; hash 0x79875506; FullOpts)
D:\a\_work\1\s\src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB" exited with code -2147483645.
##[error]src\coreclr\crossgen-corelib.proj(106,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command "D:\a\_work\1\s\dotnet.cmd D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\x64\crossgen2\crossgen2.dll -o:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\System.Private.CoreLib.dll -r:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\*.dll --targetarch:arm --targetos:windows -m:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\StandardOptimizationData.mibc --embed-pgo-data -O --verify-type-and-field-layout D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\IL\System.Private.CoreLib.dll --pdb --pdb-path:D:\a\_work\1\s\artifacts\bin\coreclr\windows.arm.Checked\PDB" exited with code -2147483645.
Build FAILED.

According to the discussion on the PR thread that triggered this issue in lab testing, it seems likely that after a change made in January 2023 we're now aggregating optimization data that contains both flavors of information (block / edge counts). Based on @AndyAyersMS' advice I'm about to put up a PR disabling the assertion check at runtime/src/coreclr/jit/fgprofile.cpp, line 2614 in 983ff47 (as the invariant apparently no longer holds), to unblock code flow from darc, with a reference to this issue. The purpose of this issue is to follow up on consolidation of the JIT and MIBC data collection logic to put them back in sync.

Thanks
Tomas

/cc @dotnet/jit-contrib, @dotnet/crossgen-contrib
@AndyAyersMS PTAL.
…84449) The most recent aggregated runtime MIBC optimization data contains a combination of block and edge counts, possibly after a change from January 2023 that switched MIBC logic over from using block counts to edge counts. The offending assertion check wasn't expecting it and started crashing Crossgen2 during compilation of System.Private.CoreLib on Windows x64 / arm. Based on Andy Ayers' advice I'm proposing to comment out the assertion check; I have created the tracking issue #84446 to follow up on consolidation of JIT and MIBC production logic in this respect. Thanks Tomas
If you dump out the 5 input mibcs (say for windows x64) and look at the method that caused the assert, we see that in hello.mibc there are both edge and block counts at offset 18. This is not something the jit was expecting to see.
This can only happen by mixing old and new profiles. In fact most of the I wonder if we did this deliberately to try and smooth out the profile updates or something. I will have to go drill into the process that generates these mibcs. Worst case we can just tolerate this in the jit, though the fix is likely to prefer edge counts over block counts.
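To illustrate how mixing old and new profiles could leave a single method with both kinds of counts, here is a minimal Python sketch. The record layout `(method, kind, il_offset, count)`, the function names, and the count values are all hypothetical and made up for illustration; this is not the actual MIBC schema or the tooling that produces the mibcs. Only the method names and the offset 18 are taken from the thread.

```python
from collections import defaultdict

# Hypothetical, simplified profile records: (method, kind, il_offset, count).
# The count values are invented; real MIBC data is much richer than this.
OLD_PROFILE = [("RuntimeType:GetMethodBase", "block", 18, 120)]
NEW_PROFILE = [("RuntimeType:GetMethodBase", "edge", 18, 95),
               ("RuntimeType:GetField", "edge", 0, 40)]


def merge_profiles(*profiles):
    """Concatenate records per method, without reconciling count kinds."""
    merged = defaultdict(list)
    for profile in profiles:
        for method, kind, il_offset, count in profile:
            merged[method].append((kind, il_offset, count))
    return dict(merged)


def methods_with_mixed_counts(merged):
    """Return the methods whose records contain both block and edge counts."""
    mixed = []
    for method, records in merged.items():
        kinds = {kind for kind, _, _ in records}
        if {"block", "edge"} <= kinds:
            mixed.append(method)
    return sorted(mixed)


merged = merge_profiles(OLD_PROFILE, NEW_PROFILE)
mixed = methods_with_mixed_counts(merged)
print(mixed)
```

In this toy model, a profile collected entirely with one kind of instrumentation never trips the check; only the merge of an older block-count profile with a newer edge-count profile does, which matches the hypothesis above.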
Thanks Andy for the detailed analysis. I must admit I know very little about the MIBC logic, basically just what I learned from @davidwrighton. I was under the impression that the training runs and MIBC production are owned by the perf team, i.e. people around @DrewScoggins, but I may be mistaken. As for preferring edge counts over block counts, I guess that on the JIT side it should suffice to swap the conditional blocks around line 2618, but I agree that as a first step we need to understand whether the mixture of edge and block counts is intentional / expected (e.g. a natural consequence of some latencies in collecting the MIBC data) or whether it's just a plain bug that should be fixed.
…85406) If there are both edge and block counts for a method, prefer to use the edge counts (block counts are no longer the default, so are more likely to be stale). Sometimes we decide not to use count data because of correlation or solver issues. When this happens, keep the class profile data viable and let the code that deals with class profiles handle the possibility of stale or mismatched data. Addresses some aspects of #84446, though it's still not clear why we see static profiles with both block and edge counts.
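The selection policy described in this change can be sketched roughly as follows. This is a simplified Python illustration, not the JIT's C++ code in fgprofile.cpp: `MethodProfile`, `select_profile`, and the `counts_usable` callback are hypothetical names, and the callback merely stands in for the correlation and solver checks mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class MethodProfile:
    # Hypothetical container; the field names are illustrative only.
    block_counts: dict = field(default_factory=dict)    # il_offset -> count
    edge_counts: dict = field(default_factory=dict)     # (from, to) -> count
    class_profiles: dict = field(default_factory=dict)  # il_offset -> type names


def select_profile(profile, counts_usable=lambda kind, counts: True):
    """Choose the count data to use; class profiles are always passed through."""
    # Prefer edge counts: block counts are no longer the default,
    # so they are more likely to be stale.
    if profile.edge_counts:
        kind, counts = "edge", profile.edge_counts
    elif profile.block_counts:
        kind, counts = "block", profile.block_counts
    else:
        kind, counts = None, {}

    # If the chosen counts are rejected, drop them but keep the class
    # profile data so that later code can decide how to handle it.
    if kind is not None and not counts_usable(kind, counts):
        kind, counts = None, {}

    return kind, counts, profile.class_profiles
```

For example, a profile carrying both kinds of counts yields `"edge"`, and passing `counts_usable=lambda kind, counts: False` returns no counts while still returning the class profile dictionary unchanged.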
Most recent batch of profile data (via #85275) does not have this problem -- all the counters are edge based. And the jit is now properly set up to deal with it if it recurs.
Platform: Windows x64 / arm
Configuration: Checked (the issue does not reproduce in Debug or Release builds)
Example run:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=230101&view=logs&jobId=90c514f6-7aa0-5543-420a-962bd12368f6
Diagnostics (x64):
Interestingly enough, on arm there are more functions hitting this:
According to the discussion on the PR thread #83624 that triggered this issue in lab testing, it seems likely that after a change made in January 2023 we're now aggregating optimization data that contains both flavors of information (block / edge counts). Based on @AndyAyersMS' advice I'm about to put up a PR disabling the assertion check at runtime/src/coreclr/jit/fgprofile.cpp, line 2614 in 983ff47 (as the invariant apparently no longer holds), to unblock code flow from darc, with a reference to this issue. The purpose of this issue is to follow up on consolidation of the JIT and MIBC data collection logic to put them back in sync.
Thanks
Tomas
/cc @dotnet/jit-contrib, @dotnet/crossgen-contrib