linux ARM64 dotnet7 regression preview 4 to preview 5, 6 and 7 #72645
Comments
I couldn't figure out the best area label to add to this issue. If you have write permissions, please help me learn by adding exactly one area label.
area-crossgen2-coreclr
This might be a NativeAOT bug. @RobertHenry6bev, any chance you could get a dump file for the crash?
On ARM64 Linux (Ubuntu 22.04) with dotnet7 preview 5, I set environment variables … and ran ulimit -c unlimited. I am unable to attach a zip of the core file, since it is too large. The zip has been transmitted to agocke out-of-band.
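For anyone scripting the same dump capture, here is a minimal Python sketch of the `ulimit -c` step, assuming a Linux host. It raises the soft core-file limit of the current process to its hard limit, and processes started afterwards (for example a build launched via `subprocess`) inherit it. This only matches `unlimited` when the hard limit is already unlimited, which is often the default; raising the hard limit itself needs privileges. The function name `enable_core_dumps` is hypothetical.

```python
import resource


def enable_core_dumps():
    """Raise the soft RLIMIT_CORE to the hard limit (like `ulimit -c`, capped by the hard limit).

    Returns the resulting (soft, hard) pair. Child processes started after
    this call inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to, but not past, the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Call it before launching the crashing build. Where the kernel writes the core file (or which handler receives it) is controlled separately by `/proc/sys/kernel/core_pattern`.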
I'm trying to build and run a dotnet runtime using gcc with -fsanitize=thread (tsan), but I'm getting dynamic linking failures that prevent me from making further progress.
I didn't realize we had this bug. Copying from #72831, which I'll close as a dup: Very easy to hit on a Raspberry Pi 4 running AArch64 Ubuntu 22.04. We also have a customer report of hitting this on a beefier Ampere device (there it requires running as …). The repro I just tried on Ubuntu:
The stack traces are all over the place:
I have a crash dump I was looking at that has:
We've seen a bunch of apparent corruptions around …
I hit a GC assert when I tried to enable frozen segments for CoreCLR (#49576 (comment)); it doesn't repro when I disable GC regions. Could it be related, since NativeAOT also uses frozen segments?
We haven't been able to enable regions in NativeAOT yet because of a bug, possibly the one you found, so NativeAOT does not run with regions yet. We have a 7.0 issue to get them working.
Filed #73110. I've just found another bug: it seems that when you register a frozen segment you should not set …
This issue still reproduces with yesterday's preview release: 7.0.100-preview.7.22377.5
@MichalStrehovsky, is your team working on a fix for this issue? Who is the owner of this issue? It would be good to have an assignee. @RobertHenry6bev, as I explained in the Teams chat, we try to resolve blocking-release issues before the GA release.
I raised with Jeff today that we need an owner. There is another crossgen2 crash (without NativeAOT this time) on Windows ARM64: #71702. It's not clear whether this is codegen, GC, VM, or something else, or even whether the two are related. I looked at a crash dump: it is a memory corruption.
cc @mangod9.
@RobertHenry6bev, just checking whether you are able to get it to publish from an x64 host, as in, you can always publish for lin-arm64 from an x64 SDK (from any OS)?
I have not tried this suggestion. Given the kinds and locations of machines we run this workload on, everything needs to work out of the box. It is still blocking forward progress.
On Thu, Aug 11, 2022 at 4:41 PM Manish Godse wrote:
@RobertHenry6bev just checking whether you are able to get it to publish from an x64 host, as in you can always publish for lin-arm64 from an x64 SDK (from any OS)?
I am able to repro it too; this stack consistently shows up on recent builds:
This lets us make dotnet#72645 a non-blocking-release issue. We also set NativeAotSupported to false for Mac on the line above. Crossgen2 will still ship NativeAOT-compiled on x64 Linux and Windows, and R2R+SingleFile+Trimmed elsewhere.
I can repro this crash reliably (it reliably crashes in random spots) on ARM64 Windows as well, which is actually good news, because debugging on Windows is much better than on Linux (we can do time travel debugging, etc.). I'm submitting #74211 to stop compiling crossgen2 with NativeAOT on arm64, and I'll remove the blocking-release label from this one. We have a bunch of other stress bugs on CoreCLR-hosted crossgen2, but it crashes a lot less often without NativeAOT and might actually be usable. I'd like to port #74211 to RC1.
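The publishing policy described in the commit message and comment above can be summarized as a small decision table. This is a hypothetical Python sketch for illustration only, not the actual MSBuild logic in dotnet/runtime; `crossgen2_publish_mode` and its return strings are invented names, and the rules are only those stated in the thread: NativeAOT on x64 Linux and Windows, R2R+SingleFile+Trimmed everywhere else (including Mac and every arm64 host).

```python
def crossgen2_publish_mode(os_name: str, arch: str) -> str:
    """Return how crossgen2 would be published for a host OS/architecture.

    Hypothetical summary of the thread: NativeAOT only on x64 Linux and
    Windows; every other combination falls back to ReadyToRun + single-file
    + trimmed.
    """
    nativeaot_hosts = {("linux", "x64"), ("windows", "x64")}
    if (os_name.lower(), arch.lower()) in nativeaot_hosts:
        return "NativeAOT"
    return "R2R+SingleFile+Trimmed"
```

Keeping the NativeAOT case as an explicit allow-list means any new host (such as a future OS/arch pair) defaults to the more conservative R2R publishing path.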
Do we have random crashes on CoreCLR+arm64 as well?
Yeah, we have seen regular crossgen2 crashes on arm64, but unfortunately dumps are not being captured.
What are the errors?
Yes, when building the repo you can hit an intermittent failure like the one below:
The issue is that we are using the non-fixed …
That's the one I'm hitting, but I hit it every time (HiSilicon D05 with Ubuntu 18.04).
Is there an easy workaround to force the build to use the live version? Do you know roughly when this will be fixed properly?
Either that, or disable server GC (…)
I am not sure that will work. The choice of GC mode is made at compile time in NativeAOT.
I guess we should revert #74617 until we update the repo to use RC1.
I do not think reverting back and forth the way crossgen2 is published has any effect on the crossgen that is actually in use, which comes from the Preview 7 NuGet package. Replacing the binaries in the NuGet cache with a locally built copy might work, but it is a fragile solution. (I am also annoyed by the failures; my workaround is basically to try again, sometimes a few times.)
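The "try again, sometimes a few times" workaround can be scripted. A minimal Python sketch, assuming the failure is intermittent and a clean re-run is safe; `run_with_retries` and `command_runner` are hypothetical helpers. The single-run step is passed in as a callable so the retry logic can be exercised without a real build.

```python
import subprocess
from typing import Callable, Sequence


def run_with_retries(run_once: Callable[[], int], max_attempts: int = 3) -> int:
    """Call run_once until it returns exit code 0 or max_attempts is exhausted.

    Returns the exit code of the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    code = 1
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code == 0:
            break
        print(f"attempt {attempt} failed with exit code {code}")
    return code


def command_runner(cmd: Sequence[str]) -> Callable[[], int]:
    """Wrap a command line as a zero-argument callable returning its exit code."""
    return lambda: subprocess.run(list(cmd)).returncode
```

For example, `run_with_retries(command_runner(["./build.sh"]), max_attempts=3)`, where `./build.sh` stands in for whatever build invocation intermittently crashes. A retry loop only hides the symptom; it does nothing about the underlying corruption.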
Yes, it works:
Note that we actually need to check whether the host architecture is ARM64 instead of checking whether the target architecture is ARM64. If you target ARM64 on an x64 machine, you don't need that workaround.
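To make the host-versus-target distinction concrete, here is a minimal Python sketch, assuming the workaround only needs to be keyed off the machine that runs crossgen2. `platform.machine()` reports the host CPU (typically `aarch64` on Linux, `arm64` on macOS, `ARM64` on Windows); the target RID is deliberately ignored. `host_is_arm64` and `needs_arm64_host_workaround` are hypothetical names.

```python
import platform
from typing import Optional

# platform.machine() spellings for 64-bit ARM on Linux, macOS and Windows (lowercased).
_ARM64_MACHINES = {"aarch64", "arm64"}


def host_is_arm64(machine: Optional[str] = None) -> bool:
    """True if the host CPU (not the build target) is 64-bit ARM."""
    name = (machine if machine is not None else platform.machine()).lower()
    return name in _ARM64_MACHINES


def needs_arm64_host_workaround(target_rid: str, machine: Optional[str] = None) -> bool:
    """Decide whether to apply the workaround.

    target_rid (e.g. "linux-arm64") is intentionally not consulted:
    cross-targeting ARM64 from an x64 host does not need the workaround,
    while building on an ARM64 host does.
    """
    return host_is_arm64(machine)
```

The `machine` parameter exists only so the decision can be tested for hosts other than the current one; in a build script you would call `needs_arm64_host_workaround(rid)` and let it query the real host.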
Ah, I see: the SVR (server GC) build is a superset of WKS (workstation GC), so it is possible to force crossgen2 into WKS mode. That would be a nice workaround!
Yep, and you can disable it in the config file just for this binary (e.g., …)
I don't think reverting that one is connected to the crossgen2 crash while crossgenning ILC that Anton described (#72645 (comment)). That pull request doesn't affect the crossgen that is used to crossgen ILC; it affects the crossgen shipped in a future SDK. I've submitted #74856 as a workaround.
NativeAOT doesn't read runtimeconfig.json, so this is likely just a placebo effect. In the past we generated unused runtimeconfig and deps JSON files with NativeAOT, but that has been fixed.
Thank you for the corrections. I never actually tried that; I just assumed that if we produce a file, it should work. :) Setting …
Description
When I attempt to build my application, crossgen2 reports an error. I get a different error with dotnet7 preview 5 than with preview 6. Both errors look like a SIGSEGV, and both suggest a corrupted heap. Perhaps other dotnet7 previews have this corruption too, but it goes unnoticed.
/home/msft/dotnet_all/dotnet7.0/sdk/7.0.100-preview.5.22307.18/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.CrossGen.targets(464,5): error : Process terminated. Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException. [/mnt/msft/dotnet_techempower/aspnet/benchmarks/src/BenchmarksApps/Kestrel/PlatformBenchmarks/PlatformBenchmarks.csproj]
Reproduction Steps
Linux jammy on ARM64, using dotnet7 preview 5 or 6
Running under gdb, the top few stack frames look like this:
[Switching to Thread 0xfffff7ff36a0 (LWP 2813)]
0x0000aaaaaae57160 in ILCompiler_TypeSystem_Internal_NativeFormat_TypeHashingAlgorithms__ComputeNameHashCode ()
(gdb) where
#0 0x0000aaaaaae57160 in ILCompiler_TypeSystem_Internal_NativeFormat_TypeHashingAlgorithms__ComputeNameHashCode ()
#1 0x0000aaaaaae5b214 in ILCompiler_TypeSystem_Internal_TypeSystem_InstantiatedType__GetMethod ()
#2 0x0000aaaaaae757e4 in ILCompiler_TypeSystem_Internal_TypeSystem_Ecma_EcmaModule__ResolveMemberReference ()
Expected behavior
No internal errors.
Actual behavior
Heap corruption.
Regression?
No response
Known Workarounds
No response
Configuration
Linux (Ubuntu jammy 22.04) arm64, dotnet7 preview 4, 5 and 6 (preview 4 is OK)
Whether the error appears, and where, seems to depend on the number of cores available to the job. Three cores seemed to trigger the problem.
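To try the 3-core configuration on a larger machine, here is a minimal Python sketch for Linux. `os.sched_setaffinity` restricts the current process to a subset of its allowed CPUs, and processes it launches afterwards inherit that mask (the same effect as `taskset`). Whether the crossgen2 crash reproduces under an affinity mask, as opposed to on a machine with physically fewer cores, is an assumption that has not been verified here; `limit_to_cpus` is a hypothetical name.

```python
import os


def limit_to_cpus(count: int) -> set:
    """Restrict this process (and children started later) to `count` of its allowed CPUs.

    Linux-only. If fewer than `count` CPUs are allowed, all of them are kept.
    Returns the resulting affinity set.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    allowed = sorted(os.sched_getaffinity(0))
    os.sched_setaffinity(0, allowed[:count])
    return os.sched_getaffinity(0)
```

After `limit_to_cpus(3)`, start the build with `subprocess.run(...)` so it runs under the same three-CPU mask.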
Other information
No response