Test failure GC\\Features\\HeapExpansion\\plug\\plug.cmd #63854

Closed
VincentBu opened this issue Jan 17, 2022 · 13 comments
Labels
arch-arm64 area-GC-coreclr blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs os-linux Linux OS (any supported distro) os-windows untriaged New issue has not been triaged by the area owner

Comments

@VincentBu
Contributor

Run: runtime-coreclr outerloop 20220116.2

Failed test:

CoreCLR windows arm64 Checked no_tiered_compilation @ Windows.10.Arm64v8.Open
- GC\\Features\\HeapExpansion\\plug\\plug.cmd

CoreCLR Linux arm64 Checked @ (Ubuntu.1804.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm64v8-20210531091519-97d8652
- GC/Scenarios/ServerModel/servermodel/servermodel.sh

CoreCLR Linux arm64 Checked no_tiered_compilation @ (Alpine.314.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:alpine-3.14-helix-arm64v8-20210910135810-8a6f4f3
- GC/Scenarios/ServerModel/servermodel/servermodel.sh

Error message:

Return code:      1
Raw output file:      D:\h\w\BC070A55\w\AED00960\uploads\Reports\GC.Features\HeapExpansion\plug\plug.output.txt
Raw output:
BEGIN EXECUTION
"D:\h\w\BC070A55\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  plug.dll
Running 100 iterations
Expected: 100
Actual: -2147483645
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=D:\h\w\BC070A55\p
D:\h\w\BC070A55\w\AED00960\e\GC\Features\HeapExpansion\plug\plug.cmd
Expected: True
Actual:   False


Stack trace
   at GC_Features._HeapExpansion_plug_plug_._HeapExpansion_plug_plug_cmd()

or

createdump: /__w/1/s/src/coreclr/pal/src/libunwind/src/dwarf/Gfind_proc_info-lsb.c:929: int _Uaarch64_dwarf_search_unwind_table(unw_addr_space_t, unw_word_t, unw_dyn_info_t *, unw_proc_info_t *, int, void *): Assertion `ip >= di->start_ip && ip < di->end_ip' failed.
/root/helix/work/workitem/e/GC/Scenarios/ServerModel/servermodel/servermodel.sh: line 418:  4134 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"

Return code:      1
Raw output file:      /root/helix/work/workitem/uploads/Reports/GC.Scenarios/ServerModel/servermodel/servermodel.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false servermodel.dll '/numrequests:100'
Using 361542648 as random seed
took 512 iteration to reach steady state
258 reqs/sec
306 reqs/sec
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=/root/helix/work/correlation
/root/helix/work/workitem/e/GC/Scenarios/ServerModel/servermodel/servermodel.sh
Expected: True
Actual:   False


Stack trace
   at GC_Scenarios._ServerModel_servermodel_servermodel_._ServerModel_servermodel_servermodel_sh()
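
The two "Actual:" values in the logs above are easier to read once decoded. The sketch below is not part of the test harness; it assumes the Windows value is a 32-bit status code printed as a signed integer and that the Linux value follows the usual shell convention of 128 plus the signal number. The helper names and the small lookup tables are made up for illustration.

```python
# Hypothetical helpers for reading the exit codes in the logs above.

# Well-known NTSTATUS value; table intentionally tiny.
KNOWN_NTSTATUS = {0x80000003: "STATUS_BREAKPOINT"}

# Linux signal number relevant to this log; table intentionally tiny.
KNOWN_SIGNALS = {6: "SIGABRT"}


def as_uint32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as unsigned."""
    return code & 0xFFFFFFFF


def describe_windows_exit(code: int) -> str:
    value = as_uint32(code)
    name = KNOWN_NTSTATUS.get(value, "unknown status")
    return f"0x{value:08X} ({name})"


def describe_posix_exit(code: int) -> str:
    # Shells report "killed by signal N" as exit status 128 + N.
    if code > 128:
        signum = code - 128
        name = KNOWN_SIGNALS.get(signum, "unknown signal")
        return f"128 + {signum} ({name})"
    return f"plain exit status {code}"


if __name__ == "__main__":
    print(describe_windows_exit(-2147483645))
    print(describe_posix_exit(134))
```

Under those assumptions the Windows run ended on a breakpoint exception and the Linux run was aborted, which is consistent with the "Aborted (core dumped)" line in the Linux log.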
@VincentBu VincentBu added arch-arm64 os-linux Linux OS (any supported distro) os-windows labels Jan 17, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-GC-coreclr untriaged New issue has not been triaged by the area owner labels Jan 17, 2022
@ghost

ghost commented Jan 17, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@janvorli
Member

I was looking at the libunwind assert on Friday (using the dump from CI). There is a problem where the remote unwinding code doesn't recognize a managed frame as managed and tries to unwind one more time using libunwind. That results in a 0 instruction pointer, and the next call to the unwinder then fails with that assert. It might be a DACization issue; it is not clear yet.
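
To make the failure mode concrete: the assertion text in the createdump output requires the instruction pointer to lie inside the half-open range `[start_ip, end_ip)` of the unwind table being searched. The snippet below is only an illustration of that range check, not libunwind code; the function name and the table bounds are hypothetical.

```python
def ip_in_table(ip: int, start_ip: int, end_ip: int) -> bool:
    """Same condition as the assertion: ip >= start_ip && ip < end_ip."""
    return start_ip <= ip < end_ip


# Hypothetical table bounds, chosen only for the example.
START_IP = 0x7F78C00000
END_IP = 0x7F79400000

# An address inside the table satisfies the condition.
inside = ip_in_table(0x7F78FE9BDC, START_IP, END_IP)

# A zero instruction pointer, as produced by the extra unwind step,
# can never satisfy it for a table that starts above address 0.
zero_ip = ip_in_table(0, START_IP, END_IP)

print(inside, zero_ip)
```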

As for the GC issue, here is the call stack at the point of failure. Unfortunately, all the locals from the interesting frames were optimized out by the compiler.

* thread #1, name = 'corerun', stop reason = signal SIGABRT
  * frame #0: 0x0000007fc500cce0 0x0000007f7938e4f8 libc.so.6`killpg(pgrp=0, sig=-989803256) at killpg.c:34
    frame #1: 0x0000007fc500cce0 0x0000007f7938e484 libc.so.6`__GI_raise(sig=6) at raise.c:46
    frame #2: 0x0000007fc500ce10 0x0000007f7938f8d4 libc.so.6`__GI_abort at abort.c:88
    frame #3: 0x0000007fc500cf50 0x0000007f7923aaa8 libcoreclr.so`::PROCAbort(signal=<unavailable>) at process.cpp:3455:5
    frame #4: 0x0000007fc500cf70 0x0000007f791d687c libcoreclr.so`sigtrap_handler(int, siginfo_t*, void*) [inlined] invoke_previous_action(action=<unavailable>, code=<unavailable>, siginfo=<unavailable>, context=<unavailable>, signalRestarts=<unavailable>) at signal.cpp:414:13
    frame #5: 0x0000007fc500cf70 0x0000007f791d6874 libcoreclr.so`sigtrap_handler(code=<unavailable>, siginfo=<unavailable>, context=<unavailable>) at signal.cpp:664
    frame #6: 0x0000007fc500cfb0 0x0000007f797936c0 linux-vdso.so.1
    frame #7: 0x0000007fc500e210 0x0000007f791d3d18 libcoreclr.so`::DebugBreak() at debug.cpp:406:9
    frame #8: 0x0000007fc500e300 0x0000007f78ffe388 libcoreclr.so`WKS::gc_heap::commit_mark_array_by_seg(WKS::heap_segment*, unsigned int*) [inlined] WKS::FATAL_GC_ERROR() at gcpriv.h:27:5
    frame #9: 0x0000007fc500e300 0x0000007f78ffe384 libcoreclr.so`WKS::gc_heap::commit_mark_array_by_seg(WKS::heap_segment*, unsigned int*) at gc.cpp:32627
    frame #10: 0x0000007fc500e300 0x0000007f78ffe344 libcoreclr.so`WKS::gc_heap::commit_mark_array_by_seg(WKS::heap_segment*, unsigned int*) [inlined] WKS::gc_heap::commit_mark_array_by_range(end=<unavailable>, mark_array_addr=<unavailable>) at gc.cpp:32744
    frame #11: 0x0000007fc500e300 0x0000007f78ffe2f4 libcoreclr.so`WKS::gc_heap::commit_mark_array_by_seg(seg=<unavailable>, mark_array_addr=<unavailable>) at gc.cpp:32789
    frame #12: 0x0000007fc500e330 0x0000007f78fe9bdc libcoreclr.so`WKS::gc_heap::commit_mark_array_bgc_init() at gc.cpp:32843:25
    frame #13: 0x0000007fc500e390 0x0000007f78fe919c libcoreclr.so`WKS::gc_heap::garbage_collect(n=<unavailable>) at gc.cpp:21911:48
    frame #14: 0x0000007fc500e440 0x0000007f78fd713c libcoreclr.so`WKS::GCHeap::GarbageCollectGeneration(this=<unavailable>, gen=0, reason=reason_alloc_soh) at gc.cpp:45097:9
    frame #15: 0x0000007fc500e4b0 0x0000007f78fd9dd4 libcoreclr.so`WKS::gc_heap::try_allocate_more_space(acontext=0x000000558266d708, size=2072, flags=0, gen_number=0) at gc.cpp:17222:21
    frame #16: 0x0000007fc500e520 0x0000007f79005db0 libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_more_space(acontext=0x000000558266d708, size=<unavailable>, flags=0, alloc_generation_number=0) at gc.cpp:17693:18
    frame #17: 0x0000007fc500e520 0x0000007f79005d9c libcoreclr.so`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:17724
    frame #18: 0x0000007fc500e520 0x0000007f79005d5c libcoreclr.so`WKS::GCHeap::Alloc(this=0x0000005582646cf0, context=0x000000558266d708, size=2072, flags=0) at gc.cpp:44055
    frame #19: 0x0000007fc500e560 0x0000007f78e3d124 libcoreclr.so`Alloc(size=2072, flags=GC_ALLOC_NO_FLAGS) at gchelpers.cpp:226:48
    frame #20: 0x0000007fc500e5a0 0x0000007f78e3ba00 libcoreclr.so`AllocateSzArray(pArrayMT=0x0000007effffbaa8, cElements=2048, flags=GC_ALLOC_NO_FLAGS) at gchelpers.cpp:0
    frame #21: 0x0000007fc500e620 0x0000007f78e64950 libcoreclr.so`JIT_NewArr1(arrayMT=0x0000007effffbaa8, size=2048) at jithelpers.cpp:2635:16
    frame #22: 0x0000007fc500e820 0x0000007effcd52d4
    frame #23: 0x0000007fc500e830 0x0000007f7900f20c libcoreclr.so`CallDescrWorkerInternal at calldescrworkerarm64.S:71
    frame #24: 0x0000007fc500e890 0x0000007f78dc5150 libcoreclr.so`CallDescrWorkerWithHandler(pCallDescrData=0x0000007fc500ea70, fCriticalCall=<unavailable>) at callhelpers.cpp:67:5
    frame #25: 0x0000007fc500e8e0 0x0000007f78dc5afc libcoreclr.so`MethodDescCallSite::CallTargetWorker(this=0x0000007fc500ec60, pArguments=0x0000007fc500ebf8, pReturnValue=<unavailable>, cbReturnValue=<unavailable>) at callhelpers.cpp:538:9
    frame #26: 0x0000007fc500eb60 0x0000007f78c170ec libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) [inlined] MethodDescCallSite::Call_RetArgSlot(this=0x0000007fc500ec60, pArguments=0x0000007fc500ebf8) at callhelpers.h:458:9
    frame #27: 0x0000007fc500eb60 0x0000007f78c1708c libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) at assembly.cpp:1474
    frame #28: 0x0000007fc500eb60 0x0000007f78c16e9c libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) [inlined] RunMain(this=<unavailable>, pParam=<unavailable>)::$_0::operator()(Param*) const::'lambda'(Param*)::operator()(Param*) const at assembly.cpp:1542
    frame #29: 0x0000007fc500eb60 0x0000007f78c16e9c libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) at assembly.cpp:1544
    frame #30: 0x0000007fc500eb60 0x0000007f78c16db4 libcoreclr.so`RunMain(pFD=<unavailable>, numSkipArgs=<unavailable>, piRetVal=<unavailable>, stringArgs=<unavailable>) at assembly.cpp:1544
    frame #31: 0x0000007fc500edb0 0x0000007f78c174e4 libcoreclr.so`Assembly::ExecuteMainMethod(this=<unavailable>, stringArgs=0x0000007fc500f0c0, waitForOtherThreads=YES) at assembly.cpp:1660:18
    frame #32: 0x0000007fc500f070 0x0000007f78c609a4 libcoreclr.so`CorHost2::ExecuteAssembly(this=<unavailable>, dwAppDomainId=<unavailable>, pwzAssemblyPath=<unavailable>, argc=<unavailable>, argv=0x0000000000000000, pReturnValue=0x0000007fc500f2d4) at corhost.cpp:384:39
    frame #33: 0x0000007fc500f1a0 0x0000007f78bfb324 libcoreclr.so`::coreclr_execute_assembly(hostHandle=0x0000005582613c40, domainId=1, argc=0, argv=<unavailable>, managedAssemblyPath=0x00000055826157a0, exitCode=0x0000007fc500f2d4) at unixinterface.cpp:446:24
    frame #34: 0x0000007fc500f250 0x000000556c81d050 corerun`main [inlined] run(config=0x0000007fc500f328) at corerun.cpp:368:18
    frame #35: 0x0000007fc500f250 0x000000556c81bc20 corerun`main(argc=<unavailable>, argv=<unavailable>) at corerun.cpp:563
    frame #36: 0x0000007fc500ff90 0x0000007f7937c720 libc.so.6`__libc_start_main(main=0x0000000000000000, argc=0, argv=0x0000000000000000, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=<unavailable>) at libc-start.c:316
    frame #37: 0x0000007fc50100c0 0x000000556c81b034 corerun`_start + 52

@janvorli
Member

@dotnet/gc I have found I can repro the issue reliably locally in a couple of runs of the System.Text.Json.Tests libraries tests on macOS arm64 (I've tried that in checked build so far).

@janvorli
Member

It is reproducible in debug build as well, so all locals are preserved. I can see that at the crash time, the mark_array_addr starting at index markw contains multiple non-zero entries. I've dumped a bit of that memory below:

(lldb) x/64wx &mark_array_addr[markw]
0x314bd4140: 0x00000004 0x00000000 0x00000000 0x00000000
0x314bd4150: 0x00000000 0x00000000 0x00000000 0x20000000
0x314bd4160: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4170: 0x00000000 0x00000000 0x00000000 0x04000000
0x314bd4180: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4190: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41a0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41b0: 0x00000000 0x00000000 0x00000000 0x00020000
0x314bd41c0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41d0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41e0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41f0: 0x00000000 0x00000000 0x00000000 0x00000100
0x314bd4200: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4210: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4220: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4230: 0x00000000 0x00000000 0x00000000 0x00000000
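
For readers who want to see exactly which words in the dump above are non-zero, here is a small sketch that parses the lldb output and lists them. It is not GC code; it only assumes the dump format shown (an address, a colon, then four 32-bit words per row), and the function names are made up for illustration. Index 0 corresponds to `mark_array_addr[markw]`.

```python
# The lldb dump from the comment above, copied verbatim.
DUMP = """\
0x314bd4140: 0x00000004 0x00000000 0x00000000 0x00000000
0x314bd4150: 0x00000000 0x00000000 0x00000000 0x20000000
0x314bd4160: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4170: 0x00000000 0x00000000 0x00000000 0x04000000
0x314bd4180: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4190: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41a0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41b0: 0x00000000 0x00000000 0x00000000 0x00020000
0x314bd41c0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41d0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41e0: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd41f0: 0x00000000 0x00000000 0x00000000 0x00000100
0x314bd4200: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4210: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4220: 0x00000000 0x00000000 0x00000000 0x00000000
0x314bd4230: 0x00000000 0x00000000 0x00000000 0x00000000
"""


def parse_words(dump: str) -> list[tuple[int, int]]:
    """Return (address, value) for every 32-bit word in the dump."""
    words = []
    for line in dump.strip().splitlines():
        addr_text, _, rest = line.partition(":")
        base = int(addr_text, 16)
        for i, word in enumerate(rest.split()):
            words.append((base + 4 * i, int(word, 16)))
    return words


def set_bits(value: int) -> list[int]:
    """Bit positions (0 = least significant) that are set in a 32-bit word."""
    return [bit for bit in range(32) if (value >> bit) & 1]


def nonzero_words(words):
    """Return (index, address, value) for each non-zero word."""
    return [(i, addr, val) for i, (addr, val) in enumerate(words) if val]


if __name__ == "__main__":
    for index, addr, value in nonzero_words(parse_words(DUMP)):
        print(f"word {index:2d} @ {addr:#x}: {value:#010x} bits {set_bits(value)}")
```

Each non-zero word in this window has exactly one bit set; the sketch does not try to interpret what those bits correspond to.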

@janvorli
Member

@dotnet/gc I can share the dump from the debug build run.

@mangod9
Member

mangod9 commented Jan 18, 2022

Hi @janvorli, could you please share the debug dump for this? Were you able to repro the failure consistently with regions enabled?

@VincentBu
Contributor Author

Failed again in: runtime-coreclr outerloop 20220313.2

Failed test:

CoreCLR windows arm64 Checked @ Windows.10.Arm64v8.Open

- GC\\Features\\HeapExpansion\\plug\\plug.cmd

Error message:

Return code:      1
Raw output file:      D:\h\w\AD550949\w\96AA083D\uploads\Reports\GC.Features\HeapExpansion\plug\plug.output.txt
Raw output:
BEGIN EXECUTION
"D:\h\w\AD550949\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  plug.dll
Running 100 iterations
Expected: 100
Actual: -2147483645
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=D:\h\w\AD550949\p
D:\h\w\AD550949\w\96AA083D\e\GC\Features\HeapExpansion\plug\plug.cmd
Expected: True
Actual:   False


Stack trace
   at GC_Features._HeapExpansion_plug_plug_._HeapExpansion_plug_plug_cmd()

@VincentBu VincentBu added the blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs label Mar 14, 2022
@mangod9
Member

mangod9 commented Mar 16, 2022

This should be fixed by #66696. I will leave the issue open, but I'm hoping we don't see these failures in the next few days.

@mangod9
Member

mangod9 commented Mar 21, 2022

Closing since this hasn't recurred since the fix.

@mangod9 mangod9 closed this as completed Mar 21, 2022
@JulieLeeMSFT
Member

Seen today Link.

@mangod9
Member

mangod9 commented Mar 21, 2022

@Maoni0 fyi.

@Maoni0
Member

Maoni0 commented Mar 21, 2022

This was not run with my change. It's still on commit 17e435d6, which is from 7 days ago.
If you want to trigger a run with updated src, you need to do /azp.

@mangod9
Member

mangod9 commented Mar 22, 2022

Closing again.

@mangod9 mangod9 closed this as completed Mar 22, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 21, 2022