assert in ProcessWaitState on Linux arm64 #74795
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process

Issue details:

Related to #69125.
errno 10 -> ECHILD
and with (some) symbols: …
cc: @tmds
We had this last year for … This …
In such a case, is a fail fast necessary? Would it be reasonable to just continue?
Someone proposed such a change. We didn't take it because we obtain the exit code when we reap the child (from …).
I read that comment as a general statement that we prefer to fail fast rather than continue in a corrupted or undefined state (cc @jkotas). In this case, can we reasonably recover from this error? It seems we use the error code to release resources -- can we just do that? Or is a leak inevitable here? It seems fragile to terminate the app suddenly just because some child process terminates or is killed for some reason. I guess this does not happen on Windows.
Another question is why this started happening now. It can certainly be a race condition that always existed. I tried to reproduce it locally (on ARM64) but so far no luck.
The corrupted state here is akin to calling … The situation is not triggered by a child process terminating or being killed for some reason. The situation is triggered by some code in the current process handling it incorrectly. It is also possible that the root cause has nothing to do with process reaping. Instead, the root cause may be a memory corruption somewhere else that this crash is a side-effect of.
I did check the test suites and I don't see any other attempts to reap or execute besides our …
I'm running the test suite in a loop but, as usual, no repro. It is interesting that the dump above has …
The process exit handling code hasn't changed much in the past years. I don't expect it to have a bug, as it has been used probably millions of times in various situations without users reporting this error. I think we should treat this as a glitch unless it reproduces more.
I ran it the whole night on an ARM machine without a repro. However, it fails somewhat often in CI. As @jkotas mentioned, it may be data corruption, but I did not see any signs of that, and it fails in the same way on different OSes and versions. On the glitch path: is there a chance that this is some kernel bug or behavior? I was thinking of running it in CI with tracing enabled (and dumping it on failure), but it seems like the process code does not really have any tracing. What would we need to get to the bottom of this (and how do we expect users to debug it)? Running …
I thought this was a single occurrence. Since when did it start failing in CI?
It feels like it started happening more in the last few weeks. But as far as I can tell, #55618 was never solved, right? I see ~200+ failures in the last two weeks. Mono was disabled in #74668 (see #74667).
#55618 was reported over a year ago, and it was closed after the issue had not occurred for 3 weeks. Since then it wasn't reported until this issue was created this week. Having an idea of when it started to occur more can give us a clue as to what change triggered it.
We've only seen this on arm64? Does it occur less with coreclr than with mono?

The code asks the kernel if there are children that terminated, using a non-blocking wait. If there is one, it passes the pid of that terminated child to a wait for that specific pid. In between these calls the child suddenly goes missing. So either something has messed up the pid, or something else is reaping the child.

If it's something else reaping the child, it's expected to be a native library making a blocking wait call. If there is no native library doing this, then somehow the pid must have gotten corrupted.

Note that the thread the signal handling is running on is a native pthread. It's not a managed thread.
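A minimal standalone C sketch of that two-step pattern. To be clear, this is not the runtime's implementation, and the exact syscalls the runtime uses may differ; it only shows how ECHILD (errno 10 on Linux, the value in "Error while reaping child. errno = 10") appears when something else consumes the exit status between the check and the per-pid wait.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0)
        _exit(42);                        /* short-lived child */

    sleep(1);                             /* let the child terminate */

    /* Step 1: ask the kernel whether any child has terminated, without
       consuming its exit status (WNOWAIT), and learn its pid. */
    siginfo_t si;
    memset(&si, 0, sizeof(si));
    waitid(P_ALL, 0, &si, WEXITED | WNOHANG | WNOWAIT);
    pid_t pid = si.si_pid;
    printf("step 1: child %d has terminated\n", (int)pid);

    /* Interloper: some other code in the same process (a native library,
       or a pid-1 style "reap everything" loop) consumes the exit status. */
    int status;
    waitpid(-1, &status, WNOHANG);

    /* Step 2: reap that specific pid. The status is already gone, so the
       kernel reports ECHILD (errno 10 on Linux), which is the condition
       behind "Error while reaping child. errno = 10". */
    if (waitpid(pid, &status, 0) == -1 && errno == ECHILD)
        printf("step 2: waitpid(%d) failed with ECHILD (errno %d)\n",
               (int)pid, errno);

    return 0;
}
```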
I see it recently only on ARM. Some of the older cases are gone, e.g. there is a note of a crash on Kusto, but we no longer have the console log so we cannot check if the pattern is the same. I searched our code base (and msquic) and I did not see any direct uses of … Now, since you mentioned the …
Actually, the runs are in containers. I'm wondering if this is applicable: runtime/src/native/libs/System.Native/pal_signal.c, lines 349 to 354 (at commit 3afb168).
That would eventually call pid = Interop.Sys.WaitPidExitedNoHang(-1, out _);
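For reference, this is roughly the shape of an init-style reap loop that a pid 1 process runs. This is a hedged sketch, not the code at the referenced lines: the point is that waitpid(-1, ...) consumes the exit status of any terminated child, including one that other code is tracking by pid and expects to reap itself.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of an init-style "reap any orphan" loop of the kind a pid 1
   process runs, e.g. from a SIGCHLD handler. waitpid(-1, ...) consumes
   the exit status of *any* terminated child, including children that
   other code in the process is tracking by pid. */
void ReapAllTerminatedChildren(void)
{
    for (;;)
    {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break;            /* no more terminated children right now */
        printf("reaped pid %d\n", (int)pid);
        /* This pid's exit status is now gone; a later waitpid(pid, ...)
           made by whoever spawned this child fails with ECHILD. */
    }
}

int main(void)
{
    if (fork() == 0)
        _exit(0);             /* stands in for an orphan re-parented to pid 1 */
    sleep(1);
    ReapAllTerminatedChildren();
    return 0;
}
```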
Yes, both of these can have the same root cause. Notice that both of these issues have only been seen in System.Net.Requests.Tests. One possible explanation is that a method passes a pointer to a stack location to some asynchronous logic. The method goes out of scope, and some completely unrelated code starts running. And finally, the asynchronous logic wakes up and overwrites the location on the stack that is used for something completely different by now. Does it ring any bells? Do we have code in sockets that is passing a pointer to a stack location to some asynchronous logic?
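Sketched in C with hypothetical names (not code from the repo), the failure mode being described looks like this. Whether the two stack slots actually overlap depends on the compiler, but it shows how a delayed write through an escaped stack pointer can corrupt an unrelated value, such as a pid about to be waited on.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* The "asynchronous logic": wakes up later and writes through a pointer
   it was handed earlier. */
static void *DelayedWriter(void *arg)
{
    int *target = (int *)arg;
    sleep(1);                 /* the caller returned long ago */
    *target = -1;             /* scribbles over whatever lives there now */
    return NULL;
}

/* Passes the address of a local (stack) variable to the async work and
   then returns, so the pointer dangles. */
static void StartWork(pthread_t *thread)
{
    int localResult = 0;      /* stack location that escapes */
    pthread_create(thread, NULL, DelayedWriter, &localResult);
}                             /* localResult goes out of scope here */

/* Unrelated code that may later reuse the same region of the stack,
   e.g. for a pid it is about to wait on. */
static int UnrelatedWork(void)
{
    int pid = 12345;          /* may land on the recycled stack slot */
    sleep(2);                 /* the delayed write happens in this window */
    return pid;               /* possibly corrupted by the stale write */
}

int main(void)
{
    pthread_t t;
    StartWork(&t);
    printf("pid-ish value after unrelated work: %d\n", UnrelatedWork());
    pthread_join(t, NULL);
    return 0;
}
```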
I will take a look. #72830 happens in System.Net.Mail. That has really only one functional change recently, and I don't see how that would cause either of the problems.

Right, but the mysterious crash is still in sockets.
It isn't. It is applicable only for the main process in the container (pid 1). You can see … I don't see any reports of this on x64. Was Mono disabled because it happens more on Mono than it does on coreclr?
Our tests have a specific pattern that is prone to hit the lttng bug: the tests are initializing Quic on one thread while there is a remote executor exec storm of short-lived processes on other threads. I think that it is unlikely for real-world applications out there to have the same pattern. Until we get the lttng fix in place, we may want to set …
This is not just the docker, right? Since this is showing up so far only on ARM64, we may choose to update only that. I'm working on an issue for LTTng. They don't use GitHub for tracking, and my registration to submit an issue is pending approval. I looked at the code again, and changing …
@wfurt do we understand why it is happening only in System.Net.Requests and not in other test suites?

Also, does it impact the release/7.0 branch as well? Should we backport it as a test-only change?
Any test that launches child processes and ends up loading Quic is prone to hitting this issue. I expect that we will see this problem in a few more tests over time. An alternative big-hammer fix that would address it everywhere would be to set …

Yes.

Note this is an initialization race. So when Quic is initialized before the test runs and the child processes are launched, it is OK. That, for example, happens when we have conditional tests where XUnit evaluates the conditions before the test runs.
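A hedged sketch of the initialization race described above, with hypothetical names standing in for the Quic/lttng setup and the RemoteExecutor exec storm. This is not the actual lttng bug, only the shape of the race: one thread performs a multi-step initialization while other threads fork short-lived children, and a child forked inside the window inherits a half-initialized state.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical two-step library initialization (a stand-in for the Quic /
   lttng-ust setup). Deliberately unsynchronized: the race is the point. */
static bool g_handlersRegistered = false;
static bool g_sessionCreated = false;

static void *InitLibrary(void *arg)
{
    (void)arg;
    g_handlersRegistered = true;
    usleep(50 * 1000);        /* window where the state is inconsistent */
    g_sessionCreated = true;
    return NULL;
}

/* Stand-in for the RemoteExecutor exec storm: fork short-lived children
   while the initialization above may still be in flight. */
static void *ForkStorm(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100; i++)
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            /* A child that sees handlers registered but no session has
               inherited the half-initialized state. */
            _exit(g_handlersRegistered && !g_sessionCreated ? 1 : 0);
        }
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
            printf("child %d inherited half-initialized state\n", i);
    }
    return NULL;
}

int main(void)
{
    pthread_t init, storm;
    pthread_create(&init, NULL, InitLibrary, NULL);
    pthread_create(&storm, NULL, ForkStorm, NULL);
    pthread_join(init, NULL);
    pthread_join(storm, NULL);
    return 0;
}
```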
Tagging subscribers to this area: @dotnet/ncl

Issue details:

Disabled tests:
Crash in System.Net.Requests.Tests Work Item - Last 7 days in Runfo:
Related to #69125.
Console log: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-aaacaf8e0a7f46c4ad/System.Net.Requests.Tests/1/console.1429bd54.log?%3Fhelixlogtype%3Dresult
Dump: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-aaacaf8e0a7f46c4ad/System.Net.Requests.Tests/1/core.1001.21
errno 10 -> ECHILD
and with (some) symbols: …
cc: @tmds
{ "ErrorMessage":"Error while reaping child. errno = 10" }
Reopening for 7.0.

#75266 does not seem to have made it to 7.0, so that part is missing the back-port PR.

Yep, sorry, I accidentally looked only at the 7.0 change where it was not disabled. I updated the post and removed the disabled-test label.
For reference, the lttng issue created by @wfurt: https://bugs.lttng.org/issues/1359. |