
"The active test run was aborted. Reason: Test host process crashed" with no further explanation #4376

Closed
keyboardDrummer opened this issue Apr 6, 2023 · 8 comments

Comments

@keyboardDrummer

Description

This test run calls dotnet test, which then outputs

The active test run was aborted. Reason: Test host process crashed
Results File: /home/runner/work/dafny/dafny/dafny/Source/IntegrationTests/TestResults/_fv-az446-243_2023-04-06_10_29_09.trx

Test Run Aborted with error System.Exception: One or more errors occurred.
 ---> System.Exception: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---.

The referenced results file (here) gives no indication of anything going wrong.

How can I get a clue about what caused the test host process to crash? Are there particular exceptions that would cause it to crash in this way, such as out-of-memory errors or stack overflows?

Steps to reproduce

I don't have a small reproduction example, but this issue occurs when the CI for this PR is run.

Expected behavior

I expect some sort of stack trace or explanation of why the test host process crashed, rather than a trace that points away from the test host process to the communication layer, such as Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync.

Actual behavior

No explanation is given

Diagnostic logs

Environment

Occurs on Ubuntu

@microsoft-github-policy-service bot added the needs-triage label (This item should be discussed in the next triage meeting.) Apr 6, 2023
@dave-yotta

dave-yotta commented Apr 11, 2023

We've started getting this since around 6 April 10:30 GMT, across unrelated topic branches, so it appears not to be due to a change we've made. We use the dotnet install scripts (./dotnet-install.sh --channel 7.0 --install-dir /usr/share/dotnet), so we will pull anything new published there. Has anything happened very recently?

Edit: Could this be related to a change in .NET SDK 7.0.104 somehow reaching us around this time?

Edit 2: No, I can't see any differences in the SDK, runtime, or install-scripts.sh versions that were pulled between passing and failing builds. I will try disabling running in a separate process so I can hopefully see the actual error.

@nohwnd
Member

nohwnd commented Apr 12, 2023

You can add --diag:logs/log.txt to your run, and that should give you more information. Also, depending on the version of TestPlatform you are using, we may fail to capture the info on the process crash, because the Exit event comes before the output is flushed (only on Linux), so we fail to capture that error output on older versions of TP (pre 17.4, I think).

You can also add --blame-crash to see if the crash will produce a memory dump, and then analyze it with dotnet dump analyze: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump#dotnet-dump-analyze
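For example (the log file path is just illustrative), both flags go directly on the dotnet test invocation:

dotnet test --diag:logs/log.txt --blame-crash

If a dump is produced, it can then be opened with dotnet dump analyze <path-to-dump> (the dotnet-dump tool can be installed with dotnet tool install --global dotnet-dump).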

@DalekBaldwin

> We've started getting this since around 6 April 10:30 GMT, across unrelated topic branches, so it appears not to be due to a change we've made. We use the dotnet install scripts (./dotnet-install.sh --channel 7.0 --install-dir /usr/share/dotnet), so we will pull anything new published there. Has anything happened very recently?
>
> Edit: Could this be related to a change in .NET SDK 7.0.104 somehow reaching us around this time?
>
> Edit 2: No, I can't see any differences in the SDK, runtime, or install-scripts.sh versions that were pulled between passing and failing builds. I will try disabling running in a separate process so I can hopefully see the actual error.

Right after updating SDK 6 to 6.0.311 (the latest), I'm getting a very similar error when testing unrelated branches/old commits that previously ran fine. So I have a strong suspicion it may be a recent change incorporated into multiple parallel SDK versions.

The active test run was aborted. Reason: Unable to communicate with test host process.
Test Run Aborted with error System.Exception: One or more errors occurred.
 ---> System.Exception: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Exception: An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
   at System.Net.Sockets.NetworkStream.ReadByte()
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---.

@dave-yotta

dave-yotta commented Apr 17, 2023

Full disclosure: the test host was being terminated because it ran out of memory (probably the OOM killer). No idea why this suddenly started happening unless there's been a sudden change to the GC; we also made sure versions are pinned to the same ones as when everything was passing, and the issue persists. So this one might be on us.

Edit: it came back in tests that I really, really doubt are OOM.

Edit 2: I shouldn't have doubted; they were OOM.
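(If you need to confirm that the OOM killer was involved and you have shell access to the build agent, the kernel log usually records it, e.g. dmesg | grep -i "out of memory" or journalctl -k | grep -i oom.)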

@ChrisMH

ChrisMH commented Apr 21, 2023

I'm having exactly the same issue running in a Bitbucket pipeline with:
.NET: 7.0.203
Microsoft.NET.Test.Sdk: 17.5.0

Viir added a commit to pine-vm/pine that referenced this issue May 24, 2023
Mainly to support the investigation of weird crashes of test runners on MacOS environments.
---
On macOS, `dotnet test` often crashed with output like this:
--------------
The active test run was aborted. Reason: Test host process crashed : #
# Fatal error in , line 0
# Check failed: 12 == (*__error()).
#
#
#
#FailureMessage Object: 0x7000099fac00

Results File: /Users/runner/work/elm-time/elm-time/implement/test-elm-time/TestResults/_Mac-1684933830638_2023-05-24_13_11_43.trx

Test Run Aborted with error System.Exception: One or more errors occurred.
Passed!  - Failed:     0, Passed:     4, Skipped:     0, Total:     4, Duration: 4 s - test-elm-time.dll (net7.0)
 ---> System.Exception: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---.
Error: Process completed with exit code 1.
------------
See microsoft/vstest#4376 and microsoft/vstest#2952
@nieznanysprawiciel

I had a very similar issue to this one.
It turned out that I was throwing an exception in an async void method. Since exceptions in async void methods are handled outside of the caller's context, the escaped exception broke dotnet test.

Changing the method signature to async Task FuncName() solved the problem.
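A minimal sketch of that change (the method name and exception are made up for illustration):

using System;
using System.Threading.Tasks;

public class MyTests
{
    // Before: async void. An exception thrown after the await is rethrown outside the
    // test framework's awaiting context, so it can take down the test host process
    // instead of simply failing the test.
    public async void PrepareDataAsync_CanCrashTheHost()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("boom");
    }

    // After: async Task. The framework awaits the returned Task and observes the
    // exception, so it is reported as a normal test failure.
    public async Task PrepareDataAsync_FailsNormally()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("boom");
    }
}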

@tig

tig commented Jul 8, 2024

Likely a dupe of #2952.

Super frustrating that the vstest team does not seem to think the lack of diagnosability here is a priority. I've wasted hours and hours on this.

@nohwnd
Member

nohwnd commented Jul 9, 2024

Closing as duplicate of the above issue.

@nohwnd closed this as not planned (duplicate) Jul 9, 2024