Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSTest stops process execution early #1900

Closed
ViktorHofer opened this issue Jan 24, 2019 · 17 comments
Closed

VSTest stops process execution early #1900

ViktorHofer opened this issue Jan 24, 2019 · 17 comments
Labels

Comments

@ViktorHofer
Copy link
Member

Coverlet is a code coverage tool which collects hit counts in methods and writes the result either into a file or memory. The result writing is done in the ProcessExit event handler of the current AppDomain. It seems that when using dotnet test, the testing framework doesn't wait until the AppDomain's ProcessExit execution is finished and terminates the test process early which leads to no coverage numbers.

coverlet-coverage/coverlet#210 (comment)

Can you please let me know which code path in vstest is responsible for shutting down the current process and explain why it doesn't wait until the process is correctly terminated (ProcessExit chain finished).

@ViktorHofer
Copy link
Member Author

cc @sharwell

@mayankbansal018
Copy link
Contributor

@ViktorHofer is this the FullCLR coverage scenario? I assumed so, since you are pointing us to wait for AppDomain Shutdown.

However test executor(testhost.exe) waits for test adapters to finish executing tests, once that is over, we send "ExecutionComplete" Message to vstest.console, on receiving this message vstest, send a message to testhost process, to exit normally, which is handled by testhost here, following this the process exits normally.

@ViktorHofer
Copy link
Member Author

is this the FullCLR coverage scenario? I assumed so, since you are pointing us to wait for AppDomain Shutdown.

The question applies to both fullclr and coreclr but I'm more interested in the .NET Core part here. CoreCLR still has an OnProcessExit method which invokes ProcessExit event handlers. When using vstest they don't complete and stop after about half a second. There is a simple repro to test this:

  1. dotnet new mstest
  2. Paste code from Spurious 'Unable to read beyond end of stream' in CI while collecting results coverlet-coverage/coverlet#210 (comment)
  3. Open in VS and debug the Test1 test method in VS Test Explorer.
  4. Set a breakpoint in the ProcessExit method.

You will notice that the break point is hit for ~500ms and then lost and the debugging stops. Thanks for providing a link to the respective code. I will take a look in the meantime till you have time to try my sample out.

@mayankbansal018
Copy link
Contributor

@ViktorHofer few more code pointers, vstest.console sends request to testhost to shutdown, code at.
However if testhost doesn't shut down within 100ms(as the execution is completed, we expect it to shutdown fast). vstest.console forcefully kills the process.

You can debug the process, by following https://github.com/Microsoft/vstest-docs/blob/master/docs/diagnose.md#debug-test-platform-components

@sharwell
Copy link
Member

@mayankbansal018 Great! This timeout is causing serious problems for teams using code coverage in CI scenarios, where machines may not be fast (most recently dotnet/machinelearning). How can we disable it?

@mayankbansal018
Copy link
Contributor

@sharwell , @ViktorHofer we still don't understand that why the timeout has to be increased. Ideally Coverlet integration with vstest should happen as datacollector. I'm not sure how it is today, but if it is as datacollector, then coverlet would get sufficient time to write whatever data it needs to log, & vstest would then try to close the testhost process appropriately.

@sharwell
Copy link
Member

@mayankbansal018 coverlet isn't specific to unit tests and can't have any dependencies outside mscorlib. It works for many kinds of application, and makes minimal assumptions that assemblies will behave according to runtime rules. VSTest made its own rules for application lifetime to become the unique case where coverlet does not work.

@sharwell
Copy link
Member

I guess the short form is VSTest made an assumption about test execution lifetime and used that assumption to claim that terminating a process would not have side effects. Coverlet is proof that the assumption was not correct, so the forced termination is not correct in its current form and needs to be changed or removed.

@mayankbansal018
Copy link
Contributor

@sharwell I agree that VSTest has made that assumption on when to terminate process, but from VSTest perspective, it has no other way to close the session. VSTest sends a message to testhost to close the session, it waits for 100ms, & then kill the process, if it fails to exit. The reason it does so is because it has no way of knowing whether the testhost is genuinely doing something, or is simply in a stuck state, due to unknown reasons.

The way I understand, coverlet will gather data only in context of current test session, which means that when the actual test execution has completed coverlet should have gathered all the data it needed to for those tests, which is why we have a datacollector infra. A datacollector remains active for entire test session, meaning that even if actual test(s) have executed, a datacollector can still complete it's own tasks, for e.g. if it needs to do some cleanups.

/cc @cltshivash

@ViktorHofer
Copy link
Member Author

In .NET Framework, the total execution time of all ProcessExit event handlers is limited, just as the total execution time of all finalizers is limited at process shutdown. The default is two seconds. An unmanaged host can change this execution time by calling the ICLRPolicyManager::SetTimeout method with the OPR_ProcessExit enumeration value. This time limit does not exist in .NET Core.

(https://docs.microsoft.com/en-us/dotnet/api/system.appdomain.processexit?view=netframework-4.7.2). As this limitation has been lifted from .NET Core, it's unfortunate that now VSTest applies a timeout.

The way I understand, coverlet will gather data only in context of current test session, which means that when the actual test execution has completed coverlet should have gathered all the data it needed to for those tests, which is why we have a datacollector infra.

Coverlet injects code in test assemblies which 1) count code hits and 2) writes the number of hits into a temporary location (memory, file IO, ...) so that the host process coverlet can use that information to create its reports. As 2) happens when the test assembly's process is requested to terminate it hits the limitation applied by VSTest. The injected code for 2) can't have any dependencies besides mscorlib/System.Private.CoreLib as that would break coverage for other repositories like corefx & coreclr.

The reason it does so is because it has no way of knowing whether the testhost is genuinely doing something, or is simply in a stuck state, due to unknown reasons.

Why does the process need to be terminated from the outside at all? Isn't requesting a termination enough so that the process will stop itself? If it's related to an attached debugger, can't the debugger just disconnect and let the process finish cleaning up?

We need a way to disable that artificial limitation, preferably without any user interaction (i.e. removing the timeout entirely as interfering with an application's life cycle is a don't) or with a configuration (i.e. RunSettings or command line argument).

@mayankbansal018
Copy link
Contributor

As 2) happens when the test assembly's process is requested to terminate it hits the limitation applied by VSTest

Why can't this happen asynchronously? Why does coverlet needs to hold entire data in memory, & only start writing when the tests have executed, it can record this data as we do in dynamic code coverage via a separate process. Also if it needs to wait for test execution to complete, it would be great to have coverlet integration as datacollector , where it will receive explicit event that test session has completed, & it can complete the relevant tasks.

Why does the process need to be terminated from the outside at all? Isn't requesting a termination enough so that the process will stop itself?

This is an extremely common scenario we have seen where users code, leaves the testhost process in a state where it cannot kill itself, if we don't kill the process explicitly then in such cases we will leave the dotnet(testhost.dll) process alive, which will have user's assemblies loaded, preventing any subsequent build requested by user, till the process is explicitly killed by user.
This can be a far too common scenario is LiveUnitTesting where the code is constantly built in the background, as user is changing it, & we keep on running test against it. If we leave processes lingering around it would horribly break this scenario.
Another example is UWP testing on remote machines, where the App normally doesn't shuts-down, & we have to forcefully terminate such apps, or we tend to leave machines with multiple apps running.

@ViktorHofer I would suggest that we have a discussion on this internally, please set up a meeting, & I'll add additional folks from test platform side, so that we can resolve this w/o breaking other scenarios.

/cc @cltshivash , @singhsarab

@mayankbansal018
Copy link
Contributor

Any update on this?

@ViktorHofer
Copy link
Member Author

@sharwell submitted a WIP PR: coverlet-coverage/coverlet#329. He still needs to register the data-collector with the RunSettings file.

@sharwell
Copy link
Member

sharwell commented Mar 9, 2019

At this point I'm leaning towards rewriting the assembly on disk to replace the 'ldc.i4 100' with 'ldc.i4.m1' followed by enough 'nop' instructions that the code doesn't change size. I just need this to work and this feels like an achievable means to an end.

@sharwell
Copy link
Member

Rewriting the binary in-place appears to have resolved the issue: dotnet/machinelearning#2843

@tremblaysimon
Copy link

@sharwell, I don't figure out what do you mean by rewriting the binary in-place and the pull request you referenced. Is it only the SingleHit property you added?

Does it fix coverlet-coverage/coverlet#110?

@cltshivash
Copy link
Contributor

The recommendation for this ask has been to use the inproc data collector model.

@microsoft microsoft locked as resolved and limited conversation to collaborators Mar 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants