BinderTracingTest.ResolutionFlow times out #104670

Open
jakobbotsch opened this issue Jul 10, 2024 · 7 comments
Assignees
Labels
arch-x86, area-Tracing-coreclr, Known Build Error
Milestone

Comments

@jakobbotsch
Member

jakobbotsch commented Jul 10, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=735589
Build error leg or test failing: Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.cmd
Pull request: #104603

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "BinderTracingTest.ResolutionFlow.* Timed Out",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
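
As a rough illustration of how the `ErrorPattern` above could match a failing test's log line, here is a minimal sketch. It assumes the pattern is evaluated as a .NET regular expression; `KnownIssueMatcher` is a hypothetical name, and the real matching logic used by the build analysis tooling is not shown in this issue.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KnownIssueMatcher
{
    // Pattern copied verbatim from the "ErrorPattern" field of this issue.
    public const string ErrorPattern = "BinderTracingTest.ResolutionFlow.* Timed Out";

    // Returns true when the log line matches the pattern.
    // Assumption: the pattern is treated as a regular expression, so ".*"
    // absorbs anything between the test name and " Timed Out".
    public static bool Matches(string logLine)
    {
        return Regex.IsMatch(logLine, ErrorPattern);
    }
}
```

Under that assumption, a line containing the test's `.cmd` path followed by ` Timed Out` would match, while a line that only mentions the test name would not.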

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=735589
Error message validated: [BinderTracingTest.ResolutionFlow.* Timed Out]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 7/10/2024 11:13:33 AM UTC

Report

| Build | Definition | Test | Pull Request |
| --- | --- | --- | --- |
| 840745 | dotnet/runtime | Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.cmd | #108757 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 0 | 1 |

@jakobbotsch added the blocking-clean-ci and Known Build Error labels Jul 10, 2024
@dotnet-policy-service bot added the untriaged label Jul 10, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Member Author

cc @davmason

@tommcdon removed the untriaged label Jul 12, 2024
@tommcdon tommcdon added this to the 9.0.0 milestone Jul 12, 2024
@mdh1418
Member

mdh1418 commented Jul 15, 2024

This seems to be a continuation of #97735 and #94390. Given that the BinderTracingTests are already skipped for Jitstress and GCStress, there is probably another culprit for these tests timing out.

It seems flaky given the low hit count, and the previous counterpart #97735 at one point had 0 hits in 30 days.

I haven't been able to repro the timeout locally, and given how flaky it is on CI, I'm not sure I'd be able to repro it reliably there. From this build instance it seems like this is causing the hang


based off of

0:000> !clrstack -f -all
OS Thread Id: 0x23ec
Child SP       IP Call Site
0097E0EC 77B3F3EC ntdll!NtWaitForMultipleObjects + 12
0097E284 72246A85 coreclr!Thread::DoAppropriateAptStateWait + 199
0097E308 72246FB3 coreclr!Thread::DoAppropriateWaitWorker + 998
0097E464 7224D4AD coreclr!`Thread::DoAppropriateWait'::`9'::__Body::Run + 90
0097E4B8 72246B3C coreclr!Thread::DoAppropriateWait + 149
0097E51C 722BEC16 coreclr!WaitHandleNative::CorWaitOneNative + 294
0097E560          [HelperMethodFrame: 0097e560] System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32, Boolean)
0097E5F4 71022A5C System_Private_CoreLib!System.Boolean System.Threading.WaitHandle::WaitOneNoCheck(System.Int32, System.Boolean, System.Object, System.Diagnostics.Tracing.NativeRuntimeEventSource+WaitHandleWaitSourceMap)$##6003CB2 + 188
0097E5F8 71022A5C System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitOneNoCheck(Int32, Boolean, System.Object, WaitHandleWaitSourceMap) + 188
0097E630 71022984 System_Private_CoreLib!System.Boolean System.Threading.WaitHandle::WaitOne(System.Int32)$##6003CB1 + 20
0097E63C 71022984 System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitOne(Int32) + 20
0097E644 094C5B7B system.diagnostics.process.dll!System.Diagnostics.Process.WaitForExitCore(Int32) + 123
0097E67C 094C0800 BinderTracingTest.ResolutionFlow.dll!BinderTracingTests.BinderTracingTest.RunTestInSeparateProcess(System.Reflection.MethodInfo) + 816
0097E6DC 0817237E BinderTracingTest.ResolutionFlow.dll!BinderTracingTests.BinderTracingTest.RunAllTests() + 446
0097E704 0817204F BinderTracingTest.ResolutionFlow.dll!BinderTracingTests.BinderTracingTest.Main(System.String[]) + 39

From looking at what BinderTracingTests does, I'm not quite sure what is causing the separate subprocess to hang. Given that this test still hangs even without GCStress/Jitstress, I would have expected the test to hit the BinderEventListener's 30s timeout at

int waitTimeoutInMs = Environment.GetEnvironmentVariable("DOTNET_GCStress") == null
? 30 * 1000
if the test made it to
ValidateSingleBind(listener, expected.AssemblyName, expected);
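
For context on the parent-side wait seen in the stack trace, here is a minimal sketch of bounding the wait on a child process so that a hung subprocess is reported and killed instead of stalling the parent until the harness times out. `SubprocessRunner` and `RunWithTimeout` are hypothetical names, and this is not the test's actual code, which may already wait with a timeout.

```csharp
using System;
using System.Diagnostics;

public static class SubprocessRunner
{
    // Hypothetical helper, not taken from BinderTracingTest.
    // Returns the child's exit code, or null if it had to be killed
    // because it did not exit within timeoutInMs.
    public static int? RunWithTimeout(ProcessStartInfo startInfo, int timeoutInMs)
    {
        using (var p = Process.Start(startInfo))
        {
            if (p == null)
                throw new InvalidOperationException("Process could not be started.");

            // WaitForExit(int) returns false when the timeout elapses first.
            if (p.WaitForExit(timeoutInMs))
                return p.ExitCode;

            Console.WriteLine($"Subprocess {p.Id} did not exit within {timeoutInMs} ms; killing it.");
            p.Kill(entireProcessTree: true);
            p.WaitForExit();
            return null;
        }
    }
}
```

A caller could treat a `null` result as a test failure and log which method was running, which would at least narrow down the hanging step.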

@elinor-fung / @davmason any other ideas on what might be causing the hang?

@tommcdon removed the blocking-clean-ci label Jul 29, 2024
@mdh1418
Member

mdh1418 commented Aug 1, 2024

762118 was from a PR that caused a deadlock, so it doesn't have the same cause as the first, singular hit in 735589.

@mdh1418 mdh1418 modified the milestones: 9.0.0, 10.0.0 Aug 2, 2024
@jakobbotsch
Member Author

@mdh1418
Member

mdh1418 commented Aug 5, 2024

Hit in https://dev.azure.com/dnceng-public/public/_build/results?buildId=765128&view=results

Isn't that a jitstress pipeline? Why did the test get run there if it was marked JitOptimizationSensitive=true in #102842?

In any case, the latest failure's console log shows a hang, presumably while waiting for this test

public static BindOperation ApplicationAssemblies()
to finish running as a separate process (since the tests seem to be run sequentially, FindInLoadContext_DefaultALC_IncompatibleVersion finished, and the stack trace has system.diagnostics.process.dll!System.Diagnostics.Process.WaitForExitCore).

I guess the Console.WriteLine calls from a subprocess don't actually get written immediately, given that we don't see either of these

Console.WriteLine($"[{DateTime.Now:T}] Launching process for {method.Name}...");
using (Process p = Process.Start(startInfo))
{
Console.WriteLine($"Started subprocess {p.Id} for {method.Name}...");
given that the test hangs afterwards.

Also the subprocess dump doesn't get generated even though we are expecting to create a dump for all processes related to corerun.

@hoyosjs any ideas on what to tweak to be able to capture the dump for the hanging subprocess in this case?

@jakobbotsch
Member Author

> Hit in https://dev.azure.com/dnceng-public/public/_build/results?buildId=765128&view=results
>
> Isn't that a jitstress pipeline? Why did the test get run there if it was marked JitOptimizationSensitive=true in #102842?

The jitstress pipeline runs tests under many different configurations. This particular configuration does not set any of the "jitstress" environment variables; it sets only the following:

set DOTNET_TieredCompilation=0
set DOTNET_EnableAVX=0
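
To illustrate why a configuration like this would not look like a stress run, here is a small sketch of a check keyed on stress-related environment variables. It assumes, purely for illustration, that such a check looks only at `DOTNET_JitStress` and `DOTNET_GCStress`; `StressConfig` is a hypothetical name, and the actual mechanism behind `JitOptimizationSensitive` is not described in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class StressConfig
{
    // Assumed variable names for illustration; the real infrastructure may check others.
    private static readonly string[] StressVariables = { "DOTNET_JitStress", "DOTNET_GCStress" };

    // Returns true if any of the assumed stress variables is set to a non-empty value.
    public static bool IsStressRun(IDictionary<string, string> environment)
    {
        foreach (string name in StressVariables)
        {
            if (environment.TryGetValue(name, out var value) && !string.IsNullOrEmpty(value))
                return true;
        }
        return false;
    }
}
```

With only `DOTNET_TieredCompilation=0` and `DOTNET_EnableAVX=0` in the environment, a check like this reports a non-stress run, so the test would still execute.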
