Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure BasicEventSourceTests.TestsWrite.Test_Write_T_ETW #91769

Closed
v-wenyuxu opened this issue Sep 8, 2023 · 13 comments · Fixed by #94324
Closed

Test failure BasicEventSourceTests.TestsWrite.Test_Write_T_ETW #91769

v-wenyuxu opened this issue Sep 8, 2023 · 13 comments · Fixed by #94324
Assignees
Labels
arch-arm64 area-System.Diagnostics.Tracing JitStress CLR JIT issues involving JIT internal stress modes os-windows
Milestone

Comments

@v-wenyuxu
Copy link

Failed in: runtime-coreclr libraries-jitstress 20230907.1

Failed tests:

net9.0-windows-Release-arm64-CoreCLR_checked-jitminopts-Windows.11.Arm64.Open
    - BasicEventSourceTests.TestsWrite.Test_Write_T_ETW

Error message:

System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)

Stack trace:

   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(Int32 errorCode) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.cs:line 849
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
   at BasicEventSourceTests.EtwListener.Dispose() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs:line 97
   at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs:line 27
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args) in /_/src/coreclr/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.CoreCLR.cs:line 36
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57
@v-wenyuxu v-wenyuxu added arch-arm64 os-windows JitStress CLR JIT issues involving JIT internal stress modes labels Sep 8, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 8, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 8, 2023
@ghost
Copy link

ghost commented Sep 8, 2023

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Failed in: runtime-coreclr libraries-jitstress 20230907.1

Failed tests:

net9.0-windows-Release-arm64-CoreCLR_checked-jitminopts-Windows.11.Arm64.Open
    - BasicEventSourceTests.TestsWrite.Test_Write_T_ETW

Error message:

System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)

Stack trace:

   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(Int32 errorCode) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.cs:line 849
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
   at BasicEventSourceTests.EtwListener.Dispose() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs:line 97
   at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs:line 27
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args) in /_/src/coreclr/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.CoreCLR.cs:line 36
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57
Author: v-wenyuxu
Assignees: -
Labels:

arch-arm64, os-windows, JitStress, area-Diagnostics-coreclr, untriaged, needs-area-label

Milestone: -

@tommcdon tommcdon added this to the 9.0.0 milestone Sep 8, 2023
@tommcdon tommcdon added area-System.Diagnostics.Tracing and removed untriaged New issue has not been triaged by the area owner area-Diagnostics-coreclr labels Sep 8, 2023
@ghost
Copy link

ghost commented Sep 8, 2023

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Failed in: runtime-coreclr libraries-jitstress 20230907.1

Failed tests:

net9.0-windows-Release-arm64-CoreCLR_checked-jitminopts-Windows.11.Arm64.Open
    - BasicEventSourceTests.TestsWrite.Test_Write_T_ETW

Error message:

System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)

Stack trace:

   at System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(Int32 errorCode) in /_/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/Marshal.cs:line 849
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
   at BasicEventSourceTests.EtwListener.Dispose() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs:line 97
   at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs:line 27
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args) in /_/src/coreclr/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.CoreCLR.cs:line 36
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57
Author: v-wenyuxu
Assignees: -
Labels:

arch-arm64, area-System.Diagnostics.Tracing, os-windows, JitStress, area-Diagnostics-coreclr, needs-area-label

Milestone: 9.0.0

@tommcdon tommcdon removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 8, 2023
@v-wenyuxu
Copy link
Author

Failed again in: runtime-coreclr libraries-jitstress 20230912.1

Failed tests:

net9.0-windows-Release-arm64-CoreCLR_checked-zapdisable-Windows.11.Arm64.Open
    - BasicEventSourceTests.TestsWrite.Test_Write_T_ETW

Error message:

System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)

Stack trace:

   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
   at BasicEventSourceTests.EtwListener.Dispose() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs:line 97
   at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs:line 27
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 57

@danmoseley
Copy link
Member

danmoseley commented Oct 10, 2023

I see two problems

  1. the tests in this assembly are expected to run entirely serially (for example, many do new EtwListener() which means they all use the same file:
        public EtwListener(string dataFileName = "EventSourceTestData.etl", string sessionName = "EventSourceTestSession")

for this reason they try hard to run serially, disabling parallelism 3 ways --

[assembly: CollectionBehavior(CollectionBehavior.CollectionPerAssembly, DisableTestParallelization = true, MaxParallelThreads = 1)]

this is apparently not working however

  Discovering: System.Diagnostics.Tracing.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Diagnostics.Tracing.Tests (found 45 of 49 test cases)
  Starting:    System.Diagnostics.Tracing.Tests (parallel test collections = on, max threads = 4) <<<<<<<<<<<<<

and that may be causing this

      System.Runtime.InteropServices.COMException : The instance name passed was not recognized as valid by a WMI data provider. (0x80071069)
      Stack Trace:
           at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.Flush()
        /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/Harness/EtwListener.cs(97,0): at BasicEventSourceTests.EtwListener.Dispose()
        /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs(27,0): at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
        /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs(57,0): at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

Question is -- why is parallelism happening

  1. several tests create TraceEventSessions, and those are not safe for parallel construction, because they have some static members, including
    https://github.com/microsoft/perfview/blob/main/src/TraceEvent/TraceEventSession.cs#L1724
    which is a SortedDictionary, and there is no locking to prevent it being populated by one thread while it's being read by another
    https://github.com/microsoft/perfview/blob/main/src/TraceEvent/TraceEventSession.cs#L1634
    SortedDictionary is not safe for concurrent read and write.

hence this output

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.SortedSet`1.Node.get_Is4Node() in /_/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.cs:line 1622
   at System.Collections.Generic.SortedSet`1.AddIfNotPresent(T item) in /_/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.cs:line 339
   at System.Collections.Generic.TreeSet`1.AddIfNotPresent(T item) in /_/src/libraries/System.Collections/src/System/Collections/Generic/SortedDictionary.cs:line 901
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.get_ProviderNameToGuid()
   at Microsoft.Diagnostics.Tracing.Session.TraceEventProviders.GetProviderGuidByName(String name)
   at Microsoft.Diagnostics.Tracing.Session.TraceEventSession.EnableProvider(String providerName, TraceEventLevel providerLevel, UInt64 matchAnyKeywords, TraceEventProviderOptions options)
   at BasicEventSourceTests.TestsManifestGeneration.<Test_EventSource_EtwManifestGenerationRollover>b__3_0() in /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsManifestGeneration.Etw.cs:line 98

I first assumed the first issue is simply exposing the second. But these are happening in remote exec processes. Only one thing should be happening in the process. Not clear what is going on. But whatever is causing this to be exposed -- is it a bug that should be fixed in TraceEventSession? (in dotnet/perfview repo -- @brianrob )?

https://helixre107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-c3a451dbf00f46df93/System.Diagnostics.Tracing.Tests/1/console.a62658f8.log?helixlogtype=result

@noahfalk @tarekgh

@tarekgh
Copy link
Member

tarekgh commented Oct 10, 2023

CC @davmason as it is EventSource issue.

@brianrob
Copy link
Member

My best guess is that the NullReferenceException is due to unprotected concurrent calls to TraceEventSession.ProviderNameToGuid. Once this fails, the session likely doesn't get fully created and calling Flush results in further failures because the underlying session wasn't actually created.

From what I can see, the fix is to address the unexpected concurrency. TraceEventSession, especially with the same session name and file, is not designed to be used concurrently and can result in all sorts of unexpected behaviors.

@danmoseley
Copy link
Member

TraceEventSession, especially with the same session name and file, is not designed to be used concurrently and can result in all sorts of unexpected behaviors.

@brianrob is constructing two TraceEventSession concurrently (with different file and name parameters) expected to be unsafe?

@brianrob
Copy link
Member

TraceEventSession, especially with the same session name and file, is not designed to be used concurrently and can result in all sorts of unexpected behaviors.

@brianrob is constructing two TraceEventSession concurrently (with different file and name parameters) expected to be unsafe?

This should be fine. The key is that the name and file parameters are used to create machine-wide constructs, and that's why it's a problem if they are the same (unless you're explicitly attaching to an existing session, but that's another story and not what you're doing here). If you wanted to run tests in parallel, you could change the session names and ETL file paths and that should be OK. You just don't want too much parallelism because each machine has a maximum number of concurrent sessions, and it's not difficult to hit that, especially for Microsoft machines.

@danmoseley
Copy link
Member

@brianrob the reason I'm asking is that it appears that is not safe eg., the static SortedDictionary can get written by two concurrent threads.

@brianrob
Copy link
Member

I don't think there is a strong statement around these APIs as being thread-safe, but many of them are. It feels to me like you should be able to call EnableProvider on two different sessions concurrently. I'll make a fix for this.

@brianrob
Copy link
Member

Posted microsoft/perfview#1934 to address this.

@danmoseley
Copy link
Member

OK, this issue is now tracking why our elaborate protections against any test concurrency in System.Diagnostics.Tracing.Tests are not doing the job...

@davmason
Copy link
Member

@danmoseley is there a better person to assign this to since we're now tracking test concurrency issues? I don't believe I'm the right person to investigate issues in the test runner

@davmason davmason modified the milestones: 9.0.0, Future Nov 2, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 2, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 3, 2023
@v-wenyuxu v-wenyuxu reopened this Nov 10, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-System.Diagnostics.Tracing JitStress CLR JIT issues involving JIT internal stress modes os-windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants