Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid #86739

Closed
v-wenyuxu opened this issue May 25, 2023 · 15 comments · Fixed by #94254
Labels
arch-arm64 area-System.Diagnostics.Process JitStress CLR JIT issues involving JIT internal stress modes os-linux Linux OS (any supported distro)
Milestone

Comments

@v-wenyuxu
Copy link

Failed in: runtime-coreclr libraries-jitstress 20230524.1

Failed tests:

net8.0-linux-Release-arm64-CoreCLR_checked-jitstress1_tiered-(Ubuntu.1804.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm64v8

- System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid

Error message:

Assert.InRange() Failure
Range:  (0 - 1)
Actual: 1.0463972542536049

Stack trace:

   at System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid() in /_/src/libraries/System.Diagnostics.Process/tests/ProcessTests.cs:line 910
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodInvoker.cs:line 59
@v-wenyuxu v-wenyuxu added arch-arm64 os-linux Linux OS (any supported distro) JitStress CLR JIT issues involving JIT internal stress modes labels May 25, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 25, 2023
@ghost
Copy link

ghost commented May 25, 2023

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.

Issue Details

Failed in: runtime-coreclr libraries-jitstress 20230524.1

Failed tests:

net8.0-linux-Release-arm64-CoreCLR_checked-jitstress1_tiered-(Ubuntu.1804.Arm64.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm64v8

- System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid

Error message:

Assert.InRange() Failure
Range:  (0 - 1)
Actual: 1.0463972542536049

Stack trace:

   at System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid() in /_/src/libraries/System.Diagnostics.Process/tests/ProcessTests.cs:line 910
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodInvoker.cs:line 59
Author: v-wenyuxu
Assignees: -
Labels:

arch-arm64, area-System.Diagnostics.Process, os-linux, JitStress

Milestone: -

@adamsitnik adamsitnik added this to the Future milestone May 25, 2023
@adamsitnik adamsitnik added good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels May 25, 2023
@adamsitnik
Copy link
Member

adamsitnik commented May 25, 2023

The person who is willing to work on this should verify in what scenario the asserted value could be more than 1 and either fix the test or find underlying bug.

TimeSpan processorTimeAfterSpin = Process.GetCurrentProcess().TotalProcessorTime;
DateTime endTime = DateTime.UtcNow;
double timeDiff = (endTime - startTime).TotalMilliseconds;
double cpuTimeDiff = (processorTimeAfterSpin - processorTimeBeforeSpin).TotalMilliseconds;
double cpuUsage = cpuTimeDiff / (timeDiff * Environment.ProcessorCount);
Assert.InRange(cpuUsage, 0, 1);

@jakobbotsch
Copy link
Member

jakobbotsch commented May 25, 2023

The loop in that function may also go through OSR making it significantly slower (and consuming a lot more CPU), and given the high average load on test machines it does not surprise me that the test is flakey. There's a few more failures recently: https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E7+name%3ATotalProcessorTimeValid

Maybe the test should just be deleted, or it should retry a few times before failing if the result is above the expected range.

@jakobbotsch
Copy link
Member

The comment on the loop talks about 1 second of CPU time, yet the assert seems to try to validate that we expended less than 1 millisecond of CPU time. So maybe that's just what should be fixed? With OSR it's quite expected that this may take more than 1 ms since once it hits the threshold it will involve a full rejitting of the method.

@adamsitnik
Copy link
Member

We could also just assert that the time is more than 0?

@danmoseley
Copy link
Member

I cannot see how this can happen, unless it's some glitchy data in the /proc/./stat file. I will add logging.

@jakobbotsch
Copy link
Member

@danmoseley what makes you say this has to be glitchy data? The failures show that the CPU time is just barely more than 1 millisecond – not that much given that a full rejitting may have occurred.

@danmoseley
Copy link
Member

Am I misreading..cpu time is the numerator here, and the result is unexpectedly high. It means cpu time was high?

@danmoseley
Copy link
Member

If you merge my PR we will discover..

@jakobbotsch
Copy link
Member

Am I misreading..cpu time is the numerator here, and the result is unexpectedly high. It means cpu time was high?

No, I was misreading it...

@danmoseley
Copy link
Member

@adamsitnik adamsitnik removed good first issue Issue should be easy to implement, good for first-time contributors help wanted [up-for-grabs] Good issue for external contributors labels May 29, 2023
@danmoseley
Copy link
Member

Rather disappointed that it hasn't happened again yet.

@danmoseley
Copy link
Member

Nothing
https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E14+name%3ATotalProcessorTimeValid+result%3Afailed+targetBranch%3Amain

I will close and if it happens again we will know more.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 21, 2023
@jakobbotsch
Copy link
Member

jakobbotsch commented Oct 31, 2023

We hit this in superpmi-collect: https://helixri107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-heads-main-e4ba4e2b24654b5292/System.Diagnostics.Process.Tests/1/console.f3cae831.log?sv=2021-08-06&se=2024-01-27T17%3A55%3A37Z&sr=c&sp=rl&sig=cEizO0GVk3rmTXFRQ0nozUzyNav4Ro9rpqRZSgQNYik%3D

    System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid [FAIL]
      Assertion failed. 1.1390753259881705 is not in range [0,1]. proc time before:51296.875 proc time after:51484.375 timeDiff:82.3036 cpuTimeDiff:187.5 Environment.ProcessorCount:2
      Stack Trace:
        /_/src/libraries/System.Diagnostics.Process/tests/ProcessTests.cs(922,0): at System.Diagnostics.Tests.ProcessTests.TotalProcessorTime_PerformLoop_TotalProcessorTimeValid()
           at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
        /_/src/coreclr/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.CoreCLR.cs(36,0): at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args)
        /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs(57,0): at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

Is it possible that using DateTime.UtcNow here is the problem? IIRC, it might have bad resolution (up to 16 ms) in some cases? If we add 16ms to the timeDiff then the ratio becomes 0.95. Maybe we should switch to Stopwatch?

@JulieLeeMSFT JulieLeeMSFT reopened this Oct 31, 2023
@danmoseley
Copy link
Member

@jakobbotsch Could be - whatever the cause it is probably reasonable to just make the wait longer. Do you want to make that change?

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Nov 1, 2023
Also double the number of iterations (it was executing in about 80 ms
before on an x86 CI run).

Fix dotnet#86739
(presumably, it fails very rarely, so let's reopen if it happens again)
jozkee pushed a commit that referenced this issue Nov 1, 2023
…id (#94254)

Also double the number of iterations (it was executing in about 80 ms
before on an x86 CI run).

Fix #86739
(presumably, it fails very rarely, so let's reopen if it happens again)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-System.Diagnostics.Process JitStress CLR JIT issues involving JIT internal stress modes os-linux Linux OS (any supported distro)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants