[Perf] Linux/arm64: Regressions in System.Collections.Concurrent.IsEmpty<Int32> #88483

performanceautofiler · 2023-07-06T08:00:58Z

Run Information

Name	Value
Architecture	arm64
OS	ubuntu 20.04
Queue	AmpereUbuntu
Baseline	cf50b7d86a1456e9aad09ddeafd5ff8701f9afdc
Compare	bba7a9c67e5b6728833bedbb368f732fb499bcc1
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
Queue - Duration of single invocation	9.82 ns	14.76 ns	1.50	0.18	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Concurrent.IsEmpty&lt;Int32&gt;*'

Payloads

Baseline
Compare

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.755232225143695 > 10.287517983976874.
IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 6/28/2023 11:03:41 PM, 7/5/2023 5:05:14 PM falls between 6/27/2023 3:42:18 AM and 7/5/2023 5:05:14 PM.
IsRegressionStdDev: Marked as regression because -25.250357327274703 (T) = (0 -14.96177257450248) / Math.Sqrt((0.19790267175372064 / (10)) + (0.2506987029667144 / (13))) is less than -2.079613844716807 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (10) + (13) - 2, .025) and -0.5006118702171165 = (9.970447969559064 - 14.96177257450248) / 9.970447969559064 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

The text was updated successfully, but these errors were encountered:

ghost · 2023-07-06T16:38:13Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Name	Value
Architecture	arm64
OS	ubuntu 20.04
Queue	AmpereUbuntu
Baseline	cf50b7d86a1456e9aad09ddeafd5ff8701f9afdc
Compare	bba7a9c67e5b6728833bedbb368f732fb499bcc1
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
Queue - Duration of single invocation	9.82 ns	14.76 ns	1.50	0.18	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Concurrent.IsEmpty&lt;Int32&gt;*'

Payloads

Baseline
Compare

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.755232225143695 > 10.287517983976874.
IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 6/28/2023 11:03:41 PM, 7/5/2023 5:05:14 PM falls between 6/27/2023 3:42:18 AM and 7/5/2023 5:05:14 PM.
IsRegressionStdDev: Marked as regression because -25.250357327274703 (T) = (0 -14.96177257450248) / Math.Sqrt((0.19790267175372064 / (10)) + (0.2506987029667144 / (13))) is less than -2.079613844716807 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (10) + (13) - 2, .025) and -0.5006118702171165 = (9.970447969559064 - 14.96177257450248) / 9.970447969559064 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`arch-arm64`, `area-System.Collections`, `os-linux`, `runtime-coreclr`
Milestone:	-

EgorBo · 2023-07-06T16:38:54Z

Either Physical promotion (cc @jakobbotsch) or #88073 cc @MichalPetryka

EgorBo · 2023-07-06T16:42:35Z

Same on win-arm64: dotnet/perf-autofiling-issues#19571

MichalPetryka · 2023-07-06T17:18:06Z

Either Physical promotion (cc @jakobbotsch) or #88073 cc @MichalPetryka

#88073 has diffs here but they seem to be improvements, I don't think it's that.

eiriktsarpalis · 2023-07-25T12:15:54Z

Looking at the diff it's not evident to me what might have caused this. Given that I don't have the hardware to run a git-bisect and current performance seems to be on par with 2022 runs, I'd be inclined to close this. Transferring to the codegen team for a second opinion.

ghost · 2023-07-25T12:16:11Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Name	Value
Architecture	arm64
OS	ubuntu 20.04
Queue	AmpereUbuntu
Baseline	cf50b7d86a1456e9aad09ddeafd5ff8701f9afdc
Compare	bba7a9c67e5b6728833bedbb368f732fb499bcc1
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
Queue - Duration of single invocation	9.82 ns	14.76 ns	1.50	0.18	False

Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Concurrent.IsEmpty&lt;Int32&gt;*'

Payloads

Baseline
Compare

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 14.755232225143695 > 10.287517983976874.
IsChangePoint: Marked as a change because one of 5/19/2023 4:22:12 AM, 6/28/2023 11:03:41 PM, 7/5/2023 5:05:14 PM falls between 6/27/2023 3:42:18 AM and 7/5/2023 5:05:14 PM.
IsRegressionStdDev: Marked as regression because -25.250357327274703 (T) = (0 -14.96177257450248) / Math.Sqrt((0.19790267175372064 / (10)) + (0.2506987029667144 / (13))) is less than -2.079613844716807 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (10) + (13) - 2, .025) and -0.5006118702171165 = (9.970447969559064 - 14.96177257450248) / 9.970447969559064 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`arch-arm64`, `os-linux`, `tenet-performance`, `tenet-performance-benchmarks`, `area-CodeGen-coreclr`, `runtime-coreclr`
Milestone:	8.0.0

JulieLeeMSFT · 2023-07-28T16:19:30Z

@jakobbotsch PTAL.

Either Physical promotion (cc @jakobbotsch) or #88073 cc @MichalPetryka

jakobbotsch · 2023-07-31T19:12:35Z

Seems like #88073 causes the JIT to make different inlining decisions:

-2 ldfld or stfld over arguments which are structs.  Multiplier increased to 1.
-Inline candidate has 1 foldable branches.  Multiplier increased to 5.
-Inline candidate callsite is boring.  Multiplier increased to 6.3.
-Inline has 2 backward jumps (loops?).  Multiplier decreased to 4.41.
-calleeNativeSizeEstimate=687
-callsiteNativeSizeEstimate=145
-benefit multiplier=4.41
-threshold=639
-Native estimate for function size exceeds threshold for inlining 68.7 > 63.9 (multiplier = 4.41)
+2 ldfld or stfld over arguments which are structs.  Multiplier increased to 1.
+Inline candidate has 1 foldable branches.  Multiplier increased to 5.
+Inline has 2 intrinsics.  Multiplier increased to 6.6.
+Inline candidate callsite is boring.  Multiplier increased to 7.9.
+Inline has 2 backward jumps (loops?).  Multiplier decreased to 5.53.
+calleeNativeSizeEstimate=687
+callsiteNativeSizeEstimate=145
+benefit multiplier=5.53
+threshold=801
+Native estimate for function size is within threshold for inlining 68.7 <= 80.1 (multiplier = 5.53)

We end up with:

**************** Inline Tree

 Inlines into 06001A9D [via ExtendedDefaultPolicy] System.Collections.Concurrent.IsEmpty`1[int]:Queue():bool:this:
   [INL01 IL=0006 TR=000003 06007FE2] [INLINED: callee: below ALWAYS_INLINE size] System.Collections.Concurrent.ConcurrentQueue`1[int]:get_IsEmpty():bool:this
-    [INL00 IL=0004 TR=000009 06007FF1] [FAILED: call site: unprofitable inline] System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this
+    [INL02 IL=0004 TR=000009 06007FF1] [INLINED: call site: profitable inline] System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this
+      [INL00 IL=0024 TR=000028 06007FFE] [FAILED: callee: too many il bytes] System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:TryPeek(byref,bool):bool:this

The interesting thing is that when we compile System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this in the baseline, we do end up inlining System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:TryPeek(byref,bool):bool:this:

Inlines into 06007FF1 [via ExtendedDefaultPolicy] System.Collections.Concurrent.ConcurrentQueue`1[int]:TryPeek(byref,bool):bool:this:
  [INL01 IL=0015 TR=000006 06003822] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read[System.__Canon](byref):System.__Canon
  [INL02 IL=0024 TR=000013 06007FFE] [INLINED: call site: profitable inline] System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:TryPeek(byref,bool):bool:this
    [INL03 IL=0041 TR=000055 06003810] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read(byref):int
    [INL04 IL=0068 TR=000068 06003810] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read(byref):int
    [INL05 IL=0146 TR=000092 06003810] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read(byref):int
    [INL06 IL=0167 TR=000116 06007FFB] [INLINED: callee: below ALWAYS_INLINE size] System.Collections.Concurrent.ConcurrentQueueSegment`1[int]:get_FreezeOffset():int:this
    [INL00 IL=0190 TR=000113 06003705] [FAILED: call site: unprofitable inline] System.Threading.SpinWait:SpinOnce(int):this
  [INL07 IL=0046 TR=000024 06003822] [INLINED: callee: below ALWAYS_INLINE size] System.Threading.Volatile:Read[System.__Canon](byref):System.__Canon

The result is the following codegen from my RPi (without RCPC support): https://www.diffchecker.com/xlM2tLdm/

Note that I see not regression when running the benchmark on my RPi.

Changing the inliner is not possible at this point, but it also looks like the benchmarks are improving and almost back to their own level with #70794, so I'm going to close this.

performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Jul 6, 2023

EgorBo removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2023

EgorBo changed the title ~~[Perf] Linux/arm64: 1 Regression on 6/29/2023 8:27:03 AM~~ [Perf] Linux/arm64: Regressions in System.Collections.Concurrent.IsEmpty<Int32> Jul 6, 2023

EgorBo transferred this issue from dotnet/perf-autofiling-issues Jul 6, 2023

dotnet-issue-labeler bot added the area-System.Collections label Jul 6, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Jul 6, 2023

EgorBo added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Jul 6, 2023

eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jul 13, 2023

eiriktsarpalis added this to the 8.0.0 milestone Jul 13, 2023

eiriktsarpalis added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Collections labels Jul 25, 2023

JulieLeeMSFT assigned jakobbotsch Jul 28, 2023

jakobbotsch closed this as completed Jul 31, 2023

ghost locked as resolved and limited conversation to collaborators Aug 31, 2023

[Perf] Linux/arm64: Regressions in System.Collections.Concurrent.IsEmpty<Int32> #88483

[Perf] Linux/arm64: Regressions in System.Collections.Concurrent.IsEmpty<Int32> #88483

Comments

performanceautofiler bot commented Jul 6, 2023

Run Information

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Repro

Payloads

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram

Description of detection logic

JIT Disasms

Docs

ghost commented Jul 6, 2023

Run Information

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Repro

Payloads

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram

Description of detection logic

JIT Disasms

Docs

EgorBo commented Jul 6, 2023

EgorBo commented Jul 6, 2023

MichalPetryka commented Jul 6, 2023

eiriktsarpalis commented Jul 25, 2023

ghost commented Jul 25, 2023

Run Information

Regressions in System.Collections.Concurrent.IsEmpty<Int32>

Repro

Payloads

System.Collections.Concurrent.IsEmpty<Int32>.Queue(Size: 0)

ETL Files

Histogram

Description of detection logic

JIT Disasms

Docs

JulieLeeMSFT commented Jul 28, 2023

jakobbotsch commented Jul 31, 2023