Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> #66769

performanceautofiler · 2022-03-17T05:59:24Z

Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	63fd977f7d57e0c0e434caecfff0dd0a07d2f27f
Compare	466b0f66aafe63a31696c16dca43a727cd4bbce5
Diff	Diff

Regressions in System.Collections.Tests.Perf_BitArray

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
BitArraySetLengthShrink - Duration of single invocation	237.51 ns	257.46 ns	1.08	0.00	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.Tests.Perf_BitArray*'

Payloads

Baseline
Compare

Histogram

System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 257.45578083352854 > 250.07199194800694.
IsChangePoint: Marked as a change because one of 3/15/2022 12:53:39 AM, 3/17/2022 12:08:34 AM falls between 3/8/2022 5:51:58 AM and 3/17/2022 12:08:34 AM.
IsRegressionStdDev: Marked as regression because -11.78109322334737 (T) = (0 -258.6891193534734) / Math.Sqrt((8.595811911146967 / (36)) + (13.040673937399216 / (6))) is less than -2.0210753903043583 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (36) + (6) - 2, .025) and -0.07611578445829842 = (240.391529507853 - 258.6891193534734) / 240.391529507853 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

### Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	63fd977f7d57e0c0e434caecfff0dd0a07d2f27f
Compare	466b0f66aafe63a31696c16dca43a727cd4bbce5
Diff	Diff

Regressions in System.Collections.IndexerSet<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Span - Duration of single invocation	177.00 ns	187.44 ns	1.06	0.07	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.IndexerSet&lt;Int32&gt;*'

Payloads

Baseline
Compare

Histogram

System.Collections.IndexerSet<Int32>.Span(Size: 512)

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 187.44322490494397 > 180.31185587212346.
IsChangePoint: Marked as a change because one of 3/15/2022 12:53:39 AM, 3/17/2022 12:08:34 AM falls between 3/8/2022 5:51:58 AM and 3/17/2022 12:08:34 AM.
IsRegressionStdDev: Marked as regression because -17.407749272978624 (T) = (0 -188.7349199381894) / Math.Sqrt((6.275661946704857 / (36)) + (4.068162474412199 / (6))) is less than -2.0210753903043583 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (36) + (6) - 2, .025) and -0.0930788703937596 = (172.66358819121757 - 188.7349199381894) / 172.66358819121757 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

category:performance
theme:benchmarks

The text was updated successfully, but these errors were encountered:

EgorBo · 2022-03-17T16:33:51Z

Most likely #66618 cc @AndyAyersMS

ghost · 2022-03-18T16:56:37Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	63fd977f7d57e0c0e434caecfff0dd0a07d2f27f
Compare	466b0f66aafe63a31696c16dca43a727cd4bbce5
Diff	Diff

Regressions in System.Collections.Tests.Perf_BitArray

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
BitArraySetLengthShrink - Duration of single invocation	237.51 ns	257.46 ns	1.08	0.00	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.Tests.Perf_BitArray*'

Payloads

Baseline
Compare

Histogram

System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 257.45578083352854 > 250.07199194800694.
IsChangePoint: Marked as a change because one of 3/15/2022 12:53:39 AM, 3/17/2022 12:08:34 AM falls between 3/8/2022 5:51:58 AM and 3/17/2022 12:08:34 AM.
IsRegressionStdDev: Marked as regression because -11.78109322334737 (T) = (0 -258.6891193534734) / Math.Sqrt((8.595811911146967 / (36)) + (13.040673937399216 / (6))) is less than -2.0210753903043583 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (36) + (6) - 2, .025) and -0.07611578445829842 = (240.391529507853 - 258.6891193534734) / 240.391529507853 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

### Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	63fd977f7d57e0c0e434caecfff0dd0a07d2f27f
Compare	466b0f66aafe63a31696c16dca43a727cd4bbce5
Diff	Diff

Regressions in System.Collections.IndexerSet<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Span - Duration of single invocation	177.00 ns	187.44 ns	1.06	0.07	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.IndexerSet&lt;Int32&gt;*'

Payloads

Baseline
Compare

Histogram

System.Collections.IndexerSet<Int32>.Span(Size: 512)

Description of detection logic

IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 187.44322490494397 > 180.31185587212346.
IsChangePoint: Marked as a change because one of 3/15/2022 12:53:39 AM, 3/17/2022 12:08:34 AM falls between 3/8/2022 5:51:58 AM and 3/17/2022 12:08:34 AM.
IsRegressionStdDev: Marked as regression because -17.407749272978624 (T) = (0 -188.7349199381894) / Math.Sqrt((6.275661946704857 / (36)) + (4.068162474412199 / (6))) is less than -2.0210753903043583 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (36) + (6) - 2, .025) and -0.0930788703937596 = (172.66358819121757 - 188.7349199381894) / 172.66358819121757 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.19041`, `Regression`, `CoreClr`, `arm64`
Milestone:	-

AndyAyersMS · 2022-05-02T18:17:14Z

Both of these have persisted since

AndyAyersMS · 2022-06-29T01:16:16Z

Local repros (base is 63fd977, diff is cd88b84)

Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Allocated	Alloc Ratio
BitArraySetLengthShrink	Job-PUHVHZ	\base-rel\corerun.exe	512	138.76 ns	2.033 ns	1.901 ns	138.91 ns	135.67 ns	141.87 ns	1.00	0.00	0.0904	568 B	1.00
BitArraySetLengthShrink	Job-JSGEQE	\diff-rel\corerun.exe	512	217.63 ns	2.341 ns	2.076 ns	216.98 ns	214.36 ns	222.05 ns	1.57	0.03	0.0902	568 B	1.00

Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Allocated	Alloc Ratio
Span	Job-JEUKJG	\base-rel\corerun.exe	512	148.4 ns	1.23 ns	1.09 ns	148.3 ns	146.1 ns	150.1 ns	1.00	0.00	-	NA
Span	Job-EXQWVI	\diff-rel\corerun.exe	512	149.2 ns	2.84 ns	2.66 ns	148.9 ns	146.1 ns	154.8 ns	1.00	0.02	-	NA

So Span seems ok locally.

[Edit: actually the original report was vs arm64; the data just above & assembly below are for x64. Not sure it matters, but I'll take a look at arm64 too].

AndyAyersMS · 2022-06-29T15:32:38Z

All the time in BitArraySetLengthShrink is in BitArray..ctor. This is dominated by its loop. There are two codegen diffs in the loop:

we no longer try and align the loop top (and so the jae ends up crossing a 32 byte boundary)
we miss a CSE

Both of these happen because there is some PGO data for bits and pieces of the loop body, and so we don't reweight the blocks. This leads to a failed weight check for alignment and a lower-weight CSE candidate.

G_M47086_IG03:  ;; offset=0070H                             G_M47086_IG03: ;; offset=0066H

4D8BCA          mov      r9, r10                            4D8BC1         mov      r8, r9
4C8BC7          mov      r8, rdi                            488BC7         mov      rax, rdi
83FB04          cmp      ebx, 4                             83FB04         cmp      ebx, 4
0F8C4A010000    jl       G_M47086_IG12                      0F8C4B010000   jl       G_M47086_IG12

458B00          mov      r8d, dword ptr [r8]                8B00           mov      eax, dword ptr [rax]
413B5108        cmp      edx, dword ptr [r9+8]              413B5008       cmp      edx, dword ptr [r8+8]
0F834E010000    jae      G_M47086_IG14                      0F8352010000   jae      G_M47086_IG14          ;; JCC

448BDA          mov      r11d, edx                          448BD2         mov      r10d, edx
4789449910      mov      dword ptr [r9+4*r11+16], r8d       4389449010     mov      dword ptr [r8+4*r10+16], eax
83FB04          cmp      ebx, 4                             83FB04         cmp      ebx, 4
0F8237010000    jb       G_M47086_IG13                      0F823A010000   jb       G_M47086_IG13

83C3FC          add      ebx, -4                            4883C704       add      rdi, 4
4883C704        add      rdi, 4                             83C3FC         add      ebx, -4
FFC2            inc      edx                                FFC2           inc      edx
4C63C2          movsxd   r8, edx                            4863C2         movsxd   rax, edx
                                                            448BC1         mov      r8d, ecx              ;; CSE
4C3BC0          cmp      r8, rax                            493BC0         cmp      rax, r8
7CC2            jl       SHORT G_M47086_IG03                7CC0           jl       SHORT G_M47086_IG03

Also interesting (but common to both versions) is that we don't hoist the array length fetch.

AndyAyersMS · 2022-06-29T16:51:35Z

This is really an instance of a more general problem with the way the jit handles methods with partial PGO data. I don't think it is possible to fix this without some major revisions to this whole area, and that seems out of scope for .NET 7.

In particular:

when we combine two blocks if either is profiled we use the max weight for the combined block. In a partially profiled method this causes the effective "range" of PGO to grow. In particular if we have a non-pgo loop with a bit of pgo inside, the pgo range can grow to encompass the loop and prevent loop scale-up (more or less what happens in the above).
when we do general scale-down we scale down blocks with PGO
when we do loop scale-up we won't scale up blocks with PGO

All this needs to be rethought; in a mixed PGO method the incoming PGO data should be used to provide exit probabilities and we need a general algorithm to synthesize the same for non-PGO blocks and then from there deduce consistent block and edge counts.

Related work:

optSetBlockWeights should not scale blocks that have PGO weights #66678
Dynamic PGO #43618 (a number of the sub items)

Going to move this to future.

AndyAyersMS · 2022-06-29T21:39:33Z

Local arm64 data for same pair of commits as above. Not showing any serious regression.

Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	Gen 0	Allocated	Alloc Ratio
BitArraySetLengthShrink	Job-CCSRPV	\base-rel\corerun.exe	512	135.7 ns	0.94 ns	0.88 ns	135.4 ns	134.4 ns	137.4 ns	1.00	0.1355	568 B	1.00
BitArraySetLengthShrink	Job-UOKGVV	\diff-rel\corerun.exe	512	139.5 ns	0.54 ns	0.48 ns	139.5 ns	138.7 ns	140.2 ns	1.03	0.1353	568 B	1.00

Disassembly shows similar artifacts as x64 -- no alignment, extra mov

;; BASE

G_M47086_IG03:              ;; offset=0080H
        AA0603E5          mov     x5, x6
        AA0103E4          mov     x4, x1
        710012BF          cmp     w21, #4
        54000B6B          blt     G_M47086_IG13
        B9400084          ldr     w4, [x4]
        B94008A7          ldr     w7, [x5,#8]
        6B07005F          cmp     w2, w7
        54000B82          bhs     G_M47086_IG15
; ............................... 32B boundary ...............................
        910040A5          add     x5, x5, #16
        B82258A4          str     w4, [x5, w2, UXTW #2]
        710012BF          cmp     w21, #4
        54000AC3          blo     G_M47086_IG14
        510012B5          sub     w21, w21, #4
        91001021          add     x1, x1, #4
        11000442          add     w2, w2, #1
        93407C44          sxtw    x4, w2
; ............................... 32B boundary ...............................
        EB03009F          cmp     x4, x3
        54FFFDEB          blt     G_M47086_IG03

;; DIFF
G_M47086_IG03:              ;; offset=0078H
        AA0503E4          mov     x4, x5
        AA0103E3          mov     x3, x1
; ............................... 32B boundary ...............................
        710012BF          cmp     w21, #4
        54000D6B          blt     G_M47086_IG12
        B9400063          ldr     w3, [x3]
        B9400886          ldr     w6, [x4,#8]
        6B06005F          cmp     w2, w6
        54000E82          bhs     G_M47086_IG14
        91004084          add     x4, x4, #16
        B8225883          str     w3, [x4, w2, UXTW #2]
; ............................... 32B boundary ...............................
        710012BF          cmp     w21, #4
        54000D43          blo     G_M47086_IG13
        91001021          add     x1, x1, #4
        510012B5          sub     w21, w21, #4
        11000442          add     w2, w2, #1
        93407C43          sxtw    x3, w2
        2A0003E4          mov     w4, w0    ;; CSE
        EB04007F          cmp     x3, x4
; ............................... 32B boundary ...............................
        54FFFDCB          blt     G_M47086_IG03

AndyAyersMS · 2022-11-01T15:02:36Z

BitArray seems to have recovered since the above

but not IndexerSet

AndyAyersMS · 2023-05-16T03:02:33Z

Both have since improved and are as fast or faster than they were when this report was filed.

BitArray possibly from #81095, IndexerSet from #85275.

performanceautofiler bot added arm64 untriaged New issue has not been triaged by the area owner labels Mar 17, 2022

EgorBo changed the title ~~[Perf] Changes at 3/15/2022 4:36:24 AM~~ Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> Mar 17, 2022

EgorBo transferred this issue from dotnet/perf-autofiling-issues Mar 17, 2022

jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 18, 2022

JulieLeeMSFT assigned AndyAyersMS Mar 21, 2022

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 21, 2022

JulieLeeMSFT added this to the 7.0.0 milestone Mar 21, 2022

am11 added arch-arm64 and removed arm64 labels Mar 30, 2022

AndyAyersMS modified the milestones: 7.0.0, Future Jun 29, 2022

JulieLeeMSFT modified the milestones: Future, 8.0.0 Oct 5, 2022

jeffhandley added os-windows runtime-coreclr specific to the CoreCLR runtime and removed Windows 10.0.19041 labels Dec 28, 2022

jeffhandley removed CoreClr labels Dec 28, 2022

BruceForstall removed the os-windows label Jan 23, 2023

AndyAyersMS closed this as completed May 16, 2023

ghost locked as resolved and limited conversation to collaborators Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> #66769

Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> #66769

performanceautofiler bot commented Mar 17, 2022 •

edited by BruceForstall

Loading

Payloads

Histogram

System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)

Description of detection logic

Docs

Payloads

Histogram

System.Collections.IndexerSet<Int32>.Span(Size: 512)

Description of detection logic

Docs

EgorBo commented Mar 17, 2022

ghost commented Mar 18, 2022

Run Information

Regressions in System.Collections.Tests.Perf_BitArray

Repro

Payloads

Histogram

System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)

Description of detection logic

Docs

Regressions in System.Collections.IndexerSet<Int32>

Repro

Payloads

Histogram

System.Collections.IndexerSet<Int32>.Span(Size: 512)

Description of detection logic

Docs

AndyAyersMS commented May 2, 2022

AndyAyersMS commented Jun 29, 2022 •

edited

Loading

AndyAyersMS commented Jun 29, 2022 •

edited

Loading

AndyAyersMS commented Jun 29, 2022

AndyAyersMS commented Jun 29, 2022 •

edited

Loading

AndyAyersMS commented Nov 1, 2022

AndyAyersMS commented May 16, 2023

Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> #66769

Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> #66769

Comments

performanceautofiler bot commented Mar 17, 2022 • edited by BruceForstall Loading

Run Information

Regressions in System.Collections.Tests.Perf_BitArray

Repro

Payloads

Histogram

System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)

Description of detection logic

Docs

Regressions in System.Collections.IndexerSet<Int32>

Repro

Payloads

Histogram

System.Collections.IndexerSet<Int32>.Span(Size: 512)

Description of detection logic

Docs

EgorBo commented Mar 17, 2022

ghost commented Mar 18, 2022

Run Information

Regressions in System.Collections.Tests.Perf_BitArray

Repro

Payloads

Histogram

System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)

Description of detection logic

Docs

Regressions in System.Collections.IndexerSet<Int32>

Repro

Payloads

Histogram

System.Collections.IndexerSet<Int32>.Span(Size: 512)

Description of detection logic

Docs

AndyAyersMS commented May 2, 2022

AndyAyersMS commented Jun 29, 2022 • edited Loading

AndyAyersMS commented Jun 29, 2022 • edited Loading

AndyAyersMS commented Jun 29, 2022

AndyAyersMS commented Jun 29, 2022 • edited Loading

AndyAyersMS commented Nov 1, 2022

AndyAyersMS commented May 16, 2023

performanceautofiler bot commented Mar 17, 2022 •

edited by BruceForstall

Loading

AndyAyersMS commented Jun 29, 2022 •

edited

Loading

AndyAyersMS commented Jun 29, 2022 •

edited

Loading

AndyAyersMS commented Jun 29, 2022 •

edited

Loading