Regressions in System.Collections.Tests.Perf_BitArray and System.Collections.IndexerSet<Int32> #66769
Most likely #66618. cc @AndyAyersMS
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
Run Information
Regressions in System.Collections.Tests.Perf_BitArray
Repro
    git clone https://github.com/dotnet/performance.git
    py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.Tests.Perf_BitArray*'
Payloads
Histogram
System.Collections.Tests.Perf_BitArray.BitArraySetLengthShrink(Size: 512)
Description of detection logic
Docs
Profiling workflow for dotnet/runtime repository
Regressions in System.Collections.IndexerSet<Int32>
Repro
    git clone https://github.com/dotnet/performance.git
    py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Collections.IndexerSet<Int32>*'
Payloads
Histogram
System.Collections.IndexerSet<Int32>.Span(Size: 512)
Description of detection logic
Docs
Profiling workflow for dotnet/runtime repository
Local repros (base is 63fd977, diff is cd88b84)
[Edit: actually the original report was for arm64; the data just above and the assembly below are for x64. Not sure it matters, but I'll take a look at arm64 too.]
All the time in
Both of these happen because there is some PGO data for bits and pieces of the loop body, and so we don't reweight the blocks. This leads to a failed weight check for alignment and a lower-weight CSE candidate.

G_M47086_IG03: ;; offset=0070H                    G_M47086_IG03: ;; offset=0066H
4D8BCA        mov r9, r10                         4D8BC1        mov r8, r9
4C8BC7        mov r8, rdi                         488BC7        mov rax, rdi
83FB04        cmp ebx, 4                          83FB04        cmp ebx, 4
0F8C4A010000  jl G_M47086_IG12                    0F8C4B010000  jl G_M47086_IG12
458B00        mov r8d, dword ptr [r8]             8B00          mov eax, dword ptr [rax]
413B5108      cmp edx, dword ptr [r9+8]           413B5008      cmp edx, dword ptr [r8+8]
0F834E010000  jae G_M47086_IG14                   0F8352010000  jae G_M47086_IG14 ;; JCC
448BDA        mov r11d, edx                       448BD2        mov r10d, edx
4789449910    mov dword ptr [r9+4*r11+16], r8d    4389449010    mov dword ptr [r8+4*r10+16], eax
83FB04        cmp ebx, 4                          83FB04        cmp ebx, 4
0F8237010000  jb G_M47086_IG13                    0F823A010000  jb G_M47086_IG13
83C3FC        add ebx, -4                         4883C704      add rdi, 4
4883C704      add rdi, 4                          83C3FC        add ebx, -4
FFC2          inc edx                             FFC2          inc edx
4C63C2        movsxd r8, edx                      4863C2        movsxd rax, edx
                                                  448BC1        mov r8d, ecx ;; CSE
4C3BC0        cmp r8, rax                         493BC0        cmp rax, r8
7CC2          jl SHORT G_M47086_IG03              7CC0          jl SHORT G_M47086_IG03

Also interesting (but common to both versions) is that we don't hoist the array length fetch.
This is really an instance of a more general problem with the way the jit handles methods with partial PGO data. I don't think it is possible to fix this without some major revisions to this whole area, and that seems out of scope for .NET 7. In particular:
All this needs to be rethought. In a mixed PGO method, the incoming PGO data should be used to provide exit probabilities; we then need a general algorithm to synthesize the same for non-PGO blocks and, from there, deduce consistent block and edge counts. Related work:
Going to move this to future.
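To make the proposal in the comment above concrete, here is a minimal Python sketch of the general idea: keep the exit probabilities that come from PGO data, synthesize probabilities for edges without profile data, then solve for block and edge counts that are consistent with those probabilities. This is not the JIT's implementation. The function names, the three-block flow graph, the 0.9 back-edge likelihood, and the uniform-split heuristic for unprofiled edges are all assumptions made for illustration.

```python
def synthesize_probabilities(succs, profiled):
    """Return {(src, dst): probability} for every edge in the flow graph.

    Edges with profile data keep their probability. Any remaining probability
    mass of a block is split evenly across its unprofiled successors
    (a placeholder heuristic, assumed for this sketch).
    """
    probs = {}
    for block, targets in succs.items():
        known = {t: profiled[(block, t)] for t in targets if (block, t) in profiled}
        unknown = [t for t in targets if t not in known]
        remaining = max(0.0, 1.0 - sum(known.values()))
        for t in targets:
            probs[(block, t)] = known[t] if t in known else remaining / len(unknown)
    return probs


def solve_block_weights(succs, probs, entry, max_iterations=1000, tol=1e-9):
    """Iterate weight[b] = sum(weight[p] * prob(p -> b)) until it stabilizes.

    The entry block additionally receives one unit of flow per method call.
    """
    weights = {b: 0.0 for b in succs}
    for _ in range(max_iterations):
        new = {b: 0.0 for b in succs}
        new[entry] = 1.0
        for (src, dst), p in probs.items():
            new[dst] += weights[src] * p
        delta = max(abs(new[b] - weights[b]) for b in succs)
        weights = new
        if delta < tol:
            break
    return weights


# Hypothetical method: entry -> loop, loop -> loop (back edge) or loop -> exit.
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
# Only the loop back edge has profile data in this made-up example.
profiled = {("loop", "loop"): 0.9}

probs = synthesize_probabilities(succs, profiled)
weights = solve_block_weights(succs, probs, "entry")
# Edge counts follow from block weights and edge probabilities.
edge_counts = {edge: weights[edge[0]] * p for edge, p in probs.items()}
# The loop weight converges to about 10, i.e. 1 / (1 - 0.9).
```

Because every count is derived from one set of probabilities, the flow into each block matches the flow out of it. That is the consistency a mixed PGO method lacks when only some blocks carry profile weights. A real solution would need better heuristics for unprofiled branches and a solver that copes with irreducible or very hot loops.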
Local arm64 data for same pair of commits as above. Not showing any serious regression.
Disassembly shows similar artifacts as x64 -- no alignment, extra mov
;; BASE
G_M47086_IG03: ;; offset=0080H
AA0603E5 mov x5, x6
AA0103E4 mov x4, x1
710012BF cmp w21, #4
54000B6B blt G_M47086_IG13
B9400084 ldr w4, [x4]
B94008A7 ldr w7, [x5,#8]
6B07005F cmp w2, w7
54000B82 bhs G_M47086_IG15
; ............................... 32B boundary ...............................
910040A5 add x5, x5, #16
B82258A4 str w4, [x5, w2, UXTW #2]
710012BF cmp w21, #4
54000AC3 blo G_M47086_IG14
510012B5 sub w21, w21, #4
91001021 add x1, x1, #4
11000442 add w2, w2, #1
93407C44 sxtw x4, w2
; ............................... 32B boundary ...............................
EB03009F cmp x4, x3
54FFFDEB blt G_M47086_IG03
;; DIFF
G_M47086_IG03: ;; offset=0078H
AA0503E4 mov x4, x5
AA0103E3 mov x3, x1
; ............................... 32B boundary ...............................
710012BF cmp w21, #4
54000D6B blt G_M47086_IG12
B9400063 ldr w3, [x3]
B9400886 ldr w6, [x4,#8]
6B06005F cmp w2, w6
54000E82 bhs G_M47086_IG14
91004084 add x4, x4, #16
B8225883 str w3, [x4, w2, UXTW #2]
; ............................... 32B boundary ...............................
710012BF cmp w21, #4
54000D43 blo G_M47086_IG13
91001021 add x1, x1, #4
510012B5 sub w21, w21, #4
11000442 add w2, w2, #1
93407C43 sxtw x3, w2
2A0003E4 mov w4, w0 ;; CSE
EB04007F cmp x3, x4
; ............................... 32B boundary ...............................
54FFFDCB blt G_M47086_IG03
category:performance
theme:benchmarks