EgorBot for AndyAyersMS in #109209 #137
Benchmark results on
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| Test | Main | 1.025 ns | 0.0006 ns | 1.00 |
| Test | PR | 1.082 ns | 0.0007 ns | 1.06 |
Profile for Bench_Test:
Flame graphs: Main vs PR 🔥
Speedscope: Main vs PR
Hot asm: Main vs PR
Hot functions: Main vs PR
Counters: Main vs PR
cc @AndyAyersMS (logs)
Benchmark results on
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| Test | Main | 4.0075 ns | 0.0021 ns | 1.00 |
| Test | PR | 0.6158 ns | 0.0013 ns | 0.15 |
Profile for Bench_Test:
Flame graphs: Main vs PR 🔥
Speedscope: Main vs PR
Hot asm: Main vs PR
Hot functions: Main vs PR
Counters: Main vs PR
cc @AndyAyersMS (logs)
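For reading the two tables, here is a minimal sketch of how the Ratio column relates to the Mean column. It assumes Ratio is simply the PR mean divided by the Main (baseline) mean, which is consistent with the numbers reported above. The function name `ratio` and the labels "first table" and "second table" are illustrative only and are not part of the bot's output.

```python
def ratio(pr_mean_ns: float, main_mean_ns: float) -> float:
    """PR mean relative to the Main baseline, rounded like the table column."""
    return round(pr_mean_ns / main_mean_ns, 2)


# (Main mean, PR mean) in nanoseconds, copied from the two tables above.
results = {
    "first table": (1.025, 1.082),
    "second table": (4.0075, 0.6158),
}

for label, (main_mean, pr_mean) in results.items():
    r = ratio(pr_mean, main_mean)
    direction = "slower" if r > 1 else "faster"
    print(f"{label}: ratio {r:.2f} (PR is {direction} than Main)")
```

A ratio below 1.00 means the PR toolchain is faster than Main for that run, and a ratio above 1.00 means it is slower.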
@AndyAyersMS not sure why perf is the same on arm64; perhaps the GDV check is expensive on arm64? The JitDisasm.asm output (see BDN_Artifacts.zip) does show that the PR has different codegen:

Main:

; Assembly listing for method Bench:TestInner(System.Collections.Generic.ICollection`1[System.String]):int:this (Tier1)
; Emitting BLENDED_CODE for generic ARM64 - Unix
; Tier1 code
; optimized code
; fp based frame
; partially interruptible
G_M000_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x10]!
mov fp, sp
G_M000_IG02: ;; offset=0x0008
mov x0, x1
movz x11, #0x5B0
; ............................... 32B boundary ...............................
movk x11, #0xB805 LSL #16
movk x11, #0xFAC8 LSL #32
ldr xip0, [x11]
blr xip0
G_M000_IG03: ;; offset=0x0020
ldp fp, lr, [sp], #0x10
ret lr
; Total bytes of code 40

PR:

; Assembly listing for method Bench:TestInner(System.Collections.Generic.ICollection`1[System.String]):int:this (Tier1)
; Emitting BLENDED_CODE for generic ARM64 - Unix
; Tier1 code
; optimized code
; fp based frame
; partially interruptible
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
G_M000_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x10]!
mov fp, sp
G_M000_IG02: ;; offset=0x0008
ldr x0, [x1]
movz x11, #0x9088
; ............................... 32B boundary ...............................
movk x11, #0x1D1C LSL #16
movk x11, #0xE000 LSL #32
cmp x0, x11
bne G_M000_IG05
G_M000_IG03: ;; offset=0x0020
ldr w0, [x1, #0x08]
G_M000_IG04: ;; offset=0x0024
ldp fp, lr, [sp], #0x10
ret lr
G_M000_IG05: ;; offset=0x002C
mov x0, x1
; ............................... 32B boundary ...............................
movz x11, #0x5B0
movk x11, #0x1C02 LSL #16
movk x11, #0xE000 LSL #32
ldr xip0, [x11]
blr xip0
b G_M000_IG04
; Total bytes of code 72
x64:

Main:

; Assembly listing for method Bench:TestInner(System.Collections.Generic.ICollection`1[System.String]):int:this (Tier1)
; Emitting BLENDED_CODE for X64 with AVX512 - Unix
; Tier1 code
; optimized code
; rbp based frame
; partially interruptible
G_M000_IG01: ;; offset=0x0000
push rbp
mov rbp, rsp
G_M000_IG02: ;; offset=0x0004
mov rdi, rsi
mov r11, 0x79AD0B0605B0
call [r11]System.Collections.Generic.ICollection`1[System.__Canon]:get_Count():int:this
nop
G_M000_IG03: ;; offset=0x0015
pop rbp
ret
; Total bytes of code 23

PR:

; Assembly listing for method Bench:TestInner(System.Collections.Generic.ICollection`1[System.String]):int:this (Tier1)
; Emitting BLENDED_CODE for X64 with AVX512 - Unix
; Tier1 code
; optimized code
; rbp based frame
; partially interruptible
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
G_M000_IG01: ;; offset=0x0000
push rbp
mov rbp, rsp
G_M000_IG02: ;; offset=0x0004
mov rdi, 0x7714F98998B0
cmp qword ptr [rsi], rdi
jne SHORT G_M000_IG05
G_M000_IG03: ;; offset=0x0013
mov eax, dword ptr [rsi+0x08]
G_M000_IG04: ;; offset=0x0016
pop rbp
ret
G_M000_IG05: ;; offset=0x0018
mov rdi, rsi
mov r11, 0x7714F88505B0
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (mov: 5) 32B boundary ...............................
call [r11]System.Collections.Generic.ICollection`1[System.__Canon]:get_Count():int:this
jmp SHORT G_M000_IG04
; Total bytes of code 42
Yeah, seems like it might be the cost of forming the constant for the type. Also interesting that we can't tail call; need to investigate that. With the advent of CET/CFG, tail calling is probably becoming more valuable than it used to be (one less return, anyway).
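As a rough illustration of what "forming the constant" involves on arm64, the sketch below splits a 64-bit value into 16-bit chunks and emits one `movz` followed by one `movk` per remaining non-zero chunk, mirroring the shape of the sequence in the arm64 PR listing. This is a simplified model and an assumption about the general pattern, not the JIT's actual constant-materialization logic, which may choose other encodings. The helper name `movz_movk_sequence` is hypothetical.

```python
def movz_movk_sequence(value: int, reg: str = "x11") -> list[str]:
    """Model a movz/movk sequence that builds a 64-bit constant 16 bits at a time."""
    insns = []
    for shift in (0, 16, 32, 48):
        chunk = (value >> shift) & 0xFFFF
        if chunk == 0:
            continue  # movz already zeroes the chunks it does not set
        op = "movz" if not insns else "movk"
        suffix = f" LSL #{shift}" if shift else ""
        insns.append(f"{op} {reg}, #0x{chunk:X}{suffix}")
    # An all-zero value still needs one instruction.
    return insns or [f"movz {reg}, #0x0"]


# The type constant compared against in the arm64 PR listing.
for insn in movz_movk_sequence(0xE0001D1C9088):
    print(insn)
```

For this value the model produces three instructions before the `cmp`, whereas the x64 PR listing loads its type constant with a single `mov rdi, 0x7714F98998B0`.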
Processing dotnet/runtime#109209 (comment) command:
Command
-intel -arm64 -profiler --envvars DOTNET_JitDisasm:TestInner
(EgorBot will reply in this issue)