-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extra stloc and ldloc in switch statement compared to if statement #10343
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
JitUntriaged
CLR JIT issues needing additional triage
optimization
Milestone
Comments
@dotnet/jit-contrib |
We see this stloc/ldloc pattern elsewhere fairly frequently. If there is just that one load (which we can check for in our up-front IL scan) we should just optimize this pattern to a nop and forward the value to the consumer. This is a fairly cheap and safe form of forward sub (see #4655). |
This was fixed at some point... 7.0 codegen below. but sharplab shows the same for 6.0. ; Assembly listing for method X:If():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 3, 3 ) ref -> rcx this class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T00] ( 3, 6 ) ref -> rax single-def "arr expr"
;
; Lcl frame size = 40
G_M50435_IG01: ;; offset=0000H
4883EC28 sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M50435_IG02: ;; offset=0004H
488B4108 mov rax, gword ptr [rcx+8]
83780802 cmp dword ptr [rax+8], 2
7617 jbe SHORT G_M50435_IG07
8078126C cmp byte ptr [rax+18], 108
750A jne SHORT G_M50435_IG05
;; bbWeight=1 PerfScore 10.00
G_M50435_IG03: ;; offset=0014H
B801000000 mov eax, 1
;; bbWeight=0.50 PerfScore 0.12
G_M50435_IG04: ;; offset=0019H
4883C428 add rsp, 40
C3 ret
;; bbWeight=0.50 PerfScore 0.62
G_M50435_IG05: ;; offset=001EH
33C0 xor eax, eax
;; bbWeight=0.50 PerfScore 0.12
G_M50435_IG06: ;; offset=0020H
4883C428 add rsp, 40
C3 ret
;; bbWeight=0.50 PerfScore 0.62
G_M50435_IG07: ;; offset=0025H
E8A6B6015F call CORINFO_HELP_RNGCHKFAIL
CC int3
;; bbWeight=0 PerfScore 0.00
; Total bytes of code 43, prolog size 4, PerfScore 16.05, instruction count 14, allocated bytes for code 43 (MethodHash=94d33afc) for method X:If():int:this
; Assembly listing for method X:Switch():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 3, 3 ) ref -> rcx this class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T00] ( 3, 6 ) ref -> rax single-def "arr expr"
;
; Lcl frame size = 40
G_M11134_IG01: ;; offset=0000H
4883EC28 sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M11134_IG02: ;; offset=0004H
488B4108 mov rax, gword ptr [rcx+8]
83780802 cmp dword ptr [rax+8], 2
7617 jbe SHORT G_M11134_IG07
8078126C cmp byte ptr [rax+18], 108
750A jne SHORT G_M11134_IG05
;; bbWeight=1 PerfScore 10.00
G_M11134_IG03: ;; offset=0014H
B801000000 mov eax, 1
;; bbWeight=0.50 PerfScore 0.12
G_M11134_IG04: ;; offset=0019H
4883C428 add rsp, 40
C3 ret
;; bbWeight=0.50 PerfScore 0.62
G_M11134_IG05: ;; offset=001EH
33C0 xor eax, eax
;; bbWeight=0.50 PerfScore 0.12
G_M11134_IG06: ;; offset=0020H
4883C428 add rsp, 40
C3 ret
;; bbWeight=0.50 PerfScore 0.62
G_M11134_IG07: ;; offset=0025H
E846B6015F call CORINFO_HELP_RNGCHKFAIL
CC int3
;; bbWeight=0 PerfScore 0.00
; Total bytes of code 43, prolog size 4, PerfScore 16.05, instruction count 14, allocated bytes for code 43 (MethodHash=0ec8d481) for method X:Switch():int:this |
ghost
locked as resolved and limited conversation to collaborators
Feb 12, 2022
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
JitUntriaged
CLR JIT issues needing additional triage
optimization
I noticed an extra stloc and ldloc in the following switch statement compared to the equivalent if statement (this is just a shortened version of a longer and nested statement).
The il and disassembly looks pretty much the same except for the
stloc.0
andldloc.0
in the switch version, which adds an extramov
to a register and then compares the register against the value, instead of doing a direct memory location compare against the value (like in the if version).I looked for other bugs similar to this, but couldn't find anything specific.
The disassembly produced by benchmark.net looks like the following (including source and il annotation):
If
Switch
The text was updated successfully, but these errors were encountered: