Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra stloc and ldloc in switch statement compared to if statement #10343

Closed
Tornhoof opened this issue May 17, 2018 · 3 comments
Closed

Extra stloc and ldloc in switch statement compared to if statement #10343

Tornhoof opened this issue May 17, 2018 · 3 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage optimization
Milestone

Comments

@Tornhoof
Copy link
Contributor

I noticed an extra stloc and ldloc in the following switch statement compared to the equivalent if statement (this is just a shortened version of a longer and nested statement).
The il and disassembly looks pretty much the same except for the stloc.0 and ldloc.0 in the switch version, which adds an extra mov to a register and then compares the register against the value, instead of doing a direct memory location compare against the value (like in the if version).
I looked for other bugs similar to this, but couldn't find anything specific.

The disassembly produced by benchmark.net looks like the following (including source and il annotation):
If

00007ffe`f18327b0 ConsoleApp23.LookupBenchmark.If()
            if (Bytes[2] == (byte)'l')
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
IL_0000: ldsfld System.Byte[] ConsoleApp23.LookupBenchmark::Bytes
IL_0005: ldc.i4.2
IL_0006: ldelem.u1
IL_0007: ldc.i4.s 108
IL_0009: bne.un.s IL_000d
00007ffe`f18327b4 48b900f98df1fe7f0000 mov rcx,7FFEF18DF900h
00007ffe`f18327be ba06000000      mov     edx,6
00007ffe`f18327c3 e828e4ab5f      call    coreclr!coreclr_shutdown_2+0x39680 (00007fff`512f0bf0)
00007ffe`f18327c8 48b8386d0010a9020000 mov rax,2A910006D38h
00007ffe`f18327d2 488b00          mov     rax,qword ptr [rax]
00007ffe`f18327d5 83780802        cmp     dword ptr [rax+8],2
00007ffe`f18327d9 7617            jbe     00007ffe`f18327f2
00007ffe`f18327db 8078126c        cmp     byte ptr [rax+12h],6Ch
00007ffe`f18327df 750a            jne     00007ffe`f18327eb
                return 1;
                ^^^^^^^^^
IL_000b: ldc.i4.1
IL_000c: ret
00007ffe`f18327e1 b801000000      mov     eax,1
00007ffe`f18327e6 4883c428        add     rsp,28h
00007ffe`f18327ea c3              ret
            return 0;
            ^^^^^^^^^
IL_000d: ldc.i4.0
IL_000e: ret
00007ffe`f18327eb 33c0            xor     eax,eax

Switch

00007ffe`f18327b0 ConsoleApp23.LookupBenchmark.Switch()
            switch (Bytes[2])
            ^^^^^^^^^^^^^^^^^
IL_0000: ldsfld System.Byte[] ConsoleApp23.LookupBenchmark::Bytes
IL_0005: ldc.i4.2
IL_0006: ldelem.u1
IL_0007: stloc.0
00007ffe`f18327b4 48b900f98df1fe7f0000 mov rcx,7FFEF18DF900h
00007ffe`f18327be ba06000000      mov     edx,6
00007ffe`f18327c3 e828e4ab5f      call    coreclr!coreclr_shutdown_2+0x39680 (00007fff`512f0bf0)
00007ffe`f18327c8 48b8386d265871020000 mov rax,27158266D38h
00007ffe`f18327d2 488b00          mov     rax,qword ptr [rax]
00007ffe`f18327d5 83780802        cmp     dword ptr [rax+8],2
00007ffe`f18327d9 761a            jbe     00007ffe`f18327f5
00007ffe`f18327db 0fb64012        movzx   eax,byte ptr [rax+12h]
                    return 1;
                    ^^^^^^^^^
IL_0008: ldloc.0
IL_0009: ldc.i4.s 108
IL_000b: bne.un.s IL_000f
00007ffe`f18327df 83f86c          cmp     eax,6Ch
00007ffe`f18327e2 750a            jne     00007ffe`f18327ee
IL_000d: ldc.i4.1
IL_000e: ret
00007ffe`f18327e4 b801000000      mov     eax,1
00007ffe`f18327e9 4883c428        add     rsp,28h
00007ffe`f18327ed c3              ret
            return 0;
            ^^^^^^^^^
IL_000f: ldc.i4.0
IL_0010: ret
00007ffe`f18327ee 33c0            xor     eax,eax
private static readonly byte[] Bytes = Encoding.UTF8.GetBytes("Hello");

[Benchmark]
public int If()
{
    if (Bytes[2] == (byte)'l')
    {
        return 1;
    }

    return 0;
}

[Benchmark]
public int Switch()
{
    switch (Bytes[2])
    {
        case (byte)'l':
        {
            return 1;
        }
    }

    return 0;
}
category:cq theme:basic-cq skill-level:beginner cost:small
@RussKeldorph
Copy link
Contributor

@dotnet/jit-contrib

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@AndyAyersMS
Copy link
Member

We see this stloc/ldloc pattern elsewhere fairly frequently. If there is just that one load (which we can check for in our up-front IL scan) we should just optimize this pattern to a nop and forward the value to the consumer.

This is a fairly cheap and safe form of forward sub (see #4655).

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@AndyAyersMS
Copy link
Member

This was fixed at some point... 7.0 codegen below. but sharplab shows the same for 6.0.

; Assembly listing for method X:If():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 this         [V00,T01] (  3,  3   )     ref  ->  rcx         this class-hnd single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T00] (  3,  6   )     ref  ->  rax         single-def "arr expr"
;
; Lcl frame size = 40

G_M50435_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M50435_IG02:              ;; offset=0004H
       488B4108             mov      rax, gword ptr [rcx+8]
       83780802             cmp      dword ptr [rax+8], 2
       7617                 jbe      SHORT G_M50435_IG07
       8078126C             cmp      byte  ptr [rax+18], 108
       750A                 jne      SHORT G_M50435_IG05
						;; bbWeight=1    PerfScore 10.00
G_M50435_IG03:              ;; offset=0014H
       B801000000           mov      eax, 1
						;; bbWeight=0.50 PerfScore 0.12
G_M50435_IG04:              ;; offset=0019H
       4883C428             add      rsp, 40
       C3                   ret      
						;; bbWeight=0.50 PerfScore 0.62
G_M50435_IG05:              ;; offset=001EH
       33C0                 xor      eax, eax
						;; bbWeight=0.50 PerfScore 0.12
G_M50435_IG06:              ;; offset=0020H
       4883C428             add      rsp, 40
       C3                   ret      
						;; bbWeight=0.50 PerfScore 0.62
G_M50435_IG07:              ;; offset=0025H
       E8A6B6015F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     
						;; bbWeight=0    PerfScore 0.00

; Total bytes of code 43, prolog size 4, PerfScore 16.05, instruction count 14, allocated bytes for code 43 (MethodHash=94d33afc) for method X:If():int:this


; Assembly listing for method X:Switch():int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 this         [V00,T01] (  3,  3   )     ref  ->  rcx         this class-hnd single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T00] (  3,  6   )     ref  ->  rax         single-def "arr expr"
;
; Lcl frame size = 40

G_M11134_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M11134_IG02:              ;; offset=0004H
       488B4108             mov      rax, gword ptr [rcx+8]
       83780802             cmp      dword ptr [rax+8], 2
       7617                 jbe      SHORT G_M11134_IG07
       8078126C             cmp      byte  ptr [rax+18], 108
       750A                 jne      SHORT G_M11134_IG05
						;; bbWeight=1    PerfScore 10.00
G_M11134_IG03:              ;; offset=0014H
       B801000000           mov      eax, 1
						;; bbWeight=0.50 PerfScore 0.12
G_M11134_IG04:              ;; offset=0019H
       4883C428             add      rsp, 40
       C3                   ret      
						;; bbWeight=0.50 PerfScore 0.62
G_M11134_IG05:              ;; offset=001EH
       33C0                 xor      eax, eax
						;; bbWeight=0.50 PerfScore 0.12
G_M11134_IG06:              ;; offset=0020H
       4883C428             add      rsp, 40
       C3                   ret      
						;; bbWeight=0.50 PerfScore 0.62
G_M11134_IG07:              ;; offset=0025H
       E846B6015F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     
						;; bbWeight=0    PerfScore 0.00

; Total bytes of code 43, prolog size 4, PerfScore 16.05, instruction count 14, allocated bytes for code 43 (MethodHash=0ec8d481) for method X:Switch():int:this

@ghost ghost locked as resolved and limited conversation to collaborators Feb 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI JitUntriaged CLR JIT issues needing additional triage optimization
Projects
None yet
Development

No branches or pull requests

5 participants