Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branchless code generation for ternaries #32632

Closed
lpereira opened this issue Feb 21, 2020 · 5 comments
Closed

Branchless code generation for ternaries #32632

lpereira opened this issue Feb 21, 2020 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@lpereira
Copy link
Contributor

lpereira commented Feb 21, 2020

Might be an opportunity to improve codegen for ternaries where both the expr-if-true and expr-if-false are constants.

I played a little bit with Compiler Explorer today to get a feeling of how GCC and Clang generate the code for this. Here's what I found from looking at the assembly output, with the accompanying C code for reference.

Method 1: sete/setne

int m1(int a) { return a == 5 ? 2 : 3; }

Is generated as, by both Clang and GCC:

xor eax, eax
cmp edi, 5  
setne al    
add eax, 2
ret

Playing a bit with Compiler Explorer, it seems that if both expressions in the ternary are apart by 1, then it can reuse the result of "cmp" as part of the expression. ("sete" is generated if the ternary is inverted, as expected: "a == 5 ? 3 : 2".)

Clang seems to go a bit further with this method before switching to use conditional moves as with "m3" below, by issuing an additional "add" instruction if values are apart by 2, for example.

I suspect a simplified version of this codegen could be used for things like Convert.ToInt32(bool b) too.

Method 2: lea

int m2(int a) { return a == 5 ? 10 : 1; }

Is generated as, by GCC:

xor eax, eax
cmp edi, 5
sete al
lea eax, [rax+1+rax*8]
ret

Clang generates similar code but changes the "+1" inside the LEA to an "add
eax, 1" before the "ret".

This is a more generic extension to the previous method, and seems to be preferred to the next method, when the operands would fit the constraints of the LEA instruction arguments.

Method 3: cmov

int m3(int a) { return a == 5 ? 1 : 37; }

Is generated as, by both Clang and GCC (order varies a bit):

cmp edi, 5 
mov edx, 37
mov eax, 1
cmovne eax, edx
ret

Conditional moves seems to be generated when the above conditions do not hold.

category:cq
theme:optimization
skill-level:expert
cost:medium

@lpereira lpereira added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 21, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Feb 21, 2020
@EgorBo
Copy link
Member

EgorBo commented Feb 21, 2020

cmove was discussed here: #6749

Regarding int m1(int a) { return a == 5 ? 2 : 3; }
Turns out it depends on profile data (PGO) in clang/LLVM, see https://godbolt.org/z/88UpqV

@EgorBo
Copy link
Member

EgorBo commented Feb 21, 2020

But might be quite easy to implement (the non-LEA one):

    [000003] ------------              *  JTRUE     void  
    [000002] ------------              \--*  EQ        int   
    [000000] ------------                 +--*  LCL_VAR   int    V00 arg0         
    [000001] ------------                 \--*  CNS_INT   int    5

    [000007] ------------              *  RETURN    int                       // or GT_ASG
    [000006] ------------              \--*  CNS_INT   int    3

    [000005] ------------              *  RETURN    int   
    [000004] ------------              \--*  CNS_INT   int    2

To

    [000007] ------------              *  ADD       int   
    [000005] ------------              +--*  EQ        int   
    [000000] ------------              |  +--*  LCL_VAR   int    V00 arg0         
    [000004] ------------              |  \--*  CNS_INT   int    5
    [000006] ------------              \--*  CNS_INT   int    2

Same for condition ? x : x + 1 (for non cns x)

@scalablecory
Copy link
Contributor

scalablecory commented Feb 27, 2020

Same for condition ? x : x + 1 (for non cns x)

Hopefully this same optimization could be applied if this were written as:

if(!condition) ++x;

@lpereira
Copy link
Contributor Author

lpereira commented Mar 4, 2020

Same for condition ? x : x + 1 (for non cns x)

Hopefully this same optimization could be applied if this were written as:

if(!condition) ++x;

These are different, right? The latter statement has an expression with side effects on x, whereas the former only evaluates x. But if the JIT could match this pattern and generate the code as if it were written as x += condition ? 0 : 1, though, then it would be indeed possible.

Do we have anything to query a source tree semantically, like Coccigrep can do in C?

@BruceForstall BruceForstall added this to the Future milestone Apr 4, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@BruceForstall
Copy link
Member

In the absence of a compelling real-world code example, I'm going to close this.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

5 participants