Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvement - Ternary operator with constants #61480

Closed
hopperpl opened this issue Nov 11, 2021 · 7 comments
Closed

Performance Improvement - Ternary operator with constants #61480

hopperpl opened this issue Nov 11, 2021 · 7 comments
Labels
tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

Comments

@hopperpl
Copy link

Description

Using the ternary operator with constants creates a lot of unnecessary branches and jumps. It can be factored into a common base and based on the boolean result, the difference can be multiplied with the condition state and then added -- or a "-1" mask (negated state) on the difference can be used.

clang, gcc, msvc all do that.

public int Value(bool cond) => cond ? c_ConstA : c_ConstB;

It's a common code pattern to return 2 different constants based on a flag, on a condition, etc. Like makeUpper ? 'A' : 'a'

public int Value(bool cond) => c_ConstA + cond * (c_ConstB - c_ConstA);
public int Value(bool cond) => c_ConstA + (-cond & (c_ConstB - c_ConstA));

Note: C# cannot easily convert a boolean into an integer (0/1) but the JIT will have no issue with that.

Configuration

.NET 6.0.0 (6.0.21.52210)

Regression?

No

Data

Analysis

    [Benchmark]
    public int A() => Cond ? 171 : 239;

    [Benchmark]
    public int B() => 171 + CondInt * (239 - 171);

    volatile bool Cond;
    volatile int  CondInt; // bool as int as there is no friction-free C# conversion

.NET 6.0.0 (6.0.21.52210), X64 RyuJIT

; Benchmark+Bench.A()
       7FFB738AD610 cmp       byte ptr [rcx+18],0
       7FFB738AD614 jne       short M00_L00
       7FFB738AD616 mov       eax,0EF
       7FFB738AD61B ret
M00_L00:
       7FFB738AD61C mov       eax,0AB
       7FFB738AD621 ret
; Total bytes of code 18

.NET 6.0.0 (6.0.21.52210), X64 RyuJIT

; Benchmark+Bench.B()
       7FFB7389C4E0 imul      eax,[rcx+14],44
       7FFB7389C4E4 add       eax,0AB
       7FFB7389C4E9 ret
; Total bytes of code 10

I did some benchmarking, but added a 1000-loop to reduce the call-frame jitter, and the difference is significant. And that doesn't even include branch misprediction which real-world code will have at around 50%. Here, misprediction is 0% as the code is too artificial and predictable. Nonetheless, it's twice as fast and with bad branch prediction it will be an order of magnitude when the 18-cycle penalty has an effect.

    [Benchmark]
    public bool A()
    {
      for (var nn = 0 ; nn < 1000 ; ++nn) Value = Cond ? 171 : 239;
      return true;
    }

    [Benchmark]
    public bool B()
    {
      for (var nn = 0 ; nn < 1000 ; ++nn) Value = 171 + CondI * (239 - 171);
      return true;
    }

    volatile int  Value;
    volatile bool Cond;
    volatile int  CondI;
| Method |     Mean |   Error |  StdDev | Code Size |
|------- |---------:|--------:|--------:|----------:|
|      A | 756.3 ns | 2.37 ns | 2.22 ns |      41 B |
|      B | 434.1 ns | 0.26 ns | 0.24 ns |      30 B |

@hopperpl hopperpl added the tenet-performance Performance related issue label Nov 11, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 11, 2021
@EgorBo
Copy link
Member

EgorBo commented Nov 11, 2021

This essentially duplicates #32632 #55364 and a more general issue about CMOV #6749.

This whole class of optimizations is currently missing in jit, but we plan to eventually implement them (e.g. because of conditional arm64 instructions).

cond ? c_ConstA : c_ConstB;

might look simple, but in terms of IR this is 3 different basic-blocks, and all basic-blocks traversals are usually TP-expensive for JIT so the outcome of it should worth the TP regressions.

@hopperpl
Copy link
Author

I have this ternary on a lot of hot-paths that eat up enormous performance. And the core issue is the boolean/int cast to get rid of the jumps. Then I can make a helper function for constants that the JIT will inline.

Any idea how I can convert bool to integer (1/0) without any branches? (cond ? 1 : 0) doesn't work.

@EgorBo
Copy link
Member

EgorBo commented Nov 11, 2021

I have this ternary on a lot of hot-paths that eat up enormous performance. And the core issue is the boolean/int cast to get rid of the jumps. Then I can make a helper function for constants that the JIT will inline.

Any idea how I can convert bool to integer (1/0) without any branches? (cond ? 1 : 0) doesn't work.

bool b = Unsafe.As<int, bool>(ref myInt);

@GrabYourPitchforks
Copy link
Member

That Unsafe.As example code is dangerous. If you're converting int -> bool, it doesn't normalize the int to 1 / 0, nor does it account for endianness. If you're converting bool -> int, it could inadvertently read 3 bytes of memory beyond the bool you're trying to read, which may only take a single byte of memory.

There's tons of discussion on bool -> int conversions over at dotnet/coreclr#16138. I'd suggest reading through that issue first, since it will show you the benefits and pitfalls of various solutions.

@hopperpl
Copy link
Author

    [Benchmark]
    public bool C()
    {
      for (var nn = 0 ; nn < 1000 ; ++nn) Value = Ternary(Cond, 171, 239);
      return true;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
    public int Ternary(bool cond, int left, int right) => left + (right - left) * Unsafe.As<bool, int>(ref cond);

that makes it a lot worse...

       7FFB7C7D9FA3 movzx     edx,byte ptr [rcx+18]
       7FFB7C7D9FA7 mov       [rsp+4],dl
       7FFB7C7D9FAB mov       edx,[rsp+4]
       7FFB7C7D9FAF movzx     edx,dl
       7FFB7C7D9FB2 imul      edx,44
       7FFB7C7D9FB5 add       edx,0AB

u32-read after an u8-write is a huge penalty. I assume the function inliner cannot get rid of the temporary stack-variable.


@GrabYourPitchforks ... yes, I'm aware of that. that's why I use a helper function to pull the value over a stack variable (or register) so it will always have 8 bytes. The code will only run on little-endian.

@hopperpl
Copy link
Author

hopperpl commented Nov 11, 2021

    [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
    public int Ternary(bool cond, int left, int right) => left + (right - left) * Unsafe.As<bool, byte>(ref cond);
       7FFB7C7E9FA2 movzx     edx,byte ptr [rcx+18]
       7FFB7C7E9FA6 imul      edx,44
       7FFB7C7E9FA9 add       edx,0AB
       7FFB7C7E9FAF mov       [rcx+10],edx

Should've thought about bool -> byte earlier. now it works beautifully.

Thanks for the help.

Method Mean Error StdDev Code Size
A 751.4 ns 1.69 ns 1.58 ns 41 B
B 429.9 ns 1.79 ns 1.59 ns 30 B
C 422.7 ns 1.94 ns 1.82 ns 33 B

@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

3 participants