[ARM64/Linux] Inefficient conditionals branching #12735

TamarChristinaArm · 2019-05-22T11:04:50Z

The following

static int[] Test(int[] a, int n)
{
  int y;
  if (n == 3)
    y = n;
  else
    y = n + 1;
  a[2] = y;
  return a;
}

produces the following conditional (with QuickJIT off)

G_M32566_IG02:
        71000C5F          cmp     w2, dotnet/coreclr#3
        54000061          bne     G_M32566_IG03
        52800061          mov     w1, dotnet/coreclr#3
        14000002          b       G_M32566_IG04

so for both the if and else case it branches, it has no fall through case that doesn't
require the branch.

but also it has generated an extra mov here because it knows n is 3. However change the
condition to

static int[] Test(int[] a, int n)
{
  int y;
  if (n > 3)
    y = n;
  else
    y = n + 1;
  a[2] = y;
  return a;
}

and it now just re-uses the register containing n since it's dead after the if anyway

G_M32566_IG02:
        71000C5F          cmp     w2, dotnet/coreclr#3
        5400004D          ble     G_M32566_IG03
        14000002          b       G_M32566_IG04

G_M32566_IG03:
        11000442          add     w2, w2, dotnet/coreclr#1

G_M32566_IG04:
        52800041          mov     w1, dotnet/coreclr#2
        B9800803          ldrsw   x3, [x0,#8]
        6B03003F          cmp     w1, w3
        54000082          bhs     G_M32566_IG06
        B9001802          str     w2, [x0,#24]

So not quite sure why the second case generates smaller code.

But again the conditional can just be re-arranged to

G_M32566_IG02:
        71000C5F          cmp     w2, dotnet/coreclr#3
        14000002          bgt     G_M32566_IG04

and remove a branch.

/CC @CarolEidt @tannergooding

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

The text was updated successfully, but these errors were encountered:

mikedn · 2019-05-22T17:13:28Z

Not ARM64 specific, x64 generates:

G_M55886_IG01:
       4883EC28             sub      rsp, 40
       90                   nop
G_M55886_IG02:
       83FA03               cmp      edx, 3
       7507                 jne      SHORT G_M55886_IG03
       B803000000           mov      eax, 3
       EB03                 jmp      SHORT G_M55886_IG04
G_M55886_IG03:
       8D4201               lea      eax, [rdx+1]
G_M55886_IG04:
       83790802             cmp      dword ptr [rcx+8], 2
       760B                 jbe      SHORT G_M55886_IG06
       894118               mov      dword ptr [rcx+24], eax
       488BC1               mov      rax, rcx
G_M55886_IG05:
       4883C428             add      rsp, 40
       C3                   ret
G_M55886_IG06:
       E8169E385F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

Here assertion propagation isn't very smart and substitutes n with 3 even if it's not beneficial in this case. And I can only guess that it wouldn't be that easy for LSRA to recover from this.

So not quite sure why the second case generates smaller code.

Because of the n > 3 condition that prevents assertion propagation to conclude that n is 3 in that block.

But again the conditional can just be re-arranged to

Yes but it's a bit complicated because this isn't obvious in the IR until after register allocation when you'd need to figure out that the store is useless because both variables have been assigned the same register and that the block is empty otherwise.

TamarChristinaArm · 2019-05-23T16:24:47Z

Yes but it's a bit complicated because this isn't obvious in the IR until after register allocation when you'd need to figure out that the store is useless because both variables have been assigned the same register and that the block is empty otherwise.

But the block doesn't have to be empty to remove the unconditional branch. The unconditional branch is always the last statement in the block, so if the edges were re-arranged such that the block you unconditionally branch to is the next BB after the one with the conditional you could just call through, or alternatively re-order the branch conditions, then you don't have to change the edges.

Unless i'm missing something can't both of these be done before regalloc just purely from the control flow of the IR?

mikedn · 2019-05-24T07:41:16Z

I'm not sure I understand. If neither block is empty then the only flow control transform that you can do here is branch inversion and that doesn't help. Perhaps you have in mind an additional IR transform?

int y;
if (n == 3)
    y = n;
else
    y = n + 1;

could become

int y = n;
if (n != 3)
    y = y + 1;

That would work but yes, the JIT doesn't do anything like this today. Though I happen to have an if conversion experiment that could be extended to handle this case. Maybe one day...

There's a rather similar issue I opened a long time ago - #4326.

TamarChristinaArm · 2019-05-24T16:43:04Z

I'm not sure I understand. If neither block is empty then the only flow control transform that you can do here is branch inversion and that doesn't help.

No that's correct, In my original post I had confused the cases where it generated the mov w1, 3. I mixed the two issues of why it had generated that mov and why in the second case the additional branch was needed.

You're right that if-conversion would certainly give the best code in such cases.

a74nh · 2022-12-06T11:14:27Z

With the latest HEAD, the code gets if converted, but the extra mov of 3 still exists:

  if (n == 3)
    y = n;
  else
    y = n + 1;

IN0001: 000008                    add     w2, w1, #1
IN0002: 00000C                    mov     w3, #3
IN0003: 000010                    cmp     w1, #3
IN0004: 000014                    csel    w1, w2, w3, ne

  if (n > 3)
    y = n;
  else
    y = n + 1;

IN0001: 000008                    add     w2, w1, #1
IN0002: 00000C                    cmp     w1, #3
IN0003: 000010                    csel    w1, w2, w1, le

Assertion propagation happens before the if conversion, so it makes sense that this still happens:

Because of the n > 3 condition that prevents assertion propagation to conclude that n is 3 in that block.

TIHan · 2023-04-18T23:20:15Z

I can confirm the extra mov w3, #3 still occurs.

I did some hacking and got the IfConversion phase to run earlier and indeed the mov w3, #3 disappears. To get the phase to run earlier I had to disable a lot of asserts just so I could see the results - so it's not viable to do at the moment.

Unfortunately, without introducing a new concept or doing IfConversion before AssertionProp, there isn't a safe way to do this.

Also, this pattern:

  if (n == 3) // including other relops
    y = n;
  else
    y = n + 1;

Can be reduced to:

cmp     w0, #3
cinc    w0, w0, <cond>

Related: #82031

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

kunalspathak mentioned this issue Mar 28, 2022

Improving ARM64 Performance in .NET 7.0 #64820

Closed

32 tasks

kunalspathak mentioned this issue Oct 13, 2022

Improving Arm64 Performance in .NET 8.0 #77010

Closed

28 tasks

kunalspathak assigned TIHan Apr 6, 2023

kunalspathak assigned a74nh and unassigned TIHan Nov 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ARM64/Linux] Inefficient conditionals branching #12735

[ARM64/Linux] Inefficient conditionals branching #12735

TamarChristinaArm commented May 22, 2019

mikedn commented May 22, 2019

TamarChristinaArm commented May 23, 2019

mikedn commented May 24, 2019

TamarChristinaArm commented May 24, 2019

a74nh commented Dec 6, 2022

TIHan commented Apr 18, 2023 •

edited

Loading

[ARM64/Linux] Inefficient conditionals branching #12735

[ARM64/Linux] Inefficient conditionals branching #12735

Comments

TamarChristinaArm commented May 22, 2019

mikedn commented May 22, 2019

TamarChristinaArm commented May 23, 2019

mikedn commented May 24, 2019

TamarChristinaArm commented May 24, 2019

a74nh commented Dec 6, 2022

TIHan commented Apr 18, 2023 • edited Loading

TIHan commented Apr 18, 2023 •

edited

Loading